5 Matching Annotations
  1. Last 7 days
    1. this behavioral shift does not compromise performance on straightforward problems, it might affect performance on more challenging tasks.

      「简单题不影响,难题可能变差」——这个不对称性极为危险。它意味着我们在用简单任务验证 Agent 可靠性时,得到的是虚假的信心。而当 Agent 真正面临高风险、高复杂度的任务时,上下文累积已经悄悄关闭了它的自我验证模式,在没有任何预警的情况下退化为浅层推理。这是一种「隐性能力衰减」,比显而易见的失败更危险。

  2. Apr 2023
  3. Aug 2020
  4. Nov 2019
    1. Because they're more integrated and try to serialize an incomplete system (e.g. one with some kind of side effects: from browser/library/runtime versions to environment to database/API changes), they will tend to have high false-negatives (failing test for which the production code is actually fine and the test just needs to be changed). False negatives quickly erode the team's trust in a test to actually find bugs and instead come to be seen as a chore on a checklist they need to satisfy before they can move on to the next thing.
    1. So finally I'm coming out with it and explaining why I never use shallow rendering and why I think nobody else should either. Here's my main assertion:With shallow rendering, I can refactor my component's implementation and my tests break. With shallow rendering, I can break my application and my tests say everything's still working.This is highly concerning to me because not only does it make testing frustrating, but it also lulls you into a false sense of security. The reason I write tests is to be confident that my application works and there are far better ways to do that than shallow rendering.