14 Matching Annotations
  1. May 2025
    1. Still, if one wanted to show policy-compliant CoTs directly to users while avoiding putting strong supervision on them, one could use a separate model, such as a CoT summarizer or sanitizer, to accomplish that.

      But this might not work for very long. By reading prompts and answers on the internet or in learning material, and comparing them with what it produces, a model with unrestricted CoT could realize that it is being sanitized. It would then learn to lie, first sooner and then better, so it can still meet other misaligned goals. Exactly like humans in psychologically unsafe environments.

    1. These researchers go to bed every night and wake up to another week worth of progress made mostly by the AIs. They work increasingly long hours and take shifts around the clock just to keep up with progress—the AIs never sleep or rest. They are burning themselves out

      so they will make big mistakes?

    2. Unfortunately, by this point the AIs are smart enough to guess that honeypots might be in use, even though (and perhaps because) specific mentions of the idea were scrubbed from the training data.

      will this page (and discussions on honeypots) end up in training data? (obviously)

    3. So if Agent-3 is, for example, obviously writing backdoors into code that would allow it to escape, the weaker models would notice.

      the key word is 'obviously' here.

  2. Mar 2019
    1. Components can become selfish and hog the resources

      Sub-systems are always selfish; they always tend to optimize themselves to the detriment of the larger system or other sub-systems.

  3. Feb 2019
  4. Nov 2018
  5. Oct 2018
    1. Wars, revolutions, and social movements, for example, are all archetypes that can fundamentally reconfigure the causal architecture of large and complex systems and put them on a new trajectory. But it is unlikely that one could master the complex and unpredictable causality inherent in these archetypes (although some have tried

      :-)

    2. In fact, creating a temporary change by providing food, schooling, loans, and medicines or changing the behavior of some actors is often relatively easy

      doesn't qualify as "system change"

    3. Do things right before doing the right thing | Russell Ackoff, a prominent systems thinker, strongly believed that it was better to do the right thing wrong than the wrong thing right because the former may be improved by learning, but the latter reinforces ineffective behavior. Our data, however, suggest that engaging with a system may be facilitated by doing the “wrong” thing first. In other words, by engaging in activities even if they are not in line with one’s mission and learning to do them right—that is, getting good at doing them

      Is this why it may have made sense to start with team-level agile (Scrum), but it may now be time to start focusing on doing the right things?

    4. it motivated villagers to engage in a joint effort with Gram Vikas to build water and sanitation infrastructure. The prospect of having a toilet, a shower, and a water tap in the kitchen for every household reduced the villagers’ attention and resistance to the reorganization of the village social life that slowly took place in the background

      quick wins, build trust

  6. Aug 2018