16 Matching Annotations
  1. May 2026
    1. Our method, Natural Language Autoencoders (NLAs), converts an activation into natural-language text we can read directly. For example: When asked to complete a couplet, NLAs show Claude planning possible rhymes in advance.

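      NLA technology converts an AI model's internal activation states directly into readable natural-language text, making it possible to read the model's thought process directly. This is a major breakthrough in the field of AI interpretability.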

  2. Apr 2026
  3. Dec 2025
  4. Nov 2024
    1. I've been down there enough times to see the same patterns repeat, and sometimes I can even interrupt them. That's why having goofy names for them matters so much, because it reminds me not to believe the biggest bog lie of all: that I'm stuck in a situation unlike any I, or anyone else, has ever seen before

      Giving a silly name to recurring negative patterns around procrastination, or around not getting into action, helps defeat the pattern (rather than beating yourself up over it, I suppose).

  5. Nov 2022
    1. The novelist and screenwriter Raymond Chandler said he avoided reading books written by someone who didn’t “take the pains” to write out the words. (It used to be common for writers to dictate into a recorder then have an assistant transcribe those words.) “You have to have that mechanical resistance,” Chandler wrote in a 1949 letter to actor/writer Alex Barris. “When you have to use your energy to put those words down, you are more apt to make them count.”
  6. Sep 2022
  7. Apr 2022
  8. Oct 2021
  9. Jul 2021
  10. Mar 2021
  11. Jan 2021
  12. Sep 2020
  13. Dec 2018
  14. Sep 2015
  15. Jun 2015