5 Matching Annotations
  1. Apr 2026
    1. we rebuilt our pretraining stack with improvements to model architecture, optimization, and data curation.

      This statement suggests that Meta may have adopted an entirely new pretraining approach, combining comprehensive overhauls of architecture, optimization, and data curation. That could explain how they achieved such a striking efficiency gain; the concrete technical details behind these improvements are worth digging into.

    1. To predict the behavior of people in these documents effectively, representing their emotional states is likely helpful, as predicting what a person will say or do next often requires understanding their emotional state.

      The emotion representations are not something Anthropic deliberately trained for; they are a "byproduct" of the pretraining stage: to predict the next word in human text, the model is forced to learn to understand emotion. Surprisingly, this capability is then "reused" during post-training to drive the AI assistant's behavior, forming an emotional circuit that no one intentionally designed.
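      Claims like this are typically tested with a linear probe on the model's hidden activations. Below is a minimal sketch of that idea, assuming GPT-2 via Hugging Face `transformers` as a stand-in model and a tiny hand-labeled toy dataset; it illustrates the probing technique in general, not Anthropic's actual method or data.

      ```python
      # Hypothetical linear-probing sketch: is an "emotion" signal linearly
      # decodable from a language model's next-token hidden states?
      import torch
      from sklearn.linear_model import LogisticRegression
      from transformers import GPT2Model, GPT2Tokenizer

      tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
      model = GPT2Model.from_pretrained("gpt2")
      model.eval()

      # Toy dataset (assumed labels): 1 = text expresses frustration, 0 = not.
      texts = [
          ("I am so excited about the trip tomorrow!", 0),
          ("That is the best news I have heard all week.", 0),
          ("They cancelled it again and I am furious.", 1),
          ("Nothing works and I am completely fed up.", 1),
      ]

      def last_token_state(text):
          # Use the final token's hidden state as a summary feature vector.
          inputs = tokenizer(text, return_tensors="pt")
          with torch.no_grad():
              out = model(**inputs)
          return out.last_hidden_state[0, -1].numpy()

      X = [last_token_state(t) for t, _ in texts]
      y = [label for _, label in texts]

      probe = LogisticRegression(max_iter=1000).fit(X, y)
      print("training accuracy of the probe:", probe.score(X, y))
      ```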

  2. Feb 2019
  3. Nov 2018
    1. How Many Samples are Needed to Learn a Convolutional Neural Network?

      The conclusion says: "Our paper only considered CNN with linear activation." What? Linear activation? I can't think of a single reason to keep reading [sigh]... There is also a reddit discussion of this paper: http://t.cn/ELKbsjx
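      The snark has a precise basis: without nonlinearities, stacked convolutions compose into a single linear map, so the function class being analyzed is just linear models. A quick PyTorch check of this fact (my illustration, not code from the paper):

      ```python
      # Two conv layers with no activation in between ("linear activation")
      # behave as one linear operator: f(a*x1 + b*x2) == a*f(x1) + b*f(x2).
      import torch
      import torch.nn as nn

      torch.manual_seed(0)
      net = nn.Sequential(
          nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False),
          nn.Conv2d(8, 4, kernel_size=3, padding=1, bias=False),
      )

      x1 = torch.randn(1, 3, 16, 16)
      x2 = torch.randn(1, 3, 16, 16)
      a, b = 2.0, -0.5

      with torch.no_grad():
          lhs = net(a * x1 + b * x2)       # f(a*x1 + b*x2)
          rhs = a * net(x1) + b * net(x2)  # a*f(x1) + b*f(x2)

      print(torch.allclose(lhs, rhs, atol=1e-5))  # True: the stack is linear
      ```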

    2. Rethinking ImageNet Pre-training

      Rethinking ImageNet pre-training: ImageNet pre-training can accelerate convergence early in training, but it does not necessarily provide regularization or improve final accuracy on the target task.

      A new paper from Kaiming He! For the quickest way to grasp the paper's insights, just read its discussion section directly!
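      For reference, the comparison the paper studies looks roughly like this in code: the same architecture initialized from ImageNet weights versus from a random init. A minimal torchvision sketch; `NUM_CLASSES` and the choice of ResNet-50 here are my placeholder assumptions, not the paper's exact setup.

      ```python
      # Pretrained vs. from-scratch initialization of the same backbone.
      import torch
      import torchvision.models as models

      NUM_CLASSES = 10  # hypothetical target-task label count

      def make_model(pretrained: bool):
          weights = models.ResNet50_Weights.IMAGENET1K_V1 if pretrained else None
          model = models.resnet50(weights=weights)
          # Swap the ImageNet head for one sized to the target task.
          model.fc = torch.nn.Linear(model.fc.in_features, NUM_CLASSES)
          return model

      finetune_model = make_model(pretrained=True)  # converges faster early on
      scratch_model = make_model(pretrained=False)  # per the paper, can match
                                                    # final accuracy given
                                                    # enough training
      ```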