Hypothesis

28 Matching Annotations

Feb 2025
arxiv.org

2501.12948v1.pdf

26
1. james.zzzzzz 04 Feb 2025
  
  in Public
  
  plan
  
  Open source
  
  Powerful
  
  Interesting
  
  Performance comparable to OpenAI-o1
  
  Cost reduced by one tenth compared with OpenAI
  
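  Pioneer: uses reinforcement learning, low cost, open source. No longer a matter of stacking GPUs for training; approaches the problem through algorithmic and hardware optimization.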
2. james.zzzzzz 04 Feb 2025
  
  in Public
  
  In the future,
  
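  Future directions
  
  General capabilities (role-playing, function calling)
  
  Language mixing (currently optimized only for Chinese and English; support more languages)
  
  Prompt engineering (performance optimization)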
3. james.zzzzzz 04 Feb 2025
  
  in Public
  
  Unsuccessful Attempts
  
  Sharing unsuccessful attempts
4. james.zzzzzz 04 Feb 2025
  
  in Public
  
  DeepSeek-R1 Evaluation
  
  DeepSeek-R1 evaluation
5. james.zzzzzz 04 Feb 2025
  
  in Public
  
  Wait, wait. Wait. That’s an aha moment I can flag here
  
  aha moment
  
  The model learns to rethink using an anthropomorphic tone
6. james.zzzzzz 04 Feb 2025
  
  in Public
  
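  rather than explicitly teaching the model on how to solve a problem, we simply provide it with the right incentives,
  
  Rather than explicitly teaching the model how to solve a problem, provide the right incentives and let it develop its own problem-solving strategies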
7. james.zzzzzz 04 Feb 2025
  
  in Public
  
  As depicted in Figure 3,
  
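  Internal development, not external adjustment
  
  DeepSeek-R1-Zero's thinking time shows consistent improvement throughout training. This improvement is not the result of external adjustment but an intrinsic development within the model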
8. james.zzzzzz 04 Feb 2025
  
  in Public
  
  reasoning-related benchmarks
  
  Reasoning-related benchmarks
9. james.zzzzzz 04 Feb 2025
  
  in Public
  
  Aha Moment
  
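  Epiphany moment
  
  Hmm, this question seems to be about a common expression in Chinese, and it requires understanding what the reply "哪里，哪里" (literally "where, where") means in a specific context. Let me think it through carefully. First, the user mentions two men having an ordinary conversation, in which one praises the other's ability to get things done and the other replies "哪里，哪里". To understand what this reply means, I need to recall polite expressions and ways of showing modesty in Chinese.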
10. james.zzzzzz 04 Feb 2025
  
  in Public
  
  As depicted in Table 1
  
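  Template training
  
  This template requires DeepSeek-R1-Zero to first produce a reasoning process, followed by the final answer. We intentionally limit our constraints to this structural format, avoiding any content-specific biases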
11. james.zzzzzz 04 Feb 2025
  
  in Public
  
  Format rewards
  
  Format rewards
12. james.zzzzzz 04 Feb 2025
  
  in Public
  
  Accuracy rewards
  
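  Accuracy rewards
  
  Accuracy rewards: the accuracy reward model evaluates whether the response is correct. For example, for math problems with deterministic results, the model is required to provide the final answer in a specified format (e.g., within a box)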
13. james.zzzzzz 04 Feb 2025
  
  in Public
  
  we adopt a rule-based reward system that mainly consists of two types of rewards
  
  Composed of two types of rewards
  
  The reward is the source of the training signal and determines the optimization direction of RL
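A minimal sketch of how the two rule-based rewards noted above (accuracy and format) might be computed for a single response. The function names, the regular expressions, the `\boxed{}` answer convention, the exact-string comparison, and the equal weighting of the two rewards are all assumptions made for illustration; they are not taken from the paper's implementation.

```python
import re

# Assumed response layout: reasoning in <think> tags, then the answer in <answer> tags.
FORMAT_RE = re.compile(
    r"^\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$", re.DOTALL
)
# Assumed answer convention: the final answer is wrapped in \boxed{...}.
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def format_reward(response: str) -> float:
    """Return 1.0 if the response follows the assumed tag layout, else 0.0."""
    return 1.0 if FORMAT_RE.match(response) else 0.0


def accuracy_reward(response: str, reference: str) -> float:
    """Return 1.0 if the last boxed answer equals the reference string, else 0.0."""
    boxed = BOXED_RE.findall(response)
    if not boxed:
        return 0.0
    return 1.0 if boxed[-1].strip() == reference.strip() else 0.0


def total_reward(response: str, reference: str) -> float:
    """Sum both rewards; equal weighting is an illustrative choice."""
    return accuracy_reward(response, reference) + format_reward(response)


if __name__ == "__main__":
    sample = "<think>2 + 2 equals 4.</think> <answer>\\boxed{4}</answer>"
    print(total_reward(sample, "4"))
```

Because both checks are plain rules rather than learned models, the training signal is cheap to compute and hard to game by producing text that merely looks plausible.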
14. james.zzzzzz 04 Feb 2025
  
  in Public
  
  Template for DeepSeek-R1-Zero
  
  Template
  
  <think>: reasoning
  
  <answer>: answer
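A small sketch of how a prompt template of this kind could be filled in and how a response could be split back into its reasoning and answer parts. The template wording below is a paraphrase written for illustration, not the paper's exact text, and `build_prompt` and `split_response` are hypothetical helper names.

```python
import re
from typing import Optional, Tuple

# Paraphrased template (an assumption): reasoning first, then the final answer.
TEMPLATE = (
    "A conversation between User and Assistant. The Assistant first reasons "
    "inside <think> </think> tags, then gives the final answer inside "
    "<answer> </answer> tags.\n"
    "User: {question}\nAssistant:"
)


def build_prompt(question: str) -> str:
    """Insert the user's question into the template."""
    return TEMPLATE.format(question=question.strip())


def split_response(response: str) -> Optional[Tuple[str, str]]:
    """Return (reasoning, answer), or None if either tag pair is missing."""
    think = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if think is None or answer is None:
        return None
    return think.group(1).strip(), answer.group(1).strip()


if __name__ == "__main__":
    print(build_prompt("What is 1 + 1?"))
    print(split_response("<think>1 + 1 = 2</think> <answer>2</answer>"))
```

The template constrains only the structure of the output, which matches the note that content-specific biases are deliberately avoided.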
15. james.zzzzzz 04 Feb 2025
  
  in Public
  
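  we explore the potential of LLMs to develop reasoning capabilities without any supervised data
  
  Going its own way
  
  The potential of LLMs to develop reasoning capabilities without any supervised data
  
  Self-evolution through reinforcement learning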
16. james.zzzzzz 04 Feb 2025
  
  in Public
  
  Summary of Evaluation Results
  
  Reasoning
  
  Knowledge Q&A
  
  Writing, editing, summarization
  
  Outstanding performance
17. james.zzzzzz 04 Feb 2025
  
  in Public
  
  Smaller Models Can Be Powerful Too
  
  Small models can be powerful too
  
  -> 🙅 "brute force works miracles".
18. james.zzzzzz 04 Feb 2025
  
  in Public
  
  Reinforcement Learning
  
  Contribution: reinforcement learning for chain-of-thought (CoT) reasoning on complex problems
19. james.zzzzzz 04 Feb 2025
  
  in Public
  
  we introduce DeepSeek-R1
  
  Hence R1 is introduced, combining a small amount of cold-start data with a multi-stage training pipeline
20. james.zzzzzz 04 Feb 2025
  
  in Public
  
  DeepSeek-R1-Zero encounters challenges such as poor readability, and language mixing
  
  DeepSeek-R1-Zero runs into challenges such as poor readability and language mixing
21. james.zzzzzz 04 Feb 2025
  
  in Public
  
  matching the performance of OpenAI-o1-0912
  
  On par with the performance of OpenAI-o1-0912
22. james.zzzzzz 04 Feb 2025
  
  in Public
  
  self-evolution
  
  Self-evolution through a pure RL process
23. james.zzzzzz 04 Feb 2025
  
  in Public
  
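  we take the first step toward improving language model reasoning capabilities using pure reinforcement learning (RL).
  
  A pioneer: the first to use pure reinforcement learning (RL) to improve language model reasoning capabilities, and to show that it works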
24. james.zzzzzz 04 Feb 2025
  
  in Public
  
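  a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step
  
  DeepSeek-R1-Zero is a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, and it demonstrates remarkable reasoning capabilities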
25. james.zzzzzz 04 Feb 2025
  
  in Public
  
  To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama
  
  Open source
26. james.zzzzzz 04 Feb 2025
  
  in Public
  
  Figure 1 | Benchmark performance of DeepSeek-R1.
  
  Benchmarks

Annotators

james.zzzzzz

URL

arxiv.org/pdf/2501.12948
Jan 2025
docs.worldpay.com

Express Interface Specification

1
1. james.zzzzzz 16 Jan 2025
  
  in Public
  
  CreditCardAdjustment
  
  The Adjustment transaction is used to associate Level III Line Item Detail to a prior successful credit card transaction

Annotators

james.zzzzzz

URL

docs.worldpay.com/assets/pdf/ExpressInterfaceSpecification3.pdf
Dec 2024
harris.uchicago.edu

PPHA 31202_Advanced Statistics for Data Analysis I_Black_2024_Prelim.pdf

1
1. james.zzzzzz 02 Dec 2024
  
  in Public
  
  bcurran@uchicago.edu
  
  test

Annotators

james.zzzzzz

URL

harris.uchicago.edu/sites/default/files/2024-09/PPHA 31202_Advanced Statistics for Data Analysis I_Black_2024_Prelim.pdf
