17 Matching Annotations
  1. Last 7 days
    1. In our survey, respondents most commonly reported using the time they save with AI coding tools to design systems, collaborate, and learn.

      Respondents say their companies are using AI to generate test cases

    2. nearly all of the survey participants reported using AI coding tools both outside of work or at work at some point

      Almost every respondent has used AI coding tools at work

    3. Easier to work with new programming languages, and understand existing codebases.

      AI coding tools make it easy to adopt new programming languages and understand existing code bases.

  2. Mar 2025
    1. AutoDidact: Bootstrapping Search Through Self-Verification

      https://x.com/karminski3/status/1899958835084656663

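      "Reinforcement-learning fine-tuning is about to get more powerful! Someone just added a feature to the GRPO part of the Unsloth framework that supports function calling and agentic feedback loops.

      In short, the feature works like this: - The model being trained generates its own questions from documents - It then uses a search tool to look for answers in the corpus - Another, larger model then acts as the judge of whether its answers are correct - Finally, reinforcement learning (RL) is used to improve its ability

      The biggest innovation of this approach is that, once this interface is added, a large model can supervise a small model's learning, so human supervision is no longer required (in practice, some human involvement still gives better results). This saves a lot of time.

      According to the author, after one hour of training on a 4090, the trained model's question-answering accuracy rose from 23% to 53%."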
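
The loop described in the note above can be sketched roughly as follows. This is a minimal illustration, not the Unsloth API or the author's code: `self_verified_rewards` and the four callables are hypothetical names, and the RL update that would consume the rewards (GRPO in the post) is left out.

```python
from typing import Callable, List


def self_verified_rewards(
    document: str,
    make_question: Callable[[str], str],
    search: Callable[[str], List[str]],
    answer: Callable[[str, List[str]], str],
    judge: Callable[[str, str, str], bool],
    num_rollouts: int = 4,
) -> List[float]:
    """Return one 0/1 reward per rollout for a question generated from `document`.

    All callables are hypothetical stand-ins:
      make_question: the trained model writes a question from the document
      search:        the search tool returns passages from the corpus
      answer:        the trained model answers using the passages
      judge:         a larger model decides whether the answer is correct
    """
    question = make_question(document)
    rewards: List[float] = []
    for _ in range(num_rollouts):
        passages = search(question)
        candidate = answer(question, passages)
        is_correct = judge(question, candidate, document)
        rewards.append(1.0 if is_correct else 0.0)
    return rewards
```

In a real setup the rewards for a group of rollouts would be passed to the RL trainer; here they are simply returned so the data flow is visible.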

    1. Instead of continuously increasing pre-training budgets, test-time compute allows models to “think longer” during inference

      This directly equates test-time compute with "think longer", which is not accurate. The later section Categories of Test-time Compute https://is.gd/8NVrHh also lists some concrete strategies: using a Verifier, or adjusting the Proposal Distribution.

    1. Proposal Distribution Enhancement

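      In an autoregressive model, at each token-generation step the model outputs a probability distribution (the proposal distribution) and then samples from it. "Enhancement" probably refers to adjusting this distribution, for example through reranking, filtering, bringing in external knowledge, or combining it with search strategies (such as beam search or tree search) to improve the generated output.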
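
As a small illustration of "adjusting the distribution before sampling", the sketch below applies temperature scaling and top-k filtering to a toy set of logits. It is only one of the adjustments the note lists and is not taken from the annotated article; the function names and the toy vocabulary are assumptions.

```python
import math
import random
from typing import Dict


def adjust_distribution(
    logits: Dict[str, float], temperature: float = 1.0, top_k: int = 0
) -> Dict[str, float]:
    """Softmax over temperature-scaled logits, optionally keeping only the top_k tokens."""
    items = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        items = items[:top_k]  # filtering: drop everything outside the top k
    scaled = [(tok, value / temperature) for tok, value in items]
    peak = max(value for _, value in scaled)  # subtract the max for numerical stability
    weights = [(tok, math.exp(value - peak)) for tok, value in scaled]
    total = sum(w for _, w in weights)
    return {tok: w / total for tok, w in weights}


def sample(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one token from the adjusted distribution."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
```

For example, `adjust_distribution({"a": 2.0, "b": 1.0, "c": -1.0}, top_k=2)` keeps only `a` and `b` and renormalizes their probabilities to sum to 1.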

  3. Feb 2025
    1. First, SFT’s methodology of minimizing the discrepancy between predicted outputs and gold-standard references inherently caps model performance at the quality level of the training data.

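      SFT has an inherent problem: the model's performance ceiling is capped by the quality of the training data. In other words, using SFT for alignment fine-tuning may well damage the model's original capabilities; according to the paper, CPO does not.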

  4. Dec 2024
    1. This method systematically explores the repository knowledge graph and prioritizes the discovery of critical information such as repository functions and dependency structures that have a greater impact on resolving issues. By simulating multiple trajectories and evaluating their importance, MCTS dynamically narrows the search space and focuses computational resources on the most relevant areas.

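      The method systematically explores the repository knowledge graph, finds the key information that matters most for resolving the issue, such as repository functions and dependency structures, and ranks the results by importance. By simulating multiple trajectories and evaluating their importance, MCTS dynamically narrows the search space and concentrates compute on the most relevant areas.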
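
To make "narrowing the search space" concrete, here is a minimal sketch of the selection step used by many MCTS variants, the UCB1 rule. The highlighted passage does not say which selection rule the paper uses, so UCB1 is an assumption here, and `Node`, `ucb1`, and `select_child` are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of the search tree, e.g. a file or function in the repository graph."""
    name: str
    visits: int = 0
    total_reward: float = 0.0
    children: List["Node"] = field(default_factory=list)


def ucb1(parent: Node, child: Node, c: float = 1.4) -> float:
    """Mean reward plus an exploration bonus; unvisited children are tried first.

    Assumes parent.visits >= 1 whenever the child has been visited.
    """
    if child.visits == 0:
        return math.inf
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent.visits) / child.visits)


def select_child(parent: Node) -> Node:
    """Pick the child with the highest UCB1 score."""
    return max(parent.children, key=lambda ch: ucb1(parent, ch))
```

Children with high average reward are selected more often as simulations accumulate, which is how compute ends up concentrated on the most promising parts of the graph.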

    2. Excessive reference relationships may increase the complexity of the graph structure and affect the analysis efficiency and accuracy of the model.

      Too many reference relationships can make the graph structure more complex and hurt the model's analysis efficiency and accuracy.

    1. The main role of SBFL-identified methods is to reveal more “hints” on relevant classes and methods beyond those mentioned in the problem statement. The LLM agent can then use the context retrieval APIs to examine these methods. Since the SBFL-identified methods are presented to the agent together with the problem statements, the agent can then cross-reference between these two sources of information.

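      The main role of the SBFL-identified methods is to surface additional "hints" about relevant classes and methods beyond those mentioned in the problem statement. The LLM agent can then use the context retrieval APIs to inspect these methods. Because the SBFL-identified methods are given to the agent together with the problem statement, the agent can cross-reference the two sources of information.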
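
For readers unfamiliar with SBFL (spectrum-based fault localization), the sketch below ranks methods by the Ochiai suspiciousness score, a widely used SBFL formula. The highlighted passage does not state which formula the paper applies, so treat Ochiai as an assumption; `ochiai`, `rank_methods`, and the coverage numbers are illustrative.

```python
import math
from typing import Dict, List, Tuple


def ochiai(failed_cover: int, passed_cover: int, total_failed: int) -> float:
    """Ochiai score: ef / sqrt(total_failed * (ef + ep)).

    failed_cover (ef): failing tests that execute the method
    passed_cover (ep): passing tests that execute the method
    total_failed:      all failing tests in the suite
    """
    denom = math.sqrt(total_failed * (failed_cover + passed_cover))
    return failed_cover / denom if denom else 0.0


def rank_methods(coverage: Dict[str, Tuple[int, int]], total_failed: int) -> List[str]:
    """Return method names sorted from most to least suspicious."""
    scores = {
        name: ochiai(failed, passed, total_failed)
        for name, (failed, passed) in coverage.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

A method executed by every failing test and no passing test scores 1.0, so it would appear at the top of the hint list given to the agent.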

  5. Nov 2024
    1. To this end, we develop a novel ASE method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories. Specifically, we first condense the critical information of the whole repository into the repository knowledge graph in a top-to-down mode to decrease the complexity of repository. Subsequently, we empower the agents the ability of understanding whole repository by proposing a Monte Carlo tree search based repository exploration strategy. In addition, to better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan. Then, they can manipulate the tools to dynamically acquire information and generate the patches to solve the real-world GitHub issues.

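      The paper proposes a new method called RepoUnderstander, which guides agents to understand an entire code repository through the following steps:

      • Build a repository knowledge graph: condense the key information of the whole repository into a knowledge graph in a top-down manner to reduce the repository's complexity.
      • Explore the repository with Monte Carlo tree search: give the agents the ability to understand the whole repository by simulating multiple paths and evaluating their reward scores, gradually narrowing the search space and steering the agents toward the most relevant areas.
      • Use the information and generate patches: guide the agents to summarize, analyze, and plan, then operate tools to acquire information dynamically and generate patches that resolve real-world GitHub issues.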
  6. Oct 2024
    1. To identify the essential code elements needed to complete the given infilling method m in a repository, a naive solution might scan the entire codebase for all accessible elements, which would introduce excessive noise. Another approach could focus on methods with similar signatures or contexts; however, these often provide irrelevant elements that do not serve m’s functional purpose, leading to redundancy and missing critical elements.

      problematic methods

    1. Greedy Selection. Retrieval is performed if <cc> is the most likely token following <eof>. • Threshold Selection. If the probability of <cc>

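      greedy: <cc> only has to be the most probable token, regardless of how large that probability is.

      threshold: the probability of <cc> has to reach a certain threshold.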
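
The difference between the two rules can be sketched as follows. The function names and the toy probabilities are hypothetical, and because the highlighted definition of Threshold Selection is cut off, the `>=` comparison is an assumption based on the note above.

```python
from typing import Dict


def greedy_select(next_token_probs: Dict[str, float], trigger: str = "<cc>") -> bool:
    """Retrieve only if the trigger token is the single most likely next token."""
    return max(next_token_probs, key=next_token_probs.get) == trigger


def threshold_select(
    next_token_probs: Dict[str, float], threshold: float, trigger: str = "<cc>"
) -> bool:
    """Retrieve if the trigger token's probability reaches the threshold (assumed >=)."""
    return next_token_probs.get(trigger, 0.0) >= threshold


# Toy distribution over the token following <eof>.
probs = {"<cc>": 0.30, "def": 0.25, "return": 0.45}
print(greedy_select(probs))           # <cc> is not the argmax here
print(threshold_select(probs, 0.2))   # but its probability clears a 0.2 threshold
```

With a low threshold, retrieval can fire even when `<cc>` is not the top token; with a high threshold, it can be suppressed even when `<cc>` is the top token.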