In our study, developers accepted around 30% of GitHub Copilot’s suggestions
GitHub Copilot's impact on production code for Accenture developers
In our survey, respondents most commonly reported using the time they save with AI coding tools to design systems, collaborate, and learn.
Respondents say their companies are using AI to generate test cases
Nearly all of the survey participants reported using AI coding tools either outside of work or at work at some point
Almost every respondent has used AI coding tools at work
Easier to work with new programming languages, and understand existing codebases.
AI coding tools make it easy to adopt new programming languages and understand existing code bases.
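The full version can be downloaded from the Frost & Sullivan website: https://www.frostchina.com/content/insight/detail/66f2843fbd3cdfe88cf6fa2b
p. 5: Categories of AI code generation capabilities (strengths and limitations of AI programming)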
AutoDidact: Bootstrapping Search Through Self-Verification
https://x.com/karminski3/status/1899958835084656663
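“RL fine-tuning is about to get even more powerful! Someone just added a feature to the GRPO part of the Unsloth framework that supports function calling and agentic feedback loops.
Put simply, this feature does the following:
- The model being trained generates its own questions from the documents
- It then uses a search tool to look for the answers in the corpus
- A different, larger model then acts as the judge and evaluates whether its answers are correct
- Finally, reinforcement learning (RL) is used to improve its capability
The biggest innovation of this approach is that, once the interface is added, a large model can supervise a small model's learning, so human supervision is no longer needed (in practice, some human involvement still helps to get better results). This saves a lot of time.
According to the author, after one hour of training on a 4090, the trained model's question-answering accuracy rose from 23% to 53%.”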
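The loop described in the quote can be sketched roughly as follows. This is only an illustration, not AutoDidact's or Unsloth's actual code: `search_corpus`, `judge`, `toy_policy`, and the toy corpus are all hypothetical stand-ins, the self-generated questions are hard-coded, and the RL update itself (GRPO) is omitted. Only the reward computation that would feed it is shown.

```python
from typing import Callable, List, Tuple

# Toy corpus (assumption); in the described setup this is the document collection.
CORPUS: List[str] = [
    "The Apollo 11 mission landed on the Moon in 1969.",
    "Python was first released in 1991.",
]


def search_corpus(query: str, corpus: List[str]) -> str:
    """Hypothetical search tool: return the chunk sharing the most words with the query."""
    words = set(query.lower().split())
    return max(corpus, key=lambda chunk: len(words & set(chunk.lower().split())))


def judge(predicted: str, reference: str) -> float:
    """Stand-in for the larger judge model: 1.0 if the reference appears in the answer."""
    return 1.0 if reference.lower() in predicted.lower() else 0.0


def toy_policy(question: str, context: str) -> str:
    """Placeholder for the model being trained: it simply echoes the retrieved chunk."""
    return context


def collect_rewards(
    qa_pairs: List[Tuple[str, str]],
    policy: Callable[[str, str], str],
) -> List[float]:
    """Answer each question with the search tool, then score the answer with the judge."""
    rewards = []
    for question, reference in qa_pairs:
        context = search_corpus(question, CORPUS)
        predicted = policy(question, context)
        rewards.append(judge(predicted, reference))
    return rewards


# Stand-in for the questions the model would generate from the documents itself.
qa_pairs = [
    ("When was Python first released?", "1991"),
    ("When did Apollo 11 land on the Moon?", "1969"),
]
rewards = collect_rewards(qa_pairs, toy_policy)
print(rewards)
```

In a real setup these per-answer rewards would be passed to the RL trainer; how that is wired up in the actual project is not covered by the quote.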
Instead of continuously increasing pre-training budgets, test-time compute allows models to “think longer” during inference
This directly equates test-time compute with “think longer”, which is not accurate. The section below, Categories of Test-time Compute https://is.gd/8NVrHh, also lists some concrete strategies: using a Verifier or adjusting the Proposal Distribution
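A verifier-based strategy spends extra inference compute on scoring several candidate answers rather than on a longer chain of thought. A minimal best-of-N sketch, assuming a hypothetical `toy_verifier` (a real verifier would typically be a learned reward model) and hard-coded candidates standing in for sampled model outputs:

```python
from typing import Callable, List


def best_of_n(candidates: List[str], verifier: Callable[[str], float]) -> str:
    """Return the candidate that the verifier scores highest."""
    return max(candidates, key=verifier)


def toy_verifier(answer: str) -> float:
    """Hypothetical verifier for the question 'What is 17 * 3?'."""
    return 1.0 if answer.strip() == str(17 * 3) else 0.0


# Stand-ins for N answers sampled from the model.
candidates = ["41", "51", "54"]
best = best_of_n(candidates, toy_verifier)
print(best)
```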
Proposal Distribution Enhancement
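In an autoregressive model, at each token-generation step the model outputs a probability distribution (the proposal distribution) and then samples from it. “Enhancement” probably refers to adjusting this distribution, for example through re-ranking, filtering, bringing in external knowledge, or combining it with search strategies (such as beam search or tree search) to improve the generated result.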
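Two of the simplest ways to adjust a next-token distribution are temperature scaling and top-k filtering. The sketch below only illustrates the general idea of reshaping the distribution before sampling. The logits are made up, and it is not necessarily what the linked article means by "enhancement".

```python
import math
from typing import Dict


def softmax(logits: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Turn logits into probabilities; a temperature below 1 sharpens the distribution."""
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_k_filter(probs: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k most likely tokens and renormalize their probabilities."""
    kept = sorted(probs, key=probs.get, reverse=True)[:k]
    total = sum(probs[tok] for tok in kept)
    return {tok: probs[tok] / total for tok in kept}


# Made-up logits for four candidate next tokens.
logits = {"cat": 2.0, "dog": 1.0, "car": 0.1, "the": -1.0}
base = softmax(logits)
sharpened = softmax(logits, temperature=0.5)
filtered = top_k_filter(base, k=2)
print(base, sharpened, filtered, sep="\n")
```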
First, SFT’s methodology of minimizing the discrepancy between predicted outputs and gold-standard references inherently caps model performance at the quality level of the training data.
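SFT has an inherent problem: the model's performance ceiling is bounded by the quality of the training data. In other words, using SFT for alignment fine-tuning may well harm the model's original capabilities; according to the paper, CPO does not.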
This method systematically explores the repository knowledge graph and prioritizes the discovery of critical information such as repository functions and dependency structures that have a greater impact on resolving issues. By simulating multiple trajectories and evaluating their importance, MCTS dynamically narrows the search space and focuses computational resources on the most relevant areas.
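The method systematically explores the repository knowledge graph to find the key information that matters most for resolving the issue, such as repository functions and dependency structures, and ranks the results by importance. By simulating multiple trajectories and evaluating their importance, MCTS dynamically narrows the search space and concentrates computational resources on the most relevant areas.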
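To make the idea concrete, here is a small UCB1-based tree search over a toy repository graph. Everything in it is an assumption for illustration: the graph, the `RELEVANCE` scores, and the choice to replace the rollout (simulation) step with a direct lookup of a node's relevance score. The paper's actual graph schema and reward function are not given in the excerpt.

```python
import math
from typing import Dict, List, Optional

# Hypothetical repository graph: each node lists the nodes it contains.
GRAPH: Dict[str, List[str]] = {
    "repo": ["auth.py", "utils.py"],
    "auth.py": ["LoginHandler", "hash_password"],
    "utils.py": ["format_date"],
    "LoginHandler": [],
    "hash_password": [],
    "format_date": [],
}

# Hypothetical relevance of each node to the issue, in [0, 1].
RELEVANCE: Dict[str, float] = {
    "repo": 0.0, "auth.py": 0.6, "utils.py": 0.1,
    "LoginHandler": 0.9, "hash_password": 0.4, "format_date": 0.0,
}


class Node:
    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List["Node"] = []
        self.visits = 0
        self.value = 0.0

    def ucb1(self, c: float = 1.4) -> float:
        # Only called on already-visited children, so both visit counts are >= 1.
        exploit = self.value / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def explore(root_name: str, iterations: int) -> Dict[str, int]:
    """Run the search and return how often each discovered node was visited."""
    root = Node(root_name)
    seen = {root_name: root}
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes using UCB1.
        while GRAPH[node.name] and len(node.children) == len(GRAPH[node.name]):
            node = max(node.children, key=lambda n: n.ucb1())
        # Expansion: add the next unexplored child, if there is one.
        if len(node.children) < len(GRAPH[node.name]):
            child = Node(GRAPH[node.name][len(node.children)], parent=node)
            node.children.append(child)
            seen[child.name] = child
            node = child
        # Evaluation: a relevance lookup stands in for a simulated trajectory.
        reward = RELEVANCE[node.name]
        # Backpropagation: update statistics along the path to the root.
        current: Optional[Node] = node
        while current is not None:
            current.visits += 1
            current.value += reward
            current = current.parent
    return {name: n.visits for name, n in seen.items()}


visits = explore("repo", iterations=50)
print(sorted(visits.items(), key=lambda item: -item[1]))
```

Visit counts act as the importance ranking here: branches whose nodes look more relevant receive more of the search budget, which is the "narrowing" effect the excerpt describes.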
Excessive reference relationships may increase the complexity of the graph structure and affect the analysis efficiency and accuracy of the model.
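Too many reference relationships may increase the complexity of the graph structure and hurt the model's analysis efficiency and accuracy.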
The main role of SBFL-identified methods is to reveal more “hints” on relevant classes and methods beyond those mentioned in the problem statement. The LLM agent can then use the context retrieval APIs to examine these methods. Since the SBFL-identified methods are presented to the agent together with the problem statements, the agent can then cross-reference between these two sources of information.
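The main role of the SBFL-identified methods is to surface additional “hints” about relevant classes and methods beyond those mentioned in the problem statement. The LLM agent can then use the context retrieval APIs to examine these methods. Because the SBFL-identified methods are presented to the agent together with the problem statement, the agent can cross-reference the two sources of information.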
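For reference, spectrum-based fault localization (SBFL) ranks code elements by how strongly their execution correlates with failing tests. The sketch below uses the Ochiai formula, one common SBFL metric. The excerpt does not say which formula the paper uses, and the coverage data and method names here are made up.

```python
import math
from typing import Dict, Set


def ochiai(coverage: Dict[str, Set[str]], failing: Set[str]) -> Dict[str, float]:
    """Ochiai suspiciousness per method.

    coverage maps a test name to the set of methods it executed;
    failing is the set of failing test names.
    score = ef / sqrt(total_failed * (ef + ep)), where ef and ep count the
    failing and passing tests that executed the method.
    """
    total_failed = len(failing)
    methods = set().union(*coverage.values())
    scores = {}
    for method in methods:
        ef = sum(1 for t, cov in coverage.items() if method in cov and t in failing)
        ep = sum(1 for t, cov in coverage.items() if method in cov and t not in failing)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[method] = ef / denom if denom else 0.0
    return scores


# Made-up coverage data: which methods each test executed.
coverage = {
    "test_login_ok": {"Auth.login", "Auth.hash"},
    "test_login_bad_password": {"Auth.login", "Auth.hash", "Auth.lock_account"},
    "test_format": {"Utils.format_date"},
}
failing = {"test_login_bad_password"}

scores = ochiai(coverage, failing)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0], scores[ranked[0]])
```

The top-ranked methods are the kind of "hints" that would be handed to the agent alongside the problem statement.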
To this end, we develop a novel ASE method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories. Specifically, we first condense the critical information of the whole repository into the repository knowledge graph in a top-to-down mode to decrease the complexity of repository. Subsequently, we empower the agents the ability of understanding whole repository by proposing a Monte Carlo tree search based repository exploration strategy. In addition, to better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan. Then, they can manipulate the tools to dynamically acquire information and generate the patches to solve the real-world GitHub issues.
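The paper proposes a novel method named RepoUnderstander, which guides agents to comprehensively understand the whole code repository through the following steps: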
To identify the essential code elements needed to complete the given infilling method m in a repository, a naive solution might scan the entire codebase for all accessible elements, which would introduce excessive noise. Another approach could focus on methods with similar signatures or contexts; however, these often provide irrelevant elements that do not serve m’s functional purpose, leading to redundancy and missing critical elements.
problematic methods
pruning the specific implementations of functions in all dependent files does not significantly reduce the accuracy of completions
Isn't this rather obvious?
• Greedy Selection. Retrieval is performed if <cc> is the most likely token following <eof>.
• Threshold Selection. If the probability of <cc>
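greedy: <cc> only needs to be the most probable token, no matter what that probability actually is.
threshold: the probability of <cc> must reach a certain threshold.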
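The difference between the two rules is easy to see in code. This is a minimal sketch with made-up next-token probabilities. The function names are hypothetical, and since the excerpt is cut off before stating the exact threshold condition, the `>=` comparison is an assumption.

```python
from typing import Dict


def greedy_select(next_token_probs: Dict[str, float], cc: str = "<cc>") -> bool:
    """Retrieve if <cc> is the single most likely next token, whatever its probability."""
    return max(next_token_probs, key=next_token_probs.get) == cc


def threshold_select(
    next_token_probs: Dict[str, float], threshold: float, cc: str = "<cc>"
) -> bool:
    """Retrieve if the probability of <cc> reaches the threshold (>= is an assumption)."""
    return next_token_probs.get(cc, 0.0) >= threshold


# Made-up distribution over the token following <eof>.
probs = {"<cc>": 0.35, "def": 0.33, "return": 0.32}

print(greedy_select(probs))          # <cc> is the argmax, although only at 0.35
print(threshold_select(probs, 0.5))  # 0.35 does not reach 0.5
print(threshold_select(probs, 0.3))  # 0.35 reaches 0.3
```

With this distribution the greedy rule triggers retrieval on a narrow lead, while a threshold of 0.5 would skip it.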