32 Matching Annotations
  1. Sep 2022
    1. To tackle the search problem, two main challenges need to be addressed. The first challenge is the huge search space with numerous plausible combinations. A model stacks many layers, each of which contains different numbers of parameters. Small changes to any element of the architecture may result in a new neural network that could produce largely different performance even when trained on the same dataset. Model developers usually put laborious engineering effort into finding an appropriate architecture for the tiny model, which is time-consuming and computing resource-hungry. The second challenge is that the objective of this search problem, i.e., the performance of the tiny model after distillation, is very expensive to compute. It is impractical and infeasible to train and evaluate each model we find in the searching process. Therefore, an easy-to-compute and effective predictive metric is desired to be tailored for this difficult search problem

      Wording that can serve as a reference when describing the estimator

    1. Networks with dynamic architectures not only save redundant computation for canonical ("easy") samples, but also preserve their representation power when recognizing non-canonical ("hard") sampl

      Motivation for early stopping

  2. Jul 2022
    1. For example, one consequential word is often the difference between Codex producing correct or incorrect results. Other factors such as: • the context of existing code by a user, • defined function and variable names, • existing comments and documentation by a user, • training data distribution, and • conciseness and length of prompt,

      Factors that constrain Codex's performance

    1. It stands to reason that the lower layers have the most information about linear word order, and the higher layers may have more information about semantic knowledge and task-specific knowledge.

      Roles of the different Transformer layers

    1. Further, in our interviews, multiple people described their usual commenting workflow as being post-hoc: they add comments after completing code. So, comments put in before completing the code is out-of-place for these participants

      Copilot changes the workflow from code first, comment later to comment first, code later

    2. Comment cleaning. Cleaning up their comments after completing a Copilot interaction was a common occurrence. Many participants, P3, P4, P7, and, P8 would repeatedly delete comments that were meant for Copilot. P19 said that cleaning up comments written for Copilot is essential:

      This is an interesting point, but it is too small to be worth a paper of its own

    1. Security was rarely a concern among the comments in the issue reports of bugs whose fixing introduced vulnerability regressions.

      Security problems introduced by fixes receive little attention

    2. For each interview participant, we donated 30 USD to the Mozilla Foundation or a charity chosen by the interviewee as a token of appreciation for their time and effort.

      Recruiting participants

    1. Cyclomatic complexity [4] is the most widely used complexity metric. McCabe computed the complexity using v(G) = e − n + 2 where e and n refer to number of edges and nodes in a control flow graph

      Cyclomatic complexity
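
A minimal sketch of the quoted formula v(G) = e − n + 2, assuming the control flow graph is a single connected component given as an adjacency dict. The function name `cyclomatic_complexity` and the example graphs are illustrative and do not come from the paper.

```python
def cyclomatic_complexity(cfg):
    """Return v(G) = e - n + 2 for a control flow graph.

    `cfg` maps each node to a list of its successor nodes.
    Assumption: the graph is one connected component.
    """
    nodes = set(cfg)
    for successors in cfg.values():
        nodes.update(successors)  # include nodes that only appear as targets
    edges = sum(len(successors) for successors in cfg.values())
    return edges - len(nodes) + 2


# Straight-line code: a -> b -> c (no decision points).
straight_line = {"a": ["b"], "b": ["c"], "c": []}

# if/else: cond branches to then/else, both rejoin at exit.
if_else = {"cond": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}

# while loop: cond either enters the body or leaves; the body jumps back.
while_loop = {"entry": ["cond"], "cond": ["body", "exit"], "body": ["cond"], "exit": []}

v_straight = cyclomatic_complexity(straight_line)
v_if_else = cyclomatic_complexity(if_else)
v_while = cyclomatic_complexity(while_loop)
```

Each single decision point (the `if` or the loop condition) raises the value by one over the straight-line baseline.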

  3. Jun 2022
    1. SRS documents describe the functionality and expected performance for software products, naturally affecting all the subsequent phases in the process. The requirement set defined in SRS documents are analyzed and refined in the design phase, which results in various design documents. Then, the developers proceed with these documents to build the code for the software system.

      Wording for describing requirements documents

    1. Applying Transformation. Assume a set of transformation rules Φ = {φ_1, φ_2, φ_3, ...}. Given original code c_i, φ_j(c_i) transforms the code, changing the structure while preserving semantics. Figure 3 shows how to apply such transformation to c_i. It works in three steps: • Find Transformation Location. Given a piece of source code (c_i), we first use tree-sitter to parse out the AST (T_{c_i}). From the AST, we extract potential locations for de-naturalization. These locations are nodes (n_k) in T_{c_i}. While choosing location n_k from T_{c_i}, we consult Φ – we extract the nodes where at least one of φ_j ∈ Φ is applicable. • Select Transformation Rule. Once we have a set of such nodes, we filter out the transformation rules that cannot be applied to any node in T_{c_i}. After such a filtration, we have a set of transformations Φ_a ⊆ Φ. At this stage, we randomly select one transformation pattern φ_j ∈ Φ_a to apply at an application location (AST node) n_k. • Apply Transformation. We apply φ_j to n_k to get the transformed node n'_k. We then structurally match n'_k with the original AST T_{c_i}, specifically n_k. We adapt the context of n_k to the transformed node's (n'_k) context. In that way, we get the transformed AST (T'_{c_i}), which we then translate to get the transformed code c'_i. We designed the transformation function φ_j and subsequent context adaptation in such a way that preserves the meaning or functionality of the original code. We use AST analysis and (approximated) data flow analysis on code AST

      Wording describing how SPT is applied
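
The three steps quoted above can be sketched roughly as follows. This is not the paper's implementation: it swaps in Python's built-in `ast` module for tree-sitter, and the two rules (`swap_if_else`, `flip_comparison`), the `Rule` class, and `transform` are hypothetical names made up for illustration. The comparison flip assumes operands of built-in ordered types such as numbers.

```python
import ast
import random
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """One transformation rule (a phi_j): an applicability check plus a rewrite."""
    name: str
    applicable: Callable[[ast.AST], bool]
    apply: Callable[[ast.AST], ast.AST]


def _swap_if_else(node):
    # `if c: A else: B`  ->  `if not c: B else: A`
    negated = ast.UnaryOp(op=ast.Not(), operand=node.test)
    return ast.If(test=negated, body=node.orelse, orelse=node.body)


_FLIP = {ast.Lt: ast.Gt, ast.Gt: ast.Lt, ast.LtE: ast.GtE, ast.GtE: ast.LtE}


def _is_simple(node):
    return isinstance(node, (ast.Name, ast.Constant))


def _flip_applicable(node):
    return (isinstance(node, ast.Compare) and len(node.ops) == 1
            and type(node.ops[0]) in _FLIP
            and _is_simple(node.left) and _is_simple(node.comparators[0]))


def _flip_comparison(node):
    # `a < b`  ->  `b > a` (side-effect-free operands only)
    new_op = _FLIP[type(node.ops[0])]()
    return ast.Compare(left=node.comparators[0], ops=[new_op], comparators=[node.left])


RULES = [
    Rule("swap_if_else", lambda n: isinstance(n, ast.If) and bool(n.orelse), _swap_if_else),
    Rule("flip_comparison", _flip_applicable, _flip_comparison),
]


class _Replace(ast.NodeTransformer):
    """Replace exactly one node (n_k) with its transformed version (n'_k)."""

    def __init__(self, target, rule):
        self.target, self.rule = target, rule

    def visit(self, node):
        if node is self.target:
            return self.rule.apply(node)
        return self.generic_visit(node)


def transform(code, rules, rng):
    tree = ast.parse(code)
    # Step 1: find locations where at least one rule is applicable.
    nodes = [n for n in ast.walk(tree) if any(r.applicable(n) for r in rules)]
    # Step 2: keep only rules that apply somewhere, then pick one at random.
    usable = [r for r in rules if any(r.applicable(n) for n in nodes)]
    if not usable:
        return code
    rule = rng.choice(usable)
    node = rng.choice([n for n in nodes if rule.applicable(n)])
    # Step 3: apply the rule at the chosen node and translate the AST back to code.
    new_tree = ast.fix_missing_locations(_Replace(node, rule).visit(tree))
    return ast.unparse(new_tree)


SAMPLE = "def smaller(a, b):\n    if a < b:\n        return a\n    else:\n        return b\n"
transformed = transform(SAMPLE, RULES, random.Random(0))
```

`ast.unparse` requires Python 3.9 or later. Whichever rule the random choice picks, `transformed` should define a `smaller` function that behaves like the original on numeric inputs while differing in structure.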

    1. the high number of compute operations required can result in unrealistically long training times (e.g., training GPT-3 with 175 billion parameters [11] would require approximately 288 years with a single V100 NVIDIA GPU).

      GPT-3 training cost

    1. As a concrete measure, we suggest reporting the total number of floating point operations (FPO) required to generate a result. FPO provides an estimate to the amount of work performed by a computational process. It is computed analytically by defining a cost to two base operations, ADD and MUL. Based on these operations, the FPO cost of any machine learning abstract operation (e.g., a tanh operation, a matrix multiplication, a convolution operation, or the BERT model) can be computed as a recursive function of these two operations. FPO has been used in the past to quantify the energy footprint of a model [26, 42, 12, 41], but is not widely adopted in AI

      Introduction to FLOPs
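
A small sketch of how an FPO count can be built up recursively from the two base operations, as the quoted passage describes. The unit costs (`ADD_COST = 1`, `MUL_COST = 1`) and the function names are assumptions for illustration; the paper does not prescribe them.

```python
# Assumed unit costs for the two base operations.
ADD_COST = 1
MUL_COST = 1


def dot_fpo(k):
    """Dot product of two length-k vectors: k multiplications, k - 1 additions."""
    return k * MUL_COST + (k - 1) * ADD_COST


def matmul_fpo(m, k, n):
    """(m x k) @ (k x n): one length-k dot product per output element."""
    return m * n * dot_fpo(k)


def dense_fpo(batch, d_in, d_out):
    """Dense layer y = x @ W + b: a matmul plus one bias addition per output."""
    return matmul_fpo(batch, d_in, d_out) + batch * d_out * ADD_COST


matmul_cost = matmul_fpo(2, 3, 4)  # 8 outputs, each costing 3 MUL + 2 ADD
dense_cost = dense_fpo(2, 3, 4)    # the matmul above plus 8 bias additions
```

Larger operations (a convolution, a full model) would be expressed the same way, as sums over the costs of the smaller operations they are composed of.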

    1. For example, NVIDIA estimated that 80–90% of the ML workload is inference processing [Leo19]. Similarly, Amazon Web Services claimed that 90% of the ML demand in the cloud is for inference [Bar19].

      Evidence for the overall energy consumption of inference

    1. A clear recent trend in the AI community is that models are getting significantly larger. It only took 3 months to shift the title of the largest model from BERT-Large to GPT-2 (Radford et al. 2019) in 2020 while the number of parameters of GPT-2 is around 5 times larger than that of BERT-Large. Moreover, GPT-2 further evolves into GPT-3 (Brown et al. 2020) with 175 Billion parameters. More recently, GLM (Du et al. 2021) has clinched the title with surprisingly 1.75 Trillion parameters. These large models consume more data and have better performance than their smaller counterparts

      The trend of AI models growing ever larger

  4. May 2022
    1. For instance, source code is not as homogeneous as NL: it is composed of both the code in a function body, which is written in programming language (PL), as well as optional comments written in NL

      Heterogeneity of code

    1. Python programs (typically a single function), and evaluates overall functional accuracy(pass rate) across examples using several test cases for each program

      Test cases can be used to measure the accuracy of multi-line code generation
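
A rough sketch of the idea in the quoted passage: each generated program counts as correct only if it passes all of its test cases, and the pass rate is the fraction of correct programs. The data layout (`program`, `entry`, `tests`) and the function names are hypothetical and are not taken from any specific benchmark.

```python
def passes_all_tests(program_src, entry, tests):
    """Run one generated program and check every (args, expected) test case.

    Caution: exec() runs arbitrary code; a real evaluation would sandbox it.
    """
    namespace = {}
    try:
        exec(program_src, namespace)
        func = namespace[entry]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        # Syntax errors, missing functions, and runtime errors all count as failures.
        return False


def pass_rate(problems):
    """Fraction of problems whose generated program passes all of its tests."""
    results = [passes_all_tests(p["program"], p["entry"], p["tests"]) for p in problems]
    return sum(results) / len(results)


problems = [
    {   # correct generation
        "program": "def add(a, b):\n    return a + b\n",
        "entry": "add",
        "tests": [((1, 2), 3), ((0, 0), 0)],
    },
    {   # buggy generation: returns the minimum instead of the maximum
        "program": "def largest(xs):\n    return min(xs)\n",
        "entry": "largest",
        "tests": [(([1, 5, 3],), 5)],
    },
]

rate = pass_rate(problems)
```

Because a program must pass every test to count, this measures functional correctness rather than textual similarity to a reference solution.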

    1. Table 1: Individual and average task completion times. Cells with an orange cell background indicate that the participant never succeeded because they were stopped after approximately 20 minutes of trying. DNF implies the participant did not finish on time.

      Low-quality suggestions can actually reduce development efficiency

    1. Both the ethical and security problems of DL code models manifest an emerging appeal from the open-source community: To establish an effective protection mechanism against the unauthorized usage of their open-source code in deep learning tasks

      Motivations of this paper