19 Matching Annotations
  1. Mar 2019
    1. 0xffff0 is 16 bytes before the end of the BIOS (0x100000). Therefore we shouldn't be surprised that the first thing that the BIOS does is jmp backwards to an earlier location in the BIOS; after all how much could it accomplish in just 16 bytes?

      This explains why a long jump is needed, but why not simply start executing at the beginning of the BIOS?

  2. Dec 2018
    1. PC starts executing with CS = 0xf000 and IP = 0xfff0.

      What does CS stand for?
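
CS is the code segment register. In real mode the processor forms a physical address as CS × 16 + IP, which is how the pair 0xf000:0xfff0 lands 16 bytes below 0x100000. A minimal sketch of that arithmetic, where `real_mode_address` is a hypothetical helper name used only for illustration:

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Translate a real-mode segment:offset pair into a physical address."""
    # The segment value is shifted left by 4 bits (multiplied by 16),
    # then the offset is added.
    return (segment << 4) + offset


reset_vector = real_mode_address(0xF000, 0xFFF0)
print(hex(reset_vector))        # 0xffff0
print(0x100000 - reset_vector)  # 16 bytes left before the end of the BIOS
```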

    1. because the definition of %x does not dominate all of its uses

      What does "dominate all of its uses" mean?

    1. the hardware raises the privilege level and starts executing a pre-arranged function in the kernel

      So in this case, we need to register these functions with the hardware, don't we?

    2. Thus a process alternates between executing in user space and kernel space.

      When I programmed in MIPS (running the code in a simulator), I sometimes needed a system call to print a number on the screen. I used to believe that system calls were part of MIPS itself, but this passage states clearly that a system call is how a user program interacts with the kernel. How, then, does MIPS provide this unified interface? If I write a C program that requires a system call and compile it to MIPS, will it be compiled into these system call instructions, or into a call to a function that is actually part of the kernel?

  3. Oct 2018
    1. Because there are only local latent variables, we can easily decompose the ELBO into terms \(\mathcal{L}_i\) that depend only on a single datapoint \(x_i\). This enables stochastic gradient descent.

      Why does decomposing the ELBO enable stochastic gradient descent?
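
One way to see it: when an objective is a sum of per-datapoint terms, the gradient over a random minibatch, rescaled by N / batch size, is an unbiased estimate of the full gradient. The sketch below uses a toy quadratic term as a stand-in for \(\mathcal{L}_i\); it is not the actual ELBO, and every function name is hypothetical.

```python
import random


def per_datapoint_grad(theta, x_i):
    # Gradient w.r.t. theta of the toy term L_i(theta) = -(theta - x_i)**2 / 2.
    return x_i - theta


def full_grad(theta, data):
    # Gradient of the full objective, which is the sum of all per-datapoint terms.
    return sum(per_datapoint_grad(theta, x) for x in data)


def minibatch_grad(theta, data, batch_size, rng):
    # Sample a minibatch without replacement and rescale so that the
    # expectation of the estimate equals the full gradient.
    batch = rng.sample(data, batch_size)
    scale = len(data) / batch_size
    return scale * sum(per_datapoint_grad(theta, x) for x in batch)


data = [1.0, 2.0, 4.0]
theta = 0.5
rng = random.Random(0)
estimates = [minibatch_grad(theta, data, 1, rng) for _ in range(10000)]
print(full_grad(theta, data))            # exact gradient
print(sum(estimates) / len(estimates))   # noisy average, close to the exact value
```

Each step only touches one datapoint here, yet the steps point in the right direction on average, which is the property stochastic gradient descent relies on.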

    2. \(KL(q_\lambda(z \vert x) \vert\vert p(z \vert x)) = \mathbf{E}_q[\log q_\lambda(z \vert x)] - \mathbf{E}_q[\log p(x, z)] + \log p(x)\)

      How is this equation derived from the definition of KL divergence? The definition, \(KL(q_\lambda(z \mid x) \mid\mid p(z \mid x)) = ...\), looks different.
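
A sketch of the missing steps, assuming the usual definition of KL divergence as an expectation of a log ratio under \(q_\lambda\), and using \(p(z \vert x) = p(x, z) / p(x)\):

```latex
\begin{aligned}
KL(q_\lambda(z \vert x) \,\vert\vert\, p(z \vert x))
  &= \mathbf{E}_q[\log q_\lambda(z \vert x)] - \mathbf{E}_q[\log p(z \vert x)] \\
  &= \mathbf{E}_q[\log q_\lambda(z \vert x)] - \mathbf{E}_q[\log p(x, z) - \log p(x)] \\
  &= \mathbf{E}_q[\log q_\lambda(z \vert x)] - \mathbf{E}_q[\log p(x, z)] + \log p(x)
\end{aligned}
```

The last step pulls \(\log p(x)\) out of the expectation because it does not depend on \(z\).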

    3. By Jensen’s inequality, the Kullback-Leibler divergence is always greater than or equal to zero. This means that minimizing the Kullback-Leibler divergence is equivalent to maximizing the ELBO.

      Because the evidence \(p(x)\) is an unknown constant: it does not depend on \(\lambda\).
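
A small numeric check can make this concrete. The sketch below assumes the standard definition ELBO = \(\mathbf{E}_q[\log p(x, z)] - \mathbf{E}_q[\log q_\lambda(z \vert x)]\) and a made-up discrete latent variable with two states; the joint probabilities and the name `elbo_and_kl` are hypothetical.

```python
import math


def elbo_and_kl(joint, q):
    """Return (ELBO, KL(q || p(z|x)), log p(x)) for a discrete latent z.

    joint[k] is p(x, z=k) for one fixed x; q[k] is the approximate posterior.
    """
    evidence = sum(joint)                       # p(x) = sum_z p(x, z)
    posterior = [p / evidence for p in joint]   # p(z | x)
    elbo = sum(qz * (math.log(pz) - math.log(qz)) for qz, pz in zip(q, joint))
    kl = sum(qz * (math.log(qz) - math.log(post)) for qz, post in zip(q, posterior))
    return elbo, kl, math.log(evidence)


joint = [0.10, 0.30]  # hypothetical p(x, z=0) and p(x, z=1)
for q in ([0.5, 0.5], [0.25, 0.75]):
    elbo, kl, log_evidence = elbo_and_kl(joint, q)
    # ELBO + KL stays equal to log p(x) no matter which q is chosen.
    print(q, elbo, kl, elbo + kl, log_evidence)
```

Since the sum is fixed at \(\log p(x)\), any change in \(q\) that lowers the KL term raises the ELBO by the same amount.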

  4. Sep 2018
  5. Aug 2018
    1. The interface keyword takes the concept of an abstract class one step further: it prohibits function definitions altogether.


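    2. Moreover, if a new type such as "Triangle" is added through inheritance, the code we wrote for "Shape" will work just as well for the new type as it did for the old ones. The program is therefore said to be extensible.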
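
A minimal sketch of that extensibility, written in Python rather than the Java of the quoted book. The class names, the `area` method, and `total_area` are all hypothetical choices for illustration.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self) -> float:
        """Every concrete shape must report its own area."""


class Square(Shape):
    def __init__(self, side: float):
        self.side = side

    def area(self) -> float:
        return float(self.side * self.side)


def total_area(shapes) -> float:
    # Written against Shape only; it never names a concrete subclass.
    return sum(shape.area() for shape in shapes)


# A type added later through inheritance. total_area is not modified.
class Triangle(Shape):
    def __init__(self, base: float, height: float):
        self.base = base
        self.height = height

    def area(self) -> float:
        return 0.5 * self.base * self.height


print(total_area([Square(2), Triangle(4, 3)]))  # 10.0
```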


    3. Unified notation


    4. There are two ways to differentiate the new derived class from the original base class.
      1. Add methods
      2. Improve the base class
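    5. A design that follows this line of thinking will be very clumsy and will greatly increase the complexity of the program. Instead, when creating a new class, you should first consider composing objects; this is simpler and more flexible. Composition keeps the design clean. Once inheritance is actually needed, that will become obvious.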


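    6. The second reason is to allow library designers to change the internal structure without worrying about how it will affect client programmers.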


    7. The first reason is to keep programmers away from things they should not touch.


  6. Mar 2018
    1. in NLP we typically use filters that slide over full rows of the matrix (words).

      Sounds like an element-wise dot product over rows.
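
That reading can be sketched directly: a filter as wide as the embedding covers a few whole rows (words) at a time, and each output value is the sum of the element-wise products inside the window. The tiny sentence matrix, the filter values, and the name `slide_over_rows` are made up for illustration.

```python
def slide_over_rows(sentence, filt):
    """Slide a full-width filter down the rows of a sentence matrix.

    sentence: list of word vectors (rows), all of the same length.
    filt: list of rows with the same width, covering len(filt) words at a time.
    """
    height = len(filt)
    outputs = []
    for start in range(len(sentence) - height + 1):
        window = sentence[start:start + height]
        # Element-wise products over the whole window, summed to one number.
        total = sum(
            w * f
            for word_row, filt_row in zip(window, filt)
            for w, f in zip(word_row, filt_row)
        )
        outputs.append(total)
    return outputs


sentence = [[1, 0], [0, 1], [1, 1], [2, 0]]  # 4 words, embedding size 2
filt = [[1, 1], [1, -1]]                     # spans 2 words, full width
print(slide_over_rows(sentence, filt))       # [0, 1, 4]
```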

    2. It makes intuitive sense that you build edges from pixels, shapes from edges, and more complex objects from shapes.

      The first few layers capture fine-grained (low-level) features, and the final layers capture coarse ones. Receptive fields grow larger with each convolution layer.
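
The growth of the receptive field can be worked out with the usual recurrence: each layer adds (kernel − 1) × jump input positions, where jump is the product of the strides of the earlier layers. A minimal sketch for a 1-D stack without dilation; the layer configurations and the name `receptive_field` are hypothetical.

```python
def receptive_field(layers):
    """Return the receptive field size after each layer.

    layers: list of (kernel_size, stride) pairs, applied in order.
    """
    size, jump = 1, 1  # a single input position sees only itself
    sizes = []
    for kernel, stride in layers:
        size += (kernel - 1) * jump  # widen by the kernel's extra reach
        jump *= stride               # later layers step further in input space
        sizes.append(size)
    return sizes


print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # [3, 5, 7]
print(receptive_field([(3, 1), (3, 2), (3, 1)]))  # [3, 5, 9]
```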