8 Matching Annotations
  1. Jun 2019
    1. This concept is pretty powerful, and I’m sure you’ve already read all about it. If you haven’t, browse your favorite mildly-technical new source (hey Medium!) and you’ll be inundated with people telling you how much potential there is. Some buzzwords: asset/rights management, decentralized autonomous organizations (DAOs), identity, social networking, etc.

      Skip

  2. Dec 2017
  3. Apr 2017
    1. J(t)NEG=logQθ(D=1|the, quick)+log(Qθ(D=0|sheep, quick))

      Expression to learn theta and maximize cost and minimize the loss due to noisy words. Expression means -> probability of predicting quick(source of context) from the(target word) + non probability of sheep(noise) from word

    2. Algorithmically, these models are similar, except that CBOW predicts target words (e.g. 'mat') from source context words ('the cat sits on the'), while the skip-gram does the inverse and predicts source context-words from the target words. This inversion might seem like an arbitrary choice, but statistically it has the effect that CBOW smoothes over a lot of the distributional information (by treating an entire context as one observation)
    1. arg maxvw;vcP(w;c)2Dlog11+evcvw

      maximise the log probability.

    2. p(D= 1jw;c)the probability that(w;c)came from the data, and byp(D= 0jw;c) =1p(D= 1jw;c)the probability that(w;c)didnot.

      probability of word,context present in text or not.

    3. Loosely speaking, we seek parameter values (thatis, vector representations for both words and con-texts) such that the dot productvwvcassociatedwith “good” word-context pairs is maximized.
    4. In the skip-gram model, each wordw2Wisassociated with a vectorvw2Rdand similarlyeach contextc2Cis represented as a vectorvc2Rd, whereWis the words vocabulary,Cis the contexts vocabulary, anddis the embed-ding dimensionality.

      Factors involved in the Skip gram model