14 Matching Annotations
  1. Feb 2023
    1. The second purpose of skip connections is specific to transformers — preserving the original input sequence.
  2. Dec 2022
  3. Mar 2021
    1. Hyperaware of how annoying it is when you want a recipe and have to read a 20-paragraph story about someone's great gran (and feeling bad you don't care), I have provided a skip link if you don't care about my back story to this post.

      I've been seeing many references to this sort of annoying storytelling in recipes lately.

      (previous example: https://hyp.is/g9iWXJDdEeuv5SsIpr4k5Q/www.daringgourmet.com/traditional-welsh-cakes/)

    1. There's been occasional talk in the IndieWeb chat about recipes that have long boring pre-stories and don't get to the point.

      This is one of the first examples I've seen of a food blog that has a "Jump to Recipe" button and a "Print Recipe" button right at the top for those who are in a hurry, or who have read the post previously.

      Will look for other examples...

  4. Dec 2020
  5. Dec 2019
    1. The context words are assumed to be located symmetrically to the target words within a distance equal to the window size in both directions.

      What does it mean to say the context words are "symmetrically located" around the target words?
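
      The symmetric window described in the quote can be sketched as follows: for each target word, the context consists of the words up to `window` positions before and after it (the sentence and window size here are made up for illustration).

      ```python
      sentence = "the quick brown fox jumped over the lazy dog".split()
      window = 2  # context window size in each direction

      pairs = []
      for i, target in enumerate(sentence):
          # context positions are symmetric around i, at most `window` away
          for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
              if j != i:
                  pairs.append((target, sentence[j]))

      # e.g. for the target "brown" the context is ["the", "quick", "fox", "jumped"]
      ```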

  6. Jun 2019
    1. This concept is pretty powerful, and I’m sure you’ve already read all about it. If you haven’t, browse your favorite mildly-technical news source (hey Medium!) and you’ll be inundated with people telling you how much potential there is. Some buzzwords: asset/rights management, decentralized autonomous organizations (DAOs), identity, social networking, etc.


  7. Dec 2017
  8. Apr 2017
    1. J^(t)_NEG = log Q_θ(D=1 | the, quick) + log(Q_θ(D=0 | sheep, quick))

      Objective for learning θ: it is maximized when the model assigns high probability to the real pair (the, quick) coming from the data and low probability to the noise pair (sheep, quick).
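
      A minimal numeric sketch of this objective for one training step, assuming Q_θ is the logistic of the dot product of the pair's embeddings (the dot-product scores below are made up):

      ```python
      import math

      def sigmoid(x):
          return 1.0 / (1.0 + math.exp(-x))

      # Made-up scores: dot products of the embeddings for the real pair
      # (the, quick) and the noise pair (sheep, quick).
      score_real = 1.5    # v_quick . v_the
      score_noise = -0.8  # v_quick . v_sheep

      # J_NEG = log Q(D=1 | the, quick) + log Q(D=0 | sheep, quick)
      j_neg = math.log(sigmoid(score_real)) + math.log(1.0 - sigmoid(score_noise))
      # Training adjusts theta to make this quantity as large as possible.
      ```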

    2. Algorithmically, these models are similar, except that CBOW predicts target words (e.g. 'mat') from source context words ('the cat sits on the'), while the skip-gram does the inverse and predicts source context-words from the target words. This inversion might seem like an arbitrary choice, but statistically it has the effect that CBOW smoothes over a lot of the distributional information (by treating an entire context as one observation)
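
      The inversion described in the quote can be sketched as training-pair generation (this is illustrative, not the TensorFlow implementation; the window here is just the preceding words from the quoted example):

      ```python
      words = "the cat sits on the mat".split()
      target_index = 5          # "mat"
      context = words[1:5]      # ["cat", "sits", "on", "the"]

      # CBOW: the entire context is one observation predicting the target.
      cbow_example = (context, words[target_index])

      # Skip-gram: one (target, context-word) pair per context word,
      # so each context word is a separate observation.
      skipgram_examples = [(words[target_index], c) for c in context]
      ```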
    1. arg max_{v_w, v_c} Σ_{(w,c) ∈ D} log 1 / (1 + e^{−v_c · v_w})

      maximise the log probability.

    2. p(D=1 | w, c) the probability that (w, c) came from the data, and by p(D=0 | w, c) = 1 − p(D=1 | w, c) the probability that (w, c) did not.

      The probability that a (word, context) pair is present in the text, and its complement.
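
      A sketch of these two probabilities, assuming p(D=1 | w, c) is modeled as the sigmoid of the dot product v_c · v_w (the toy 3-d embeddings below are made up):

      ```python
      import math

      def sigmoid(x):
          return 1.0 / (1.0 + math.exp(-x))

      v_w = [0.2, -0.1, 0.4]  # toy word vector
      v_c = [0.3, 0.0, 0.5]   # toy context vector

      dot = sum(a * b for a, b in zip(v_w, v_c))
      p_data = sigmoid(dot)    # p(D=1 | w, c): the pair came from the data
      p_noise = 1.0 - p_data   # p(D=0 | w, c): the pair did not
      ```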

    3. Loosely speaking, we seek parameter values (that is, vector representations for both words and contexts) such that the dot product v_w · v_c associated with “good” word-context pairs is maximized.
    4. In the skip-gram model, each word w ∈ W is associated with a vector v_w ∈ R^d and similarly each context c ∈ C is represented as a vector v_c ∈ R^d, where W is the words vocabulary, C is the contexts vocabulary, and d is the embedding dimensionality.

      The parameters of the skip-gram model: word vectors, context vectors, and the embedding dimensionality d.
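
      The parameterization in the quote can be sketched directly: one d-dimensional vector per word in W and per context in C (the vocabularies, dimensionality, and random initialization here are all made up for illustration):

      ```python
      import random

      W_vocab = ["the", "cat", "sits", "on", "mat"]  # words vocabulary W
      C_vocab = ["the", "cat", "sits", "on", "mat"]  # contexts vocabulary C
      d = 4                                          # embedding dimensionality

      random.seed(0)
      # v_w in R^d for each w in W, v_c in R^d for each c in C
      word_vectors = {w: [random.uniform(-0.5, 0.5) for _ in range(d)] for w in W_vocab}
      context_vectors = {c: [random.uniform(-0.5, 0.5) for _ in range(d)] for c in C_vocab}
      ```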