2 Matching Annotations
- Apr 2017
-
www.tensorflow.org
-
$J^{(t)}_{\text{NEG}} = \log Q_\theta(D=1 \mid \text{the},\ \text{quick}) + \log\!\big(Q_\theta(D=0 \mid \text{sheep},\ \text{quick})\big)$
Expression used to learn theta: we update theta to maximize this objective, which amounts to minimizing the loss coming from noise words. In words, it is the log-probability that 'the' is a genuine context word for the target 'quick', plus the log-probability that the noise word 'sheep' is not. A rough numeric sketch of this objective is below.
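A minimal sketch of how this objective could be evaluated for one training pair, assuming toy random embeddings and modelling $Q_\theta(D=1 \mid w, \text{target})$ as a sigmoid of a dot product (the word lists, vector size, and function names here are illustrative, not from the tutorial):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical toy embeddings: an input vector and an output vector per word.
rng = np.random.default_rng(0)
dim = 8
vocab = ["quick", "the", "sheep"]
input_vecs = {w: rng.normal(size=dim) for w in vocab}
output_vecs = {w: rng.normal(size=dim) for w in vocab}

def neg_sampling_objective(target, real_word, noise_words):
    """J_NEG for one (target, real word) pair plus sampled noise words.

    Q_theta(D=1 | w, target) is modelled as sigmoid(u_w . v_target),
    so Q_theta(D=0 | w, target) = 1 - sigmoid(u_w . v_target).
    """
    v = input_vecs[target]
    j = np.log(sigmoid(output_vecs[real_word] @ v))   # log Q(D=1 | real word, target)
    for w in noise_words:
        j += np.log(1.0 - sigmoid(output_vecs[w] @ v))  # log Q(D=0 | noise word, target)
    return j

# The tutorial's example: real word 'the' for target 'quick', noise word 'sheep'.
print(neg_sampling_objective("quick", "the", ["sheep"]))
```

Updating theta to push this value up raises the score of the real pair and lowers the score of the noise pair, which is exactly what the annotation's note describes.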
-
Algorithmically, these models are similar, except that CBOW predicts target words (e.g. 'mat') from source context words ('the cat sits on the'), while the skip-gram does the inverse and predicts source context-words from the target words. This inversion might seem like an arbitrary choice, but statistically it has the effect that CBOW smoothes over a lot of the distributional information (by treating an entire context as one observation).
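A small sketch (not from the tutorial) of how the same sentence yields different training examples under the two models; the sentence, window size, and variable names are illustrative:

```python
sentence = "the cat sits on the mat".split()
window = 2

cbow_examples, skipgram_examples = [], []
for i, target in enumerate(sentence):
    # Context words within the window around the target.
    lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
    context = [sentence[j] for j in range(lo, hi) if j != i]
    # CBOW: the whole context is a single observation predicting the target.
    cbow_examples.append((context, target))
    # Skip-gram: every (target, context word) pair is its own observation.
    for c in context:
        skipgram_examples.append((target, c))

print(cbow_examples[-1])       # (['on', 'the'], 'mat')
print(skipgram_examples[-2:])  # [('mat', 'on'), ('mat', 'the')]
```

This makes the quoted point concrete: CBOW collapses each context into one example (hence the smoothing), while skip-gram keeps every target-context pair as a separate example.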
-