7 Matching Annotations
  1. Jun 2017
  2. arxiv.org
    1. The analysis shows that, although they are superficially similar, NCE is a general parameter estimation technique that is asymptotically unbiased, while negative sampling is best understood as a family of binary classification models that are useful for learning word representations but not as a general-purpose estimator

      I think NCE is slightly different from CE. Unfortunately, Chris sort of ignores Noah's work on CE in this explanation. That said, the connection between NCE and NS is nicely explained.
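      To keep the distinction straight for myself (notation is mine, not copied from the note): NCE keeps the noise-correction term, negative sampling drops it.

      ```latex
      % Sketch in my own notation: s_\theta(w,c) is the model score, q a noise
      % distribution, k the number of noise samples per observed pair.
      % NCE posterior that a (context, word) pair came from the data:
      \[
        p(D = 1 \mid c, w) \;=\; \frac{p_\theta(w \mid c)}{p_\theta(w \mid c) + k\, q(w)}
      \]
      % Negative sampling replaces the k q(w) term with 1 and classifies directly:
      \[
        p(D = 1 \mid c, w) \;=\; \sigma\bigl(s_\theta(w, c)\bigr)
        \;=\; \frac{1}{1 + \exp(-s_\theta(w, c))}
      \]
      % The two coincide only when k q(w) = 1, which is why NS is useful for
      % learning representations but not a general-purpose estimator of p_\theta(w | c).
      ```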

    1. We present an extension to Jaynes’ maximum entropy principle that handles latent variables. The principle of latent maximum entropy we propose is different from both Jaynes’ maximum entropy principle and maximum likelihood estimation, but often yields better estimates in the presence of hidden variables and limited training data. We first show that solving for a latent maximum entropy model poses a hard nonlinear constrained optimization problem in general. However, we then show that feasible solutions to this problem can be obtained efficiently for the special case of log-linear models—which forms the basis for an efficient approximation to the latent maximum entropy principle. We derive an algorithm that combines expectation-maximization with iterative scaling to produce feasible log-linear solutions. This algorithm can be interpreted as an alternating minimization algorithm in the information divergence, and reveals an intimate connection between the latent maximum entropy and maximum likelihood principles. To select a final model, we generate a series of feasible candidates, calculate the entropy of each, and choose the model that attains the highest entropy. Our experimental results show that estimation based on the latent maximum entropy principle generally gives better results than maximum likelihood when estimating latent variable models on small observed data samples.

      Towards intelligent negative sampling
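      My rough reading of the principle (notation mine): with observed x, latent y, empirical distribution \tilde{p}(x), and features f_i, latent maximum entropy asks for the maximum-entropy joint model whose feature expectations match the data as completed by the model's own posterior over the latents.

      ```latex
      % Rough formalization, notation mine (not copied from the paper):
      \[
        \max_{p}\; H(p)
        \quad \text{s.t.} \quad
        \sum_{x, y} p(x, y)\, f_i(x, y)
        \;=\;
        \sum_{x} \tilde{p}(x) \sum_{y} p(y \mid x)\, f_i(x, y),
        \qquad i = 1, \dots, N .
      \]
      % Unlike ordinary maximum entropy, the right-hand side depends on p itself
      % through p(y | x), which is what makes the problem nonlinear and motivates
      % the EM-plus-iterative-scaling algorithm described in the abstract above.
      ```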

    1. Wang et al. (2002) discuss the latent maximum entropy principle. They advocate running EM many times and selecting the local maximum that maximizes entropy. One might do the same for the local maxima of any CE objective, though theoretical and experimental support for this idea remain for future work.

      An interesting proposal, quite similar to negative sampling with 'exploration / exploitation'.

      Definitely worth at least a couple of reads!
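      A toy sketch of the "run EM many times, keep the highest-entropy local maximum" recipe (my own example with a Gaussian mixture and a Monte Carlo entropy estimate, not code from the paper):

      ```python
      # Toy illustration (mine, not from the paper): fit a mixture with EM from
      # several random initializations and keep the local maximum whose fitted
      # model has the highest (Monte Carlo estimated) entropy.
      import numpy as np
      from sklearn.mixture import GaussianMixture

      rng = np.random.default_rng(0)
      X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])

      best_model, best_entropy = None, -np.inf
      for seed in range(10):                           # multiple EM restarts
          gm = GaussianMixture(n_components=2, random_state=seed).fit(X)
          samples, _ = gm.sample(5000)                 # draw from the fitted model
          entropy = -gm.score_samples(samples).mean()  # H(p) ~ -E_p[log p(x)]
          if entropy > best_entropy:
              best_model, best_entropy = gm, entropy
      ```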

    2. One can envision a mixed objective function that tries to fit the labeled examples while discriminating unlabeled examples from their neighborhoods.

      Interesting - a mixed objective function -> this seems like a multi-task framework!

      --> Re-read and understand
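      One way I could imagine writing such a mixed objective down (my guess at the form, not an equation from the paper): supervised log-likelihood on labeled pairs plus a CE term on unlabeled examples, weighted by some \lambda.

      ```latex
      % My guess at the form (not from the paper): (x_i, y_i) are labeled pairs,
      % x_j unlabeled examples, N(x_j) a neighborhood of implied bad examples.
      \[
        \mathcal{L}(\theta)
        \;=\;
        \sum_{(x_i, y_i) \in \text{labeled}} \log p_\theta(y_i \mid x_i)
        \;+\;
        \lambda \sum_{x_j \in \text{unlabeled}}
          \log \frac{\sum_{y} p_\theta(x_j, y)}
                    {\sum_{x' \in \mathcal{N}(x_j)} \sum_{y} p_\theta(x', y)} .
      \]
      % The second term is the contrastive estimation objective; fitting both at
      % once is what makes this read like a multi-task setup.
      ```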

    3. We have presented contrastive estimation, a new probabilistic estimation criterion that forces a model to explain why the given training data were better than bad data implied by the positive examples.

      This is again an interesting way to see it: "... forces a model to explain why the given training data were better than bad data implied by the positive examples."
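      For reference, the CE objective roughly as the paper formulates it (latent structure y, neighborhood N(x_i) of "bad" variants implied by example x_i):

      ```latex
      % CE: each observed example must beat its own neighborhood, not the whole
      % space of possible examples.
      \[
        \max_\theta \; \sum_i \log p_\theta\bigl(x_i \mid \mathcal{N}(x_i)\bigr)
        \;=\;
        \max_\theta \; \sum_i \log
          \frac{\sum_{y} p_\theta(x_i, y)}
               {\sum_{x' \in \mathcal{N}(x_i)} \sum_{y} p_\theta(x', y)} .
      \]
      ```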

    4. Viewed as a CE method, this approach (though effective when there are few hypotheses) seems misguided; the objective says to move mass to each example at the expense of all other training examples

      A very cool remark, and it makes sense!!

    5. An alternative is to restrict the neighborhood to the set of observed training examples rather than all possible examples (Riezler, 1999; Johnson et al., 1999; Riezler et al., 2000):

      This equation is reminiscent of the one proposed by Nickel et al. (2017), the Poincaré Embeddings paper. In particular, look at their negative sampling formulation.
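      As I read it, this restriction just swaps the neighborhood for the observed training set itself, so the normalizer runs over training examples only (my notation):

      ```latex
      % My notation: the neighborhood of x_i is the training set {x_1, ..., x_n}
      % rather than perturbations of x_i.
      \[
        \sum_i \log
          \frac{\sum_{y} p_\theta(x_i, y)}
               {\sum_{j=1}^{n} \sum_{y} p_\theta(x_j, y)}
      \]
      % Contrasting each example against (a sample of) the other observed examples
      % is where I see the resemblance to the negative sampling used for the
      % Poincaré embeddings objective in Nickel et al. (2017).
      ```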