12 Matching Annotations
  1. Jun 2017
    1. Who is Mistaken? Benjamin Eysenbach (MIT, bce@mit.edu), Carl Vondrick (MIT, vondrick@mit.edu), Antonio Torralba (MIT, torralba@csail.mit.edu). Figure 1: Can you determine who has a false belief about this scene? In this paper, we study how to recognize when a person in a scene is mistaken. Above, the woman is mistaken about the chair being pulled away from her in the third frame, causing her to fall down. The red arrow indicates false belief. We introduce a new dataset of abstract scenes to study when people have false beliefs. We propose approaches to learn to recognize who is mistaken and when they are mistaken. Abstract: Recognizing when people have false beliefs is crucial for understanding their actions. We introduce the novel problem of identifying when people in abstract scenes have incorrect beliefs. We present a dataset of scenes, each visually depicting an 8-frame story in which a character has a mistaken belief. We then create a representation of characters' beliefs for two tasks in human action understanding: predicting who is mistaken, and when they are mistaken. Experiments suggest that our method for identifying mistaken characters performs better on these tasks than simple baselines. Diagnostics on our model suggest it learns important cues for recognizing mistaken beliefs, such as gaze. We believe models of people's beliefs will have many


  2. arxiv.org
    1. The analysis shows that, although they are superficially similar, NCE is a general parameter estimation technique that is asymptotically unbiased, while negative sampling is best understood as a family of binary classification models that are useful for learning word representations but not as a general-purpose estimator

      I think NCE is slightly different from CE. Unfortunately, Chris sort of ignores Noah's work on CE in this explanation. The connection between NCE and negative sampling, though, is nicely explained.
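The distinction in the quote comes down to one term in the loss. A minimal sketch, for a single positive (word, context) pair, of how NCE's logistic term carries a correction involving the noise distribution Pn that negative sampling drops. The names (`s`, `k`, `pn`) are illustrative, not from either paper:

```python
import math

def nce_logistic_term(s, k, pn):
    """NCE treats the data-vs-noise decision as logistic regression
    on the score corrected by log(k * Pn(w)); this correction is
    what makes NCE asymptotically unbiased as an estimator."""
    return math.log(1.0 / (1.0 + math.exp(-(s - math.log(k * pn)))))

def neg_sampling_term(s):
    """Negative sampling drops the log(k * Pn) correction, so it is
    a plain binary classifier, not a general-purpose estimator."""
    return math.log(1.0 / (1.0 + math.exp(-s)))
```

Note that when k * Pn(w) = 1 the two terms coincide, which is why the two look superficially similar.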

    1. We present an extension to Jaynes' maximum entropy principle that handles latent variables. The principle of latent maximum entropy we propose is different from both Jaynes' maximum entropy principle and maximum likelihood estimation, but often yields better estimates in the presence of hidden variables and limited training data. We first show that solving for a latent maximum entropy model poses a hard nonlinear constrained optimization problem in general. However, we then show that feasible solutions to this problem can be obtained efficiently for the special case of log-linear models, which forms the basis for an efficient approximation to the latent maximum entropy principle. We derive an algorithm that combines expectation-maximization with iterative scaling to produce feasible log-linear solutions. This algorithm can be interpreted as an alternating minimization algorithm in the information divergence, and reveals an intimate connection between the latent maximum entropy and maximum likelihood principles. To select a final model, we generate a series of feasible candidates, calculate the entropy of each, and choose the model that attains the highest entropy. Our experimental results show that estimation based on the latent maximum entropy principle generally gives better results than maximum likelihood when estimating latent variable models on small observed data samples.

      Towards intelligent negative sampling

    1. Wang et al. (2002) discuss the latent maximum entropy principle. They advocate running EM many times and selecting the local maximum that maximizes entropy. One might do the same for the local maxima of any CE objective, though theoretical and experimental support for this idea remain for future work.

      Interesting proposal, quite similar to negative sampling with 'exploration / exploitation'.

      Definitely worth at least a couple of reads!
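The selection rule Wang et al. advocate is simple to state in code. A minimal sketch, assuming each EM restart has already produced a candidate distribution (the restarts themselves are not shown):

```python
import math

def entropy(p):
    """Shannon entropy of a discrete distribution, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def pick_latent_maxent(candidates):
    """Among feasible local maxima (e.g. from several EM runs with
    random restarts), keep the one with the highest entropy, as
    Wang et al. (2002) advocate."""
    return max(candidates, key=entropy)
```

The same tie-breaking rule could in principle be applied to the local maxima of a CE objective, which is exactly the open question the quoted passage raises.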

    2. One can envision a mixed objective function that tries to fit the labeled examples while discriminating unlabeled examples from their neighborhoods.

      Interesting - a mixed objective function -> this seems like a multi-task framework!

      --> Re-read and understand
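One way to read the "mixed objective" idea as a multi-task loss, sketched below. The weight `lam` and the exact form of the CE term are my own illustrative assumptions, not from the paper:

```python
import math

def logsumexp(xs):
    """Numerically stable log-sum-exp."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ce_term(score_x, neighborhood_scores):
    """Contrastive-estimation term: log-mass of the observed example
    relative to its neighborhood (which should include x itself)."""
    return score_x - logsumexp(neighborhood_scores)

def mixed_objective(labeled_loglik, score_x, neighborhood_scores, lam=0.5):
    """Hypothetical mixed criterion: fit labeled data (supervised task)
    while discriminating an unlabeled example from its neighborhood
    (contrastive task). `lam` trades off the two tasks."""
    return lam * labeled_loglik + (1 - lam) * ce_term(score_x, neighborhood_scores)
```

With `lam = 1` this reduces to plain supervised likelihood; with `lam = 0` it is pure CE on unlabeled data, which is what gives it the multi-task flavor the note points out.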

    3. We have presented contrastive estimation, a new probabilistic estimation criterion that forces a model to explain why the given training data were better than bad data implied by the positive examples.

      This is again an interesting way to see it: "... forces a model to explain why the given training data were better than bad data implied by the positive examples."

    4. Viewed as a CE method, this approach (though effective when there are few hypotheses) seems misguided; the objective says to move mass to each example at the expense of all other training examples

      A very cool remark and makes sense!!

    5. An alternative is to restrict the neighborhood to the set of observed training examples rather than all possible examples (Riezler, 1999; Johnson et al., 1999; Riezler et al., 2000):

      This equation is reminiscent of the one proposed by Nickel et al., 2017 in the Poincaré Embeddings paper. In particular, look at how they do negative sampling.
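The restricted-neighborhood variant is easy to write down. A minimal sketch, assuming unnormalized log-scores for the observed training examples (the softmax-over-observed-items shape is also what the Poincaré-embeddings loss with sampled observed negatives looks like):

```python
import math

def logsumexp(xs):
    """Numerically stable log-sum-exp."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def restricted_ce_term(i, scores):
    """CE where the neighborhood of example i is the set of observed
    training examples themselves (Riezler, 1999; Johnson et al., 1999):
    log p(x_i) - log sum_j p(x_j), with j ranging over the training set.
    Raising x_i's score relative to the others increases this term,
    which is exactly the 'at the expense of all other training
    examples' behavior criticized in annotation 4 above."""
    return scores[i] - logsumexp(scores)
```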