31 Matching Annotations
  1. Dec 2016
    1. The product, like the CCM itself, is mass-deficient

      I.e., it assigns some probability mass to impossible outcomes.

    2. Initialization is important to the success of any local search procedure. We chose to initialize EM not with an initial model, but with an initial guess at posterior distributions over dependency structures (completions). For the first round, we constructed a somewhat ad-hoc “harmonic” completion

      This is very important for making the whole thing work, and there has actually been some follow-up research that attempts to get rid of the dependence on this specific initialization.
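
      For reference, a minimal sketch of what such a distance-based “harmonic” completion might look like (the exact smoothing constants are an assumption here, not the paper's):

      ```python
      # Rough sketch of a distance-based ("harmonic") completion for one sentence:
      # closer heads get more initial weight, and the resulting soft counts stand in
      # for the E-step of the first EM iteration.

      def harmonic_completion(n_words, smoothing=1.0):
          """Return attach[child][head]: an initial guess at the posterior over heads."""
          attach = []
          for child in range(n_words):
              weights = [0.0 if head == child else 1.0 / (abs(head - child) + smoothing)
                         for head in range(n_words)]
              total = sum(weights)
              attach.append([w / total for w in weights])
          return attach

      print(harmonic_completion(4)[0])  # child 0 prefers nearby heads
      ```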

    3. The basic inside-outside algorithm (Baker, 1979) can be used for re-estimation.

      By using the mapping from dependency trees to constituency trees.

    4. but lower than simply linking all adjacent words into a left-headed (and right-branching) structure (53.2%)

      This is a strong baseline for English; it would be interesting to see how it compares on other languages.
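
      As a concrete reference point, a minimal sketch of this baseline and of directed attachment accuracy (the gold heads below are made up for illustration):

      ```python
      # Minimal sketch of the adjacent-word baseline: every word takes the word
      # immediately to its left as its head (a left-headed chain); the first word
      # is the root (head index -1).

      def adjacent_baseline_heads(n_words):
          return [-1] + list(range(n_words - 1))

      def directed_accuracy(predicted, gold):
          return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

      gold_heads = [1, -1, 1, 2]   # hypothetical gold tree for a 4-word sentence
      print(directed_accuracy(adjacent_baseline_heads(4), gold_heads))
      ```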

    5. Most recent progress in unsupervised parsing has come from tree or phrase-structure grammar based models

      True at the time, but not necessarily true anymore.

    6. hence undermines arguments based on “the poverty of the stimulus”

      Although the learning algorithm already encodes certain biases about what linguistic structure should look like, i.e. in the form of the grammar they consider. Also, Chomsky had some specific complex examples in mind and did not claim that it was impossible to learn anything.

  2. Nov 2016
    1. This result suggests that some rules are harder to learn than others regardless of their frequency, so their presence in the specified ruleset yields stronger performance gains.

      It could also be the case that the trees inferred without these restrictions just differ from the gold trees in a more complex way.

    2. Variational approximations to the HDP are truncated at 1

      So one could also just assume a parametric version with 10 refinement labels, which would make for a simpler presentation.

    3. Collins head finding rules

      Probably necessary for the Collins rules to work properly in inference.

    4. all mass is concentrated on some single β∗

      At this point you could just do approximate MAP inference and the presentation would become less complicated.
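
      In generic notation (not the paper's), with x the observed sentences, t the latent trees, and β the parameters, restricting q(β) to a point mass gives exactly such a MAP-EM style objective:

      ```latex
      % Point-mass variational posterior q(\beta) = \delta_{\beta^*}; dropping the
      % degenerate entropy of q(\beta), the variational bound collapses to
      \mathbb{E}_{q(t)}\big[\log p(x, t \mid \beta^*)\big] + \log p(\beta^*) + H\big[q(t)\big],
      % which is optimized by alternating E-steps over q(t) and MAP M-steps over \beta^*.
      ```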

    5. explicitly generate words x rather than only part-of-speech tags s

      But since both the part-of-speech tags and the words are observed, and the dependency tree structure in their reduced model is independent of the words given the tags, this makes no difference compared to the original models.
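
      In symbols (generic notation, not the paper's), with t the tree, s the tags, and x the words: if the joint factorizes as below, the word-emission factor cancels from the posterior over trees:

      ```latex
      p(t, s, x) = p(t, s)\, p(x \mid s)
      \;\;\Longrightarrow\;\;
      p(t \mid s, x) = \frac{p(t, s)\, p(x \mid s)}{\sum_{t'} p(t', s)\, p(x \mid s)} = p(t \mid s).
      ```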

    6. Root→Verb

      In my experience, the rules that ensure the root has a verb or an auxiliary as a child are the most important ones. Without such a constraint, most learning algorithms have a real tendency to make the first noun in the sentence the head.

    7. Furthermore, these universal rules are compact and well-understood, making them easy to manually construct.

      But still dependent on knowing gold part-of-speech tags.

    1. our models may be improved by the application of (unsupervised) discriminative learning techniques with features (Berg-Kirkpatrick et al., 2010); or by incorporating topic models and document information (Griffiths et al., 2005; Moon et al., 2010)

      It might also be interesting to explore the use of other information sources, such as the HTML annotations used by Spitkovsky and colleagues.

    2. After chunking is performed, all multiword constituents are collapsed and represented by a single pseudoword

      A slightly "fancier" approach would be to cluster the low level constituents.

    3. The fact that the HMM and PRLG have higher recall on NP identification on Negra than precision is further evidence towards this.

      One takeaway from this is that the model seems to identify relatively short sequences as chunks.

    4. possible tag transitions as a state diagram

      This diagram is key to their approach. B will, with high likelihood, generate words that are likely to be followed by non-stop words; I does the same for words that are likely to follow non-stop words.
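
      As a rough illustration (the exact state inventory and edges are in the paper's figure; this mask only approximates the usual B/I/O-with-STOP constraints), such a diagram can be enforced during decoding by zeroing out disallowed transitions:

      ```python
      # Approximate transition mask for a B/I/O chunking HMM with a STOP symbol for
      # punctuation/boundaries.  Disallowed transitions get zero probability during
      # decoding; e.g. I may only continue a chunk, and STOP never occurs inside one.

      ALLOWED = {
          "STOP": {"B", "O", "STOP"},
          "B":    {"I"},                     # assumes chunks span at least two words
          "I":    {"I", "B", "O", "STOP"},
          "O":    {"B", "O", "STOP"},
      }

      def transition_ok(prev_state, next_state):
          return next_state in ALLOWED[prev_state]

      print(transition_ok("O", "I"))   # False: I cannot start a chunk
      ```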

    5. they impose a hard constraint on constituent spans, in that no constituent (other than sentence root)

      This heuristic does not hold universally:

      Ich denke, [ dass Marie denkt, [ dass John nicht nachgedacht hat ] ]
      ('I think [ that Marie thinks [ that John has not thought it over ] ]')

      but it is useful nonetheless, much like assuming one tag per word type is helpful in unsupervised part-of-speech tagging.

    6. As in previous work, punctuation is not used for evaluation

      I.e. punctuation is used as an information source during training, but then discarded before evaluation.

    7. effectiveness at identifying low-level constituents

      This seems to be true to some extent for all existing techniques for unsupervised parsing. There does not seem to be a really good technique for discovering more "high-level" constituents.

    8. low-level constituents based solely on word co-occurrence frequencies

      This connects directly to Hänig et al.'s paper.

    1. Apparently the variable c already suffices to produce very competitive results, assuming POS tags were used

      Suggesting that significant co-occurrence is already a good indicator of constituency.

    2. our algorithm significantly outperforms the random branching baseline

      Note that the right-branching baseline is much stronger.
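
      For reference, a minimal sketch of what the right-branching baseline produces:

      ```python
      # Minimal sketch of the right-branching baseline: every word starts a
      # constituent that extends to the end of the sentence, i.e. (w1 (w2 (w3 w4))).

      def right_branching_spans(n_words):
          """Constituent spans (start, end), end exclusive, excluding single words."""
          return [(i, n_words) for i in range(n_words - 1)]

      print(right_branching_spans(4))   # [(0, 4), (1, 4), (2, 4)]
      ```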

    1. The second baseline is a more recent vector-based syntactic class induction method, the SVD approach of (Lamar et al., 2010), which extends Schütze (1995)'s original method and, like ours, enforces a one-class-per-tag restriction

      Note that all the comparison systems are type-based.

    2. Note that this model with multiple context features is deficient: it can generate data that are inconsistent with any actual corpus, because there is no mechanism to constrain the left context word of token e_i to be the same as the right context word of token e_{i−1} (and similarly with alignment features)

      Here the independent generation of words becomes a problem again.

    3. type level

      Like capitalization.

    4. choose a class assignment z_j from the distribution θ. For each class i = 1...Z, choose an output distribution over features φ_i. Finally, for each token k = 1...n_j of word type j, generate a feature f_jk from φ_{z_j}, the distribution associated with the class that word type j is assigned to.

      Note that features and words in the generative story are generated independently of each other, meaning there is no requirement in the generative model for feature counts to be "consistent".
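
      A toy sketch of this generative story (theta and phi follow the quote's notation; the concrete classes, features, and numbers below are made-up illustrations):

      ```python
      # Toy sketch of the type-level generative story: each word TYPE draws one class
      # z_j from theta, and every feature observation for that type's tokens is then
      # drawn independently from phi[z_j] -- nothing ties the features together.

      import random

      theta = [0.6, 0.4]                               # distribution over Z = 2 classes
      phi = [{"suffix=-ed": 0.7, "capitalized": 0.3},  # class 0 feature distribution
             {"suffix=-ed": 0.1, "capitalized": 0.9}]  # class 1 feature distribution
      token_counts = {"walked": 3, "Paris": 2}         # n_j tokens per word type j

      def sample(dist):
          """Sample from a list (index-valued) or dict (key-valued) distribution."""
          items = list(enumerate(dist)) if isinstance(dist, list) else list(dist.items())
          r, acc = random.random(), 0.0
          for outcome, p in items:
              acc += p
              if r < acc:
                  return outcome
          return items[-1][0]

      for word_type, n_j in token_counts.items():
          z_j = sample(theta)                                # one class per word type
          features = [sample(phi[z_j]) for _ in range(n_j)]  # i.i.d. per token
          print(word_type, z_j, features)
      ```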

    1. induction system allow it to produce useful prototypes with the current method and/or to develop a specialized system specifically targeted towards inducing useful prototypes

      Maybe Clark's method does not benefit as much because it already uses morphological information?

    2. We note that the two best-performing systems, clark and feat, are also the only two to use morphological information

      This reduces the information that has to come from distributional data, which should be especially important for languages with more complex morphology and less information expressed via word order. But we later see that there is not that much of a difference.

    3. To address this question, we evaluate all the systems as well on the multilingual Multext East corpus

      Still limited to European languages.

    4. This is probably due to the fact that the gold standard encodes “pure” syntactic classes, while substitutability also depends on semantic characteristics (which tend to be picked up by unsupervised clustering systems as well

      This is actually a key "problem" in unsupervised part-of-speech induction: the induced POS classes pick up on semantic distinctions, which lead, among other things, to distinctions between nouns that are more fine-grained than one would expect according to standard syntactic theory.