73 Matching Annotations
  1. Apr 2018
    1. Recent successful semi-supervised systems (Ando and Zhang, 2005; Suzuki and Isozaki, 2008)

      cite

  2. Mar 2018
  3. Jan 2018
    1. For each mention m, we generate a set of candidate entities Cm = {cj} ⊆ E using Cross-Wikis (Spitkovsky and Chang, 2012), a dictionary computed from a Google crawl of the web that stores the frequency with which a mention links to a particular entity. To generate Cm we choose the top 30 entities for each mention string, and normalize this frequency across the chosen candidates to compute P_prior(e|m).

      Use cross-wikis, a precomputed dictionary for links of mentions to wiki pages and their frequencies.
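
      A rough sketch of how I read this candidate-generation step (the dictionary contents and function name are hypothetical illustrations, not the paper's code or actual Cross-Wikis counts):

      ```python
      # Hypothetical Cross-Wikis-style dictionary: mention string -> {entity: link frequency}
      crosswikis = {
          "jaguar": {"Jaguar_Cars": 900, "Jaguar": 350, "Jacksonville_Jaguars": 250},
      }

      def candidates_with_prior(mention: str, k: int = 30):
          """Keep the top-k entities for a mention string and renormalize their
          frequencies into a prior P_prior(e | m) over the chosen candidates."""
          counts = crosswikis.get(mention.lower(), {})
          top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:k]
          total = sum(c for _, c in top)
          return {entity: c / total for entity, c in top} if total else {}

      print(candidates_with_prior("Jaguar"))
      # {'Jaguar_Cars': 0.6, 'Jaguar': 0.233..., 'Jacksonville_Jaguars': 0.166...}
      ```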

    2. Given a document, and mentions marked in it for disambiguation, we perform a two-step procedure to link them to an entity. First, we find a set of candidate entities, and their prior scores using a pre-computed dictionary.

      Get the subset of potentially linkable entities using a pre-computed dictionary -- no explanation of how they do this. Each entry in this dictionary has a prior probability

    3. Some models ignore the entity’s description on Wikipedia, but rather, only rely on the context from links to learn entity representations (Lazic et al., 2015), or use a pipeline of existing annotators to filter entity candidates (Ling et al., 2015)

      cite

    4. By only making use of indirect supervision that is available in Wikipedia/Freebase, we refrain from using domain specific training data, and produce a domain-independent linking system.

      Is this really true, or are all of the domains just basically news/wiki text...

    1. AQUAINT corpus of newswire text that is annotated to mimic the hyperlink structure in Wikipedia. That is, only the first mentions of “important” titles were hyperlinked. Titles deemed uninteresting and redundant mentions of the same title are not linked.

      The Wikipedia-style link structure only annotated first mentions of titles deemed "interesting/relevant"; redundant mentions of the same title are left unlinked.

    2. We present an error analysis and identify the key remaining challenge: determining when mentions refer to concepts not captured in Wikipedia.

      main challenge

    1. Chris Potts's semantics and pragmatics course. Great material on advanced formalisms of meaning and intention in language

    1. Barwise, Jon & Robin Cooper. 1981. Generalized quantifiers and natural language. Linguistics and Philosophy 4(4). 159–219.
       van Benthem, Johan & Alice ter Meulen (eds.). 1997. Handbook of logic and language. Cambridge, MA and Amsterdam: MIT Press and North-Holland.
       Carlson, Gregory. 1977. Reference to kinds in English. Amherst, MA: UMass Amherst dissertation.
       Carpenter, Bob. 1997. Type-logical semantics. Cambridge, MA: MIT Press.
       Dowty, David. 2007. Compositionality as an empirical problem. In Chris Barker & Pauline Jacobson (eds.), Direct compositionality, 23–101. Oxford: Oxford University Press.
       Fodor, Jerry A. 1975. The language of thought. New York: Thomas A. Crowell Co.
       Frege, Gottlob. 1892/1980. On sense and reference. In Peter Geach & Max Black (eds.), Translations from the philosophical writings of Gottlob Frege, 56–78. Oxford: Blackwell, 3rd edn.
       Jackendoff, Ray S. 1992. Languages of the mind. Cambridge, MA: MIT Press.
       Jackendoff, Ray S. 1996. Semantics and cognition. In Lappin (1996) 539–559.
       Jackendoff, Ray S. 1997. The architecture of the language faculty. Cambridge, Massachusetts: The MIT Press.
       Janssen, Theo M. V. 1997. Compositionality. In van Benthem & ter Meulen (1997) 417–473.
       Katz, Fred M. & Jeffrey J. Katz. 1977. Is necessity the mother of intension? The Philosophical Review 86(1). 70–96.
       Katz, Jerrold J. 1972. Semantic theory. New York: Harper & Row.
       Katz, Jerrold J. 1996. Semantics in linguistics and philosophy: An intensionalist perspective. In Lappin (1996) 599–616.
       Katz, Jerrold J. & Paul M. Postal. 1964. An integrated theory of linguistic descriptions. Cambridge, MA: MIT Press.
       Lappin, Shalom (ed.). 1996. The handbook of contemporary semantic theory. Oxford: Blackwell Publishers.
       Lepore, Ernest. 1983. What model theoretic semantics cannot do. Synthese 54(2). 167–187.
       Levin, Beth. 1993. English verb classes and alternations: A preliminary investigation. Chicago: Chicago University Press.
       Levin, Beth & Malka Rappaport Hovav. 1995. Unaccusativity: At the syntax–lexical semantics interface. Cambridge, MA: MIT Press.
       Levin, Beth & Malka Rappaport Hovav. 2005. Argument realization. Cambridge: Cambridge University Press.
       Lewis, David. 1969. Convention. Cambridge, MA: Harvard University Press. Reprinted 2002 by Blackwell.
       Lewis, David. 1970. General semantics. Synthese 22(1). 18–67.
       Lewis, David. 1975. Languages and language. In Keith Gunderson (ed.), Minnesota studies in the philosophy of science, vol. VII, 3–35. Minneapolis: University of Minnesota Press. Reprinted in Lewis 1983, 163–188. Page references are to the reprinting.
       Lewis, David. 1983. Philosophical papers, vol. 1. New York: Oxford University Press.
       Montague, Richard. 1970a. English as a formal language. In Bruno Visentini et al. (ed.), Linguaggi nella società e nella tecnica, 189–224. Milan: Edizioni di Communità. Reprinted in Montague (1974), 188–221.
       Montague, Richard. 1970b. Universal grammar. Theoria 36. 373–398. Reprinted in Montague (1974), 222–246.
       Montague, Richard. 1973. The proper treatment of quantification in ordinary English. In Jaakko Hintikka, Julius Matthew Emil Moravcsik & Patrick Suppes (eds.), Approaches to natural language, 221–242. Dordrecht: D. Reidel. Reprinted in Montague (1974), 247–270.
       Montague, Richard. 1974. Formal philosophy: Selected papers of Richard Montague. New Haven, CT: Yale University Press.
       Partee, Barbara H. 1980. Semantics – mathematics or psychology? In Egli Bäuerle & Arnim von Stechow (eds.), Semantics from different points of view, 1–14. Berlin: Springer-Verlag.
       Partee, Barbara H. 1981. Montague grammar, mental representations, and reality. In Stig Kanger & Sven Öhman (eds.), Philosophy and grammar, 59–78. Dordrecht: D. Reidel.
       Partee, Barbara H. 1984. Compositionality. In Fred Landman & Frank Veltman (eds.), Varieties of formal semantics, 281–311. Dordrecht: Foris. Reprinted in Barbara H. Partee (2004), Compositionality in formal semantics, Oxford: Blackwell, 153–181. Page references to the reprinting.
       Partee, Barbara H. 1996. The development of formal semantics in linguistic theory. In Lappin (1996) 11–38.
       Partee, Barbara H. 1997. Montague semantics. In van Benthem & ter Meulen (1997) 5–91.
       Szabó, Zoltán Gendler. 2012. Compositionality. In Edward N. Zalta (ed.), The Stanford encyclopedia of philosophy, CSLI winter 2012 edn. http://plato.stanford.edu/archives/win2012/entries/compositionality/.
       Thomason, Richmond H. 1974. Introduction. In Montague (1974) 1–69.
       Zimmermann, Thomas Ede. 1999. Meaning postulates and the model-theoretic approach to natural language semantics. Linguistics and Philosophy 22(5). 529–561.

      Reading list from Chris Potts on compositional semantics

    1. Vagueness is the rule in natural language, not the exception, and it is arguably crucial for the flexible, expressive nature of such languages, allowing fixed expressions to make different distinctions in different contexts and helping people communicate under uncertainty about the world (Kamp & Partee 1995, Graff 2000)

      cite. reading list

    2. In that case, the denotations might be events in the physical world, symbolic representations in a lower-level language, or even symbolic representations in the same language used for logical forms, thereby blurring the distinction between representation and denotation (Katz 1972, 1996; Kripke 1975; Chierchia & Turner 1988, p. 272)

      cite reading list

    3. questions about the nature of mental representation (Partee 1980, 1981; Jackendoff 1992, 1997)

      cite. reading list

    4. for discussions of how to characterize and learn scopal preferences from data, see Higgins & Sadock 2003, AnderBois et al. 2012, and Liang et al. 2013

      cite

    1. sets, properties, and relations may be regarded as particular kinds of functions

      This means that classes, which are equivalently defined by sets, are edges (not nodes) in an ontology
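
      One standard way to make the quoted identification concrete (textbook material, not specific to this source): a set is interchangeable with its characteristic function.

      ```latex
      A \subseteq D
      \quad\longleftrightarrow\quad
      \chi_A : D \to \{0, 1\},
      \qquad
      \chi_A(x) = 1 \iff x \in A
      ```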

    1. let P(x) be the property x ∉ x. By the axiom of comprehension, there is a set R such that for any x, x ∈ R iff x ∉ x. But it follows immediately that R ∈ R iff R ∉ R, which is a contradiction

      Russell's paradox (the barber paradox is its informal illustration)

    2. reading for philosophy of mathematics

    1. Three major biomedical informatics terminology bases, SNOMED CT, GALEN, and GO

      Bio KBs

    1. more generally, a partially ordered set

      i.e., representable as a DAG (its Hasse diagram)

    2. According to an extensional definition, they are abstract groups, sets, or collections of objects. According to an intensional definition, they are abstract objects that are defined by values of aspects that are constraints for being member of the class

      I prefer the intensional definition. To be a member of a class, instances must satisfy certain constraints. Thus classes are denoted by shared properties of instances. I.e., intensional classes define necessary and sufficient conditions for membership
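
      A toy sketch of the distinction, just to pin down the terminology (the class, attributes, and particular constraints are mine, purely illustrative):

      ```python
      from dataclasses import dataclass

      @dataclass
      class Animal:
          name: str
          has_hair: bool
          feeds_young_with_milk: bool

      # Extensional definition: the class just *is* the collection of its members.
      extensional_mammals = {"dog", "whale", "human"}

      # Intensional definition: the class is a constraint its members must satisfy.
      def is_mammal(a: Animal) -> bool:
          return a.has_hair and a.feeds_young_with_milk   # necessary and sufficient conditions

      whale = Animal("whale", has_hair=True, feeds_young_with_milk=True)
      print(whale.name in extensional_mammals, is_mammal(whale))   # True True
      ```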

  4. Dec 2017
  5. Nov 2017
    1. Secondly, to ensure that we can sample from any point of the latent space and still generate valid and diverse outputs, the posterior q(z|x) is regularized with its KL divergence from a prior distribution p(z)

      The motivation is actually the opposite: the KL term falls out of the variational inference objective, and it turns out to act as the regularizer, which is what leads to this property.
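
      For reference, the standard ELBO decomposition the note is pointing at; the KL term appears as soon as you lower-bound the marginal likelihood, which is why it "turns out to be" the regularizer:

      ```latex
      \log p_\theta(x)
      \;\ge\;
      \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
      \;-\;
      \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)
      ```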

    2. This choice of architecture helps to gain more control over the KL term, which is crucial for training a VAE model.

      why?

    1. The peakiness of the Concrete distribution increases with n, so much higher temperatures are tolerated (usually necessary)

      But often much higher temps are necessary

    2. at temperatures lower than (n − 1)^{-1} we are guaranteed not to have any modes in the interior for any α ∈ (0, ∞)^n

      If the Concrete distribution is used for a distribution over n possible actions, a temperature below 1/(n−1) guarantees that no interior mode is learned. This is important

    3. Therefore Proposition 1(d) is a conservative guideline for generic n-ary Concrete relaxations; at temperatures lower than (n − 1)^{-1} we are guaranteed not to have any modes in the interior for any α ∈ (0, ∞)^n. We discuss the subtleties of choosing the temperatures in more detail in Appendix C. Ultimately the best choice of λ and the performance of the relaxation for any specific n will be an empirical question

      There is a subtle question about how to choose the relaxation temperature so that the relaxed graph does not learn to prefer non-discrete observations
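
      A minimal NumPy sketch of drawing one sample from an n-ary Concrete (Gumbel-softmax) relaxation with the temperature set just below the (n − 1)^{-1} bound discussed above; the function and variable names are mine, not the paper's:

      ```python
      import numpy as np

      def sample_concrete(logits, temperature, rng):
          """One relaxed one-hot sample from Concrete(alpha, lambda).
          logits: unnormalized log alpha_k, shape (n,); temperature: lambda > 0."""
          gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
          scores = (logits + gumbel) / temperature
          scores -= scores.max()                          # numerical stability
          probs = np.exp(scores)
          return probs / probs.sum()                      # a point on the simplex

      n = 10
      rng = np.random.default_rng(0)
      logits = np.zeros(n)                                # uniform location parameters
      temperature = 0.9 / (n - 1)                         # below the (n - 1)^{-1} guideline
      print(sample_concrete(logits, temperature, rng).round(3))
      ```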

    4. When it is computationally feasible to integrate overthe discreteness, that will always be a better choice.

      Always marginalize over discrete sample when possible

    5. The multi-sample variational objective (Burda et al., 2016)

      cite. importance weighted vae
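
      The multi-sample (importance-weighted) bound from Burda et al. (2016), in standard notation, for reference:

      ```latex
      \mathcal{L}_K(x)
      =
      \mathbb{E}_{z_1, \dots, z_K \sim q_\phi(z \mid x)}
      \left[
      \log \frac{1}{K} \sum_{k=1}^{K}
      \frac{p_\theta(x, z_k)}{q_\phi(z_k \mid x)}
      \right]
      \;\le\; \log p_\theta(x)
      ```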

    6. Deterministic discreteness can be relaxed and approximated reasonably well with sigmoidal functions or the softmax (see e.g., Grefenstette et al., 2015; Graves et al., 2016)

      non-stochastic discrete relaxations w/ softmax

  6. arxiv.org
    1. may change the class of functions representable by the model

      why? this is not obvious at all...

    1. Note that maximizing the regularized conditional likelihood is not equivalent to maximum a posteriori. Rather, it is similar to maximization of the pseudo-likelihood in conditionally specified models

      why is this?

  7. Oct 2017
    1. Bradbury et al. (2016) who introduce recurrent pooling between a succession of convolutional layers or Kalchbrenner et al. (2016) who tackle neural translation without attention. However, none of these approaches has been demonstrated improvements over state of the art results on large benchmark datasets

      salesforce qrnn doesn't actually achieve state of the art

    1. Zhang and Wallace [2015] analyzed the stability of convolutional neural networks for sentence classification with respect to a large set of hyperparameters, and found a set of six which they claimed had the largest impact: the number of kernels, the difference in size between the kernels, the size of each kernel, dropout, regularization strength, and the number of filters

      cite. paper explores many cnn settings for sentence classification

    1. Hi Mike,

      This viewer should allow you to view my annotations of this page. I have annotated the subsets of dbpedia (links to annotations at the bottom) which most closely correspond to semantic relations. (These are the ones I'm using for experiments.)

      You can hover over the dataset names to get a description of the dataset, and click the '?' next to 'ttl' under the 'en' column to get a link to a preview of the raw data I'm parsing.

      Hope this helps! Cheers, Tom

      (Note that instead of clicking the links below, you can click the "Show All" button above and then click the "Annotations" tab to see the notes side by side. Clicking them will scroll to them on the page.)

    2. Person data

      personal info about wikipedia entities

    3. Genders

      genders of wikipedia entities

    4. Page Ids

      how I get the links to the wikipedia articles for dbpedia entities

    5. Mappingbased Objects

      wikipedia entities and their relations to other entities

    6. Mappingbased Literals

      wikipedia entities and their literal properties

  8. Sep 2017
    1. We employ a residual connection [11] around each of the two sub-layers, followed by layer normalization [1]. That is, the output of each sub-layer is LayerNorm(x + Sublayer(x))

      cite. [11] is the residual connection work and [1] is layer normalization
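
      A minimal PyTorch sketch of the quoted sub-layer wrapper, i.e. output = LayerNorm(x + Sublayer(x)); the module name and dimensions are my own choices, and the dropout the paper also applies to the sub-layer output is omitted:

      ```python
      import torch
      import torch.nn as nn

      class ResidualLayerNorm(nn.Module):
          """Wraps a sub-layer so its output is LayerNorm(x + Sublayer(x))."""

          def __init__(self, d_model: int, sublayer: nn.Module):
              super().__init__()
              self.sublayer = sublayer
              self.norm = nn.LayerNorm(d_model)

          def forward(self, x: torch.Tensor) -> torch.Tensor:
              return self.norm(x + self.sublayer(x))

      d_model = 512
      ffn = nn.Sequential(nn.Linear(d_model, 2048), nn.ReLU(), nn.Linear(2048, d_model))
      block = ResidualLayerNorm(d_model, ffn)
      out = block(torch.randn(2, 10, d_model))   # (batch, seq, d_model) in and out
      ```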

    2. Self-attention, sometimes called intra-attention is an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence. Self-attention has been used successfully in a variety of tasks including reading comprehension, abstractive summarization, textual entailment and learning task-independent sentence representations [4, 26, 27, 21]

      cite. self attention is a very important mechanism in nlu tasks
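
      A bare-bones sketch of single-head, unmasked scaled dot-product self-attention, the specific form of intra-attention this paper uses; the projection matrices and shapes are my own illustration:

      ```python
      import torch
      import torch.nn.functional as F

      def self_attention(x, w_q, w_k, w_v):
          """x: (batch, seq, d_model); w_q, w_k, w_v: (d_model, d_k) projections."""
          q, k, v = x @ w_q, x @ w_k, x @ w_v
          scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5   # (batch, seq, seq)
          weights = F.softmax(scores, dim=-1)                    # each position attends to all positions
          return weights @ v                                     # (batch, seq, d_k)

      d_model, d_k = 16, 8
      x = torch.randn(2, 5, d_model)
      out = self_attention(x, *(torch.randn(d_model, d_k) for _ in range(3)))
      ```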

    3. In these models, the number of operations required to relate signals from two arbitrary input or output positions grows in the distance between positions, linearly for ConvS2S and logarithmically for ByteNet.

      only if you used full (nondilated or dynamic) convolutions

    4. Structured attention networks

      cite reading list

    5. A decomposable attention model. In Empirical Methods in Natural Language Processing, 2016

      cite reading list

    6. [36, 23, 15]

      cite. seq2seq sota

    7. Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks in particular, have been firmly established as state of the art approaches in sequence modeling and transduction problems such as language modeling and machine translation [33, 2, 5]

      cite. papers that reach sota in sequence transduction problems

    1. When x is observed, we additionally minimise the prediction loss on x, expressed as the negative log-likelihood NLL(x̃; x) of the true labels x relative to the predictions x̃, and do not impose the KL loss. The supervised loss is thus

      Why do they not impose the KL loss?

    2. At training time, rather than ŷ_{t−1} we feed the ground truth y_{t−1}

      wonder if teacher forcing causes unrealistic failure modes
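
      For context on the note above, a minimal sketch of the teacher-forced training step being described (a generic GRU decoder; all names are mine): at each step the ground truth y_{t−1} is fed in rather than the model's own prediction ŷ_{t−1}.

      ```python
      import torch
      import torch.nn as nn

      def teacher_forced_loss(cell, embed, proj, targets, h):
          """targets: (seq_len, batch) gold token ids; h: initial hidden state."""
          loss_fn, loss = nn.CrossEntropyLoss(), 0.0
          for t in range(1, targets.size(0)):
              h = cell(embed(targets[t - 1]), h)     # condition on the *gold* previous token
              loss = loss + loss_fn(proj(h), targets[t])
          return loss / (targets.size(0) - 1)

      vocab, d = 100, 32
      cell, embed, proj = nn.GRUCell(d, d), nn.Embedding(vocab, d), nn.Linear(d, vocab)
      targets = torch.randint(vocab, (7, 4))         # (seq_len=7, batch=4)
      loss = teacher_forced_loss(cell, embed, proj, targets, torch.zeros(4, d))
      ```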

    1. This is often a highly restrictive and unrealistic assumption to impose on the structure of the latent variables

      truf

    1. Until now, Hypothes.is has used the anchoring strategy inherited from the Annotator project, which anchors annotations to their targets by saving exact locations in the form of XPath range descriptions to the involved DOM elements and the string offsets inside them. When the anchor needs to be located again, the DOM elements are found by using the same XPath expressions. This method works if the content is stable, but is vulnerable to changes to the structure of the page that render stored XPaths invalid. Also, this approach doesn’t facilitate cross-format annotation

      A simple way to do anchoring -- assumes content stability
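
      A rough illustration of the strategy described in the quote, i.e. storing XPaths plus string offsets and replaying them to re-locate the target; the field names and helper are hypothetical, not Hypothesis's actual serialization:

      ```python
      from lxml import etree

      # Hypothetical stored anchor (simplified: selection assumed to lie in one element).
      anchor = {
          "xpath": "/html/body/div[1]/p[3]",   # element that contained the selection
          "start": 12,                          # character offsets within that element's text
          "end": 47,
      }

      def relocate(html: str, anchor: dict) -> str:
          """Re-find the annotated text by replaying the stored XPath and offsets.
          Breaks if the page structure changed and the XPath no longer resolves."""
          tree = etree.HTML(html)
          (element,) = tree.xpath(anchor["xpath"])
          text = "".join(element.itertext())
          return text[anchor["start"]:anchor["end"]]
      ```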

  9. Aug 2017
    1. However, this approach is not taken in practice.

      Why not?

  10. Apr 2017
    1. 5.1 Annotation Collection As Annotation Collections might get very large, the model distinguishes between the Collection itself and sequence of component pages that in turn list the Annotations. The Collection maintains information about itself, including creation or descriptive information to aid with discovery and understanding of the Collection, and also references to at least the first Page of Annotations. By starting with the first Annotation in the first Page, and traversing the Pages to the last Annotation of the last Page, all Annotations in the Collection will have been discovered. Annotations MAY be within multiple Collections at the same time, and the Collection MAY be created or maintained by agents other than those that create or maintain the included Annotations.

      This formulation of the indexing data structure seems unnecessarily strict. Why prescribe a doubly linked list of pages for traversing a large set of annotations? It would make more sense to allow an arbitrary indexing structure, such as a B+Tree, if the collection cardinality grows large
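
      A sketch of the traversal the model prescribes, which makes the linked-list cost concrete; the "first"/"next"/"items" keys follow the Web Annotation paging vocabulary, and the fetch helper is my own:

      ```python
      import requests

      def iter_annotations(collection_url: str):
          """Walk an AnnotationCollection page by page, yielding each Annotation."""
          collection = requests.get(collection_url).json()
          page = collection.get("first")
          while page:
              if isinstance(page, str):              # a page may be embedded or referenced by IRI
                  page = requests.get(page).json()
              yield from page.get("items", [])
              page = page.get("next")                # sequential, O(#pages) round trips
      ```

      Reaching an annotation deep in a large collection requires walking every earlier page, which is the cost the note above objects to.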

    1. This kind of winds up being like this sort of Sriracha of NLP; it is not really fundamental to solving any problem, but for almost any problem in language understanding, it gives you a substantial boost in performance. You can do noticeably better by using word embeddings than you can do without

      Hilarious. Love this analogy

    1. This would be really cool to do with math equations, recursively uncovering the more wordy definitions of symbols

    1. Theorem 2 (Universality) A function S(X) operating on a set X can be a valid scoring function, i.e. it is permutation invariant to the elements in X, if and only if it can be decomposed in the form ρ(Σ_{x∈X} φ(x)), for suitable transformations φ and ρ. Proof sketch. Permutation invariance follows from the fact that sets have no particular order, hence any function on a set must not exploit any particular order either. The sufficiency follows by observing that the function ρ(Σ_{x∈X} φ(x)) satisfies the permutation invariance condition. To prove necessity, i.e. that all functions can be represented in this manner, note that polynomials are universal approximators. Hence it suffices if we prove the result for polynomials. In this case the Chevalley-Shephard-Todd (CST) theorem (Bourbaki, 1990, chap. V, theorem 4), or more precisely, its special case, the Fundamental Theorem of Symmetric Functions states that symmetric polynomials are given by a polynomial of homogeneous symmetric monomials. The latter are given by the sum over monomial terms, which is all that we need since it implies that all symmetric polynomials can be written in the form required by the theorem

      I understand why it's sufficient. What isn't clear to me is why it's necessary. Why can't any commutative composition function work, such as (element-wise) products instead of sums
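
      A toy numerical check of the sum-decomposition form ρ(Σ_{x∈X} φ(x)); the particular φ and ρ below are arbitrary choices of mine, only meant to show the permutation invariance, not to answer the necessity question raised above:

      ```python
      import numpy as np

      def phi(x):                              # per-element transformation
          return np.stack([x, x ** 2, np.sin(x)], axis=-1)

      def rho(pooled):                         # transformation of the summed representation
          return float(np.tanh(pooled).sum())

      def score(X):
          """Permutation-invariant scoring function of the form rho(sum_x phi(x))."""
          return rho(phi(X).sum(axis=0))

      X = np.array([0.3, -1.2, 2.0, 0.7])
      rng = np.random.default_rng(0)
      print(score(X), score(rng.permutation(X)))   # identical: element order is irrelevant
      ```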

    1. In the process of doing this, researchers introduce dozens of confounding variables, which essentially make the comparisons meaningless

      Ridiculously true

    1. Hierarchies on variances: this is a natural step and is used in many Bayesian models where learning variances is involved. This does raise interesting research questions as to what assumptions and distributions to use beyond the simple one-parameter exponential families that are widely used

      References for these?

    2. The conclusion from this is that one way to view deep feed-forward networks are as hierarchical probabilistic models. What is important though, is that this hierarchy is a hierarchy formed through the means of the layer-distributions. Only the mean parameters at every layer of the hierarchy depend on computations from previous parts of the hierarchy, i.e. hierarchies whose dependency is through the first-order structure at every layer of the model.

      Why are we only concerned with hierarchy formed through the means, and not using the layer-wise distributional uncertainties (or the sufficient statistics for that matter)?

      I guess when we use a bernoulli, the variance is already specified by the mean.

    1. Unlike a highlight, an annotation, or a page note, a reply doesn’t refer to an annotated document. Instead it refers to one of those annotation types, or to a prior reply. You use the Reply link to create a reply

      My only question with replies: why can't I reply to a specific span within the annotation, the way I can in the document?

    2. Fund: On-Demand Web Archiving of Annotated Pages

      Did this work? I think pinboard has a robust method for archiving pages.

    3. Hypothesis is working with partners like eLife to integrate Hypothesis into the peer review process

      Any plans or projects for integrating this with arXiv?

    4. experts extract structured knowledge from the literature

      Are there any references to specific projects that have done this?

    1. Really cool venue for publishing online, interactive articles for ML

    1. word2vec is not the best model for this. Multi-class regression should work well, and I added a working demo of this to the repo. This is a rare case where the vocabulary size (number of ingredients) is very small, so we can fit both models and compare them. This could reveal idiosyncrasies in the non-contrastive estimation loss used in word2vec and provides an interesting testbed.

      So is the point that using negative sampling is too simple an approximation?

    2. consume news through Facebook

      Not necessarily in a good way...

    1. if your goal is word representation learning, you should consider both NCE and negative sampling

      Wonder if anyone has compared these two approaches

    1. What technology does the archive use? The archiving system fetches links using an enhanced version of wget, with a little extra intelligence about fetching dependencies. Every crawled page gets stored in a single directory, and the links rewritten to point to the local copy.

      Simple explanation

    1. People stink at tagging. They often forget.

      I don't think it's just forgetfulness. Tagging requires exact resolution of an idea to a tag. But when doing exploratory reading, deciding on the correct tag to use is often difficult and time-consuming (and rarely consistent across time).

      A tag recommender would be a cool extension.

    2. , use markdown

      Wait you can use markdown!

      I guess it does render kind of weird though

      But maybe that's just for h1s... na all of the h*s