9 Matching Annotations

Dec 2019
nlpoverview.com

Modern Deep Learning Techniques Applied to Natural Language Processing by Authors

8
1. vitalwarley 29 Dec 2019
  
  in Public
  
  The quality of word representations is generally gauged by its ability to encode syntactical information and handle polysemic behavior (or word senses). These properties result in improved semantic word representations. Recent approaches in this area encode such information into its embeddings by leveraging the context. These methods provide deeper networks that calculate word representations as a function of its context.
  
  Syntactical information
  
  Polysemic behavior (word senses)
  
  Semantic word representations
  
  My understanding is that handling word senses means the word representations yield similar measures for similar words.
  
  What would syntactic information be? And how does it relate to semantic word representations?
  
  embeddings nlp
2. vitalwarley 28 Dec 2019
  
  in Public
  
  Traditional word embedding algorithms assign a distinct vector to each word. This makes them unable to account for polysemy. In a recent work, Upadhyay et al. (2017) provided an innovative way to address this deficit. The authors leveraged multilingual parallel data to learn multi-sense word embeddings.
  
  multilingual parallel data
  
  multi-sense word embeddings
  
  embeddings word2vec
3. vitalwarley 28 Dec 2019
  
  in Public
  
  This is very important as training embeddings from scratch requires large amount of time and resource. Mikolov et al. (2013) tried to address this issue by proposing negative sampling which is nothing but frequency-based sampling of negative terms while training the word2vec model.
  
  Negative sampling... negative terms?
  
  word2vec embeddings
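A minimal sketch of what "frequency-based sampling of negative terms" could look like. The "negative terms" are taken here to be words that did not occur with the target in the current (target, context) pair, drawn at random according to how frequent they are in the corpus. The 0.75 exponent on the counts follows the word2vec paper rather than the quoted passage, and the function names and toy corpus are hypothetical.

```python
import random
from collections import Counter

def build_negative_sampler(tokens, power=0.75, seed=0):
    """Return a function that draws negative words by smoothed frequency."""
    counts = Counter(tokens)
    words = sorted(counts)
    # Raising counts to a power < 1 flattens the distribution slightly,
    # so rare words are drawn a bit more often than their raw frequency.
    weights = [counts[w] ** power for w in words]
    rng = random.Random(seed)

    def sample(target, context, k):
        negatives = []
        while len(negatives) < k:
            w = rng.choices(words, weights=weights, k=1)[0]
            # A negative must differ from the true (target, context) pair.
            if w != target and w != context:
                negatives.append(w)
        return negatives

    return sample

# Toy corpus (illustrative only).
corpus = "the cat sat on the mat while the dog sat on the rug".split()
sample_negatives = build_negative_sampler(corpus)
negatives = sample_negatives("cat", "sat", k=3)
```

During training, the model would be pushed to score the true pair ("cat", "sat") high and each sampled pair ("cat", negative) low, which avoids computing a softmax over the whole vocabulary.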
4. vitalwarley 28 Dec 2019
  
  in Public
  
  A general caveat for word embeddings is that they are highly dependent on the applications in which it is used. Labutov and Lipson (2013) proposed task specific embeddings which retrain the word embeddings to align them in the current task space.
  
  I believe "application" here relates to context, so word embeddings are context-dependent. That seems fairly obvious at first glance. Is that what the author meant?
  
  Retrain the embeddings to align them with the current task. Would aligning be nothing more than adapting the previous embeddings to the new context?
  
  word2vec embeddings
5. vitalwarley 28 Dec 2019
  
  in Public
  
  One solution to this problem, as explored by Mikolov et al. (2013), is to identify such phrases based on word co-occurrence and train embeddings for them separately. More recent methods have explored directly learning n-gram embeddings from unlabeled data (Johnson and Zhang, 2015).
  
  I can understand word co-occurrence, but not training the embeddings separately. Would it mean treating the co-occurring words as a single unit in the embedding, instead of just the individual word?
  
  embeddings nlp word2vec
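One way to read "identify such phrases based on word co-occurrence" is sketched below: word pairs that occur together much more often than their individual counts would suggest are merged into a single token, and that token later gets its own embedding. The score used, (count(a b) - delta) / (count(a) * count(b)), follows the word2vec phrase paper; the `delta` and `threshold` values, the function name, and the toy sentences are assumptions chosen for illustration.

```python
from collections import Counter

def merge_phrases(sentences, delta=1, threshold=0.1):
    """Merge adjacent word pairs whose co-occurrence score exceeds threshold."""
    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))

    def score(a, b):
        # delta discounts pairs that were seen only a few times.
        return (bigrams[(a, b)] - delta) / (unigrams[a] * unigrams[b])

    merged = []
    for s in sentences:
        out, i = [], 0
        while i < len(s):
            if i + 1 < len(s) and score(s[i], s[i + 1]) > threshold:
                out.append(s[i] + "_" + s[i + 1])  # treat the pair as one unit
                i += 2
            else:
                out.append(s[i])
                i += 1
        merged.append(out)
    return merged

# Toy sentences (illustrative only).
sentences = [
    "i visited new york last year".split(),
    "new york is a large city".split(),
    "she moved to new york".split(),
    "they built a new bridge".split(),
]
merged = merge_phrases(sentences)
```

In this toy data only "new york" passes the threshold, so it becomes the single token `new_york`, while "new" in "a new bridge" stays a separate word. Training embeddings on the merged sentences would then give the phrase a vector of its own.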
6. vitalwarley 28 Dec 2019
  
  in Public
  
  bigram language model.
  
  I don't yet know what these mean:
  
  bigram
  
  language model
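As a rough illustration of both terms: a bigram is a pair of adjacent words, and a language model assigns probabilities to word sequences. A bigram language model therefore estimates the probability of each word given only the word before it. The sketch below uses plain relative counts with no smoothing; the `<s>` and `</s>` boundary markers, the function name, and the toy sentences are assumptions.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Estimate P(current | previous) from adjacent-word counts."""
    pair_counts = defaultdict(Counter)
    for s in sentences:
        tokens = ["<s>"] + s + ["</s>"]  # mark sentence start and end
        for prev, cur in zip(tokens, tokens[1:]):
            pair_counts[prev][cur] += 1

    def prob(cur, prev):
        total = sum(pair_counts[prev].values())
        # Unseen history: return 0.0 (a real model would smooth instead).
        return pair_counts[prev][cur] / total if total else 0.0

    return prob

# Toy sentences (illustrative only).
sentences = ["the cat sat".split(), "the dog sat".split(), "the cat ran".split()]
prob = train_bigram_model(sentences)
p_cat_after_the = prob("cat", "the")  # "the" is followed by "cat" in 2 of 3 cases
```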
7. vitalwarley 28 Dec 2019
  
  in Public
  
  The context words are assumed to be located symmetrically to the target words within a distance equal to the window size in both directions.
  
  What does it mean to say the words are "symmetrically located" relative to the target words?
  
  word2vec nlp cbow skip-gram
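A small sketch of the symmetric window described in the quoted sentence: for each target word, the context is taken to be up to `window` words to its left and the same number to its right. The function name and the example sentence are hypothetical; truncating the window at the sentence edges is one common choice, assumed here.

```python
def context_pairs(tokens, window=2):
    """Pair each target word with the words within `window` positions on both sides."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)                 # left edge, clipped at sentence start
        hi = min(len(tokens), i + window + 1)   # right edge, clipped at sentence end
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((target, context))
    return pairs

tokens = "the quick brown fox jumps".split()
pairs = context_pairs(tokens, window=2)
```

For the middle word "brown" the context is two words on each side ("the", "quick", "fox", "jumps"). CBOW would predict the target from this context, and skip-gram would predict each context word from the target.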
8. vitalwarley 28 Dec 2019
  
  in Public
  
  This led to the motivation of learning distributed representations of words existing in low-dimensional space (Bengio et al., 2003).
  
  On the curse of dimensionality. Now, what would distributed representations of words in lower-dimensional spaces be? This reminds me of PCA and the like.
  
  distributed representation nlp
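A sketch contrasting the two kinds of representation, under assumed toy values. With one-hot vectors every word gets its own dimension, so the vector length equals the vocabulary size and all words are equally dissimilar. A distributed representation instead spreads each word's meaning over a small number of shared dimensions, so related words can end up close together. The dense vectors below are hand-picked for illustration, not learned; in a neural language model they would be parameters trained together with the rest of the network rather than a projection computed afterwards as in PCA.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

vocab = ["cat", "dog", "car"]

# One-hot: one dimension per vocabulary word.
one_hot = {
    w: [1.0 if i == j else 0.0 for j in range(len(vocab))]
    for i, w in enumerate(vocab)
}

# Dense: fewer dimensions than words (hypothetical, hand-picked values).
dense = {"cat": [0.9, 0.1], "dog": [0.8, 0.2], "car": [0.1, 0.9]}

sim_one_hot = cosine(one_hot["cat"], one_hot["dog"])
sim_dense_related = cosine(dense["cat"], dense["dog"])
sim_dense_unrelated = cosine(dense["cat"], dense["car"])
```

Here the one-hot similarity between "cat" and "dog" is zero, while the dense vectors place "cat" nearer to "dog" than to "car".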

Tags

distributed representation

word2vec

skip-gram

nlp

cbow

embeddings

Annotators

vitalwarley

URL

nlpoverview.com/
academia.hypotheses.org

Open Data Citation for Social Sciences and Humanities – The companion blog to the Humanities at Scale Winter School in Prague: 24th-28th October 2016

1
1. vitalwarley 28 Dec 2019
  
  in Public
  
  The word vector is the arrow from the point where all three axes intersect to the end point defined by the coordinates.
  
  Each of the three axes provides a context.
  
  nlp embeddings

Tags

nlp

embeddings

Annotators

vitalwarley

URL

academia.hypotheses.org/58766

vitalwarley

Annotations: 9

Joined: December 28, 2019
