378 Matching Annotations
  1. Oct 2023
    1. See the examples directory for annotated examples of parsing JSON, Lisp, a Python-ish language, and math.

      test

  2. Oct 2020
    1. Understood in this way, this science does not stand in contradiction
    2. But if free will really

      hi!

    3. This book is, above all, an attempt to investigate the facts
    4. Understood in this way, this science is not in contradiction with any kind of philosophy, for it moves onto entirely different ground. Perhaps mor- 1 We have been reproached (see Beudant, Le droit individuel et l'Etat, p. 244) for having in one place called the question of free will a delicate one. This expression carried nothing disdainful on our part. If we decline to resolve this problem, it is solely because its resolution, whatever it may be, cannot hinder our investigation.
    5. This book is, above all, an attempt to investigate
    1. This book is, above all, an attempt to investigate the facts

      ggggg

    2. ggggg

    3. This book is, above all, an attempt to investigate the facts

      body of annotation, can include markup

    1. Bruno Latour, Steve Woolgar

      ping


  3. Aug 2019
    1. our system is able to address zero anaphora. Thorough cross-lingual analysis by Novák and Nedoluzhko (2015) showed that many counterparts of Czech or English coreferential expressions are zeros. This likely holds for the other pro-drop languages, too
    2. we divide mentions into multiple categories in this paper: (1) personal pronouns, (2) possessive pronouns, (3) reflexive possessive pronouns, (4) reflexive pronouns, all four types of pronouns in the 3rd or ambiguous person, (5) demonstrative pronouns, (6) zero subjects, (7) zeros in non-finite clauses, (8) relative pronouns, (9) the pronouns of types (1)-(4) in the 1st or 2nd person, (10) named entities, (11) common nominal groups, and (12) other expressions.
  4. Jul 2019
    1. Starting with a text in the target language to be labeled with coreference, it first must be machine-translated to the source language. A coreference resolver for the source language is then applied on the translated text and, finally, the newly established coreference links are projected back to the target language
    2. Approaches to cross-lingual projection are usually aimed to bridge the gap of missing resources in the target language.
    3. MT-based approaches apply a machine-translation service to create synthetic data in the source language. Corpus-based approaches take advantage of the human-translated parallel corpus of the two languages.
  5. Jun 2019
    1. Computationally-aided linguistic analysis: The focus of this paper type is new linguistic insight.
    2. NLP engineering experiment paper: This paper type matches the bulk of submissions at recent CL and NLP conferences.
    3. Reproduction paper: The contribution of a reproduction paper lies in analyses of and in insights into existing methods and problems—plus the added certainty that comes with validating previous results.
    4. Resource paper: Papers in this track present a new language resource. This could be a corpus, but also could be an annotation standard, tool, and so on.
    1. Perhaps the simplest and most commonly used query framework is uncertainty sampling (Lewis and Gale, 1994). In this framework, an active learner queries the instances about which it is least certain how to label. This approach is often straightforward for probabilistic learning models.
    2. The main difference between stream-based and pool-based active learning is that the former scans through the data sequentially and makes query decisions individually, whereas the latter evaluates and ranks the entire collection before selecting the best query.
    3. For many real-world learning problems, large collections of unlabeled data can be gathered at once. This motivates pool-based sampling (Lewis and Gale, 1994), which assumes that there is a small set of labeled data L and a large pool of unlabeled data U available.
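
      The three excerpts above describe pool-based uncertainty sampling: rank the pool U by the model's confidence and query the least certain instances. A minimal sketch, assuming a scikit-learn-style probabilistic classifier (clf, X_pool, and n_queries are illustrative names, not from the source):

      ```python
      import numpy as np
      from sklearn.linear_model import LogisticRegression

      def least_confident_queries(clf, X_pool, n_queries=10):
          """Indices of the pool instances whose most likely label has the lowest probability."""
          proba = clf.predict_proba(X_pool)          # class probabilities per pool instance
          confidence = proba.max(axis=1)             # probability of the most likely label
          return np.argsort(confidence)[:n_queries]  # least confident instances first

      # Usage: fit on the small labeled set L, then rank the large unlabeled pool U.
      # clf = LogisticRegression().fit(X_labeled, y_labeled)
      # query_idx = least_confident_queries(clf, X_unlabeled)
      ```
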
    4. An alternative to synthesizing queries is selective sampling (Cohn et al., 1990, 1994). The key assumption is that obtaining an unlabeled instance is free (or inexpensive), so it can first be sampled from the actual distribution, and then the learner can decide whether or not to request its label.
    5. The active learner aims to achieve high accuracy using as few labeled instances as possible, thereby minimizing the cost of obtaining labeled data
    6. Active learning systems attempt to overcome the labeling bottleneck by asking queries in the form of unlabeled instances to be labeled by an oracle (e.g., a human annotator)
    7. The key hypothesis is that if the learning algorithm is allowed to choose the data from which it learns—to be "curious," if you will—it will perform better with less training.
    1. Balance exploration and exploitation: the choice of examples to label is seen as a dilemma between exploration and exploitation over the data space representation. This strategy manages the compromise by modelling the active learning problem as a contextual bandit problem. For example, Bouneffouf et al. [9] propose a sequential algorithm named Active Thompson Sampling (ATS), which, in each round, assigns a sampling distribution on the pool, samples one point from this distribution, and queries the oracle for this sample point's label.
       Expected model change: label those points that would most change the current model.
       Expected error reduction: label those points that would most reduce the model's generalization error.
       Exponentiated Gradient Exploration for Active Learning [10]: a sequential algorithm named exponentiated gradient (EG)-active that can improve any active learning algorithm by an optimal random exploration.
       Membership Query Synthesis: the learner generates its own instance from an underlying natural distribution. For example, if the dataset is pictures of humans and animals, the learner could send a clipped image of a leg to the teacher and query whether this appendage belongs to an animal or a human. This is particularly useful if the dataset is small. [11]
       Pool-Based Sampling: instances are drawn from the entire data pool and assigned an informativeness score, a measurement of how well the learner "understands" the data. The system then selects the most informative instances and queries the teacher for the labels.
       Stream-Based Selective Sampling: each unlabeled data point is examined one at a time, with the machine evaluating the informativeness of each item against its query parameters. The learner decides for itself whether to assign a label or query the teacher for each datapoint.
       Uncertainty sampling: label those points for which the current model is least certain as to what the correct output should be.
       Query by committee: a variety of models are trained on the current labeled data and vote on the output for unlabeled data; label those points for which the "committee" disagrees the most (see the sketch after this list).
       Querying from diverse subspaces or partitions [12]: when the underlying model is a forest of trees, the leaf nodes might represent (overlapping) partitions of the original feature space. This offers the possibility of selecting instances from non-overlapping or minimally overlapping partitions for labeling.
       Variance reduction: label those points that would minimize output variance, which is one of the components of error.
       Conformal Predictors: this method predicts that a new data point will have a label similar to old data points in some specified way, and the degree of similarity within the old examples is used to estimate the confidence in the prediction. [13]
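
      A hedged sketch of the query-by-committee entry above, using vote entropy as the disagreement measure (the committee of trained classifiers and the top-10 cut are illustrative assumptions):

      ```python
      import numpy as np

      def vote_entropy(committee, X_pool):
          """Disagreement per pool instance: entropy of the committee's label votes."""
          votes = np.stack([model.predict(X_pool) for model in committee])  # shape (C, N)
          n_models = votes.shape[0]
          scores = []
          for col in votes.T:                          # votes cast for one instance
              _, counts = np.unique(col, return_counts=True)
              p = counts / n_models                    # empirical vote distribution
              scores.append(-(p * np.log(p)).sum())
          return np.asarray(scores)

      # Query the instances the committee disagrees on the most:
      # query_idx = np.argsort(vote_entropy(committee, X_pool))[::-1][:10]
      ```
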
  6. May 2019
    1. The first NER task was organized by Grishman and Sundheim (1996) in the Sixth Message Understanding Conference. Since then, there have been numerous NER tasks (Tjong Kim Sang and De Meulder, 2003; Tjong Kim Sang, 2002; Piskorski et al., 2017; Segura Bedmar et al., 2013; Bossy et al., 2013; Uzuner et al., 2011).
    2. Starting with Collobert et al. (2011), neural network NER systems with minimal feature engineering have become popular. Such models are appealing because they typically do not require domain-specific resources like lexicons or ontologies, and are thus poised to be more domain independent.
    3. Early NER systems were based on handcrafted rules, lexicons, orthographic features and ontologies. These systems were followed by NER systems based on feature engineering and machine learning (Nadeau and Sekine, 2007

      v

    1. define a set of rules to assign the correct sentiment score to the opinion word

      sentiment "correction"

    2. spaCy's dependency parser is able to identify other words linked by dependency to that particular opinion word. This allows you to extract the aspect term

      dependency tree-based aspect term extraction

    3. identify opinion words by cross referencing the opinion lexicon for negative and positive words

      lexicon-based opinion words extraction
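
      Items 2 and 3 above pair a lexicon lookup for opinion words with a walk over spaCy's dependency tree to recover the aspect term. A rough sketch; the tiny OPINION_LEXICON and the example sentence are stand-ins for the post's actual lexicon and data:

      ```python
      import spacy

      nlp = spacy.load("en_core_web_sm")                 # assumes the model is installed
      OPINION_LEXICON = {"great", "terrible", "slow", "delicious"}  # illustrative subset

      def extract_aspects(text):
          pairs = []
          for token in nlp(text):
              if token.lower_ in OPINION_LEXICON:
                  if token.dep_ == "amod":               # adjectival modifier: head is the aspect
                      pairs.append((token.head.text, token.text))
                  else:                                  # predicative use: find the nominal subject
                      for child in token.head.children:
                          if child.dep_ == "nsubj":
                              pairs.append((child.text, token.text))
          return pairs

      print(extract_aspects("The delicious food came out after a terrible wait."))
      # e.g. [('food', 'delicious'), ('wait', 'terrible')], depending on the parse
      ```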

    4. If the word fails to meet the threshold for proximity between the two words in the vector space, the algorithm falls back on using the category of the entire sentence that was classified previously using ML-NB
    5. I first try to assign based on the similarity of the aspect term to the aspect category with word2vec’s n_similarity
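
      Items 4 and 5 above, sketched with gensim: score the aspect term against seed words per category with n_similarity, and fall back to the sentence-level classifier's label below a threshold. The vector file, seed lists, and the 0.3 threshold are assumptions for illustration:

      ```python
      from gensim.models import KeyedVectors

      wv = KeyedVectors.load("word2vec.kv")      # hypothetical pretrained vectors
      CATEGORY_SEEDS = {                         # illustrative seed words per category
          "food": ["food", "meal", "dish"],
          "service": ["service", "staff", "waiter"],
          "price": ["price", "cost", "cheap"],
      }

      def assign_category(aspect_terms, sentence_label, threshold=0.3):
          best_cat, best_sim = None, -1.0
          for cat, seeds in CATEGORY_SEEDS.items():
              sim = wv.n_similarity(aspect_terms, seeds)   # cosine between word-set means
              if sim > best_sim:
                  best_cat, best_sim = cat, sim
          # below the threshold, trust the whole-sentence ML-NB label instead
          return best_cat if best_sim >= threshold else sentence_label
      ```
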
    6. tag it with an aspect using a Multi-label Naive Bayes model

      multilabel classification
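
      A minimal sketch of the multi-label Naive Bayes tagger mentioned above, via scikit-learn's one-vs-rest wrapper (the toy sentences and labels are placeholders):

      ```python
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.multiclass import OneVsRestClassifier
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import MultiLabelBinarizer

      sentences = ["great food but slow service", "very affordable"]
      labels = [["food", "service"], ["price"]]    # a sentence may carry several aspects

      mlb = MultiLabelBinarizer()
      Y = mlb.fit_transform(labels)                # binary indicator matrix

      model = make_pipeline(TfidfVectorizer(),
                            OneVsRestClassifier(MultinomialNB()))
      model.fit(sentences, Y)
      predicted = mlb.inverse_transform(model.predict(["cheap and tasty"]))
      ```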

    7. segment the chunk of text into sentences

      sentence tokenize

    8. I first replace the pronouns in the sentence using a pre-trained neural coreference model;

      coref

    9. For example, let’s assume you’re trying to classify a single yelp restaurant review into one of five aspects: food, service, price, ambience, or simply anecdotal/miscellaneous.

      multi-label classification

    1. Schmoller et al. [21] carry out their analysis using a dataset provided by the car sharing operator, which contains more information than what is generally available to the research community at large.
    1. There are various types of n-grams and syntactic n-grams according to the types of elements they are built of: lexical units (words, stems, lemmas), POS tags, SR tags (names of syntactic relations), characters, etc.
    2. recently we have proposed a concept of syntactic n-grams, i.e., n-grams constructed by following paths in syntactic trees [19,21].
    3. The most widely used features are words and n-grams.
    4. if by their nature the features have symbolic values, then they are mapped to numeric values in some manner.
    5. The most common manner to represent objects is the Vector Space Model (VSM) [17]. In this model, the objects are represented as vectors of values of features. The features characterize each object and have numeric values.
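
      A minimal illustration of the VSM with word n-gram features, assuming scikit-learn (not a tool named in the excerpts): each text becomes a numeric vector over the n-gram vocabulary.

      ```python
      from sklearn.feature_extraction.text import CountVectorizer

      texts = ["the cat sat", "the cat ran"]
      vectorizer = CountVectorizer(ngram_range=(1, 2))   # unigram and bigram features
      X = vectorizer.fit_transform(texts)                # documents-by-features matrix
      print(vectorizer.get_feature_names_out())
      print(X.toarray())
      ```
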
    1. Additionally, we defined a subcategory of positive posts that covers frequent speech acts, such as expressions of gratitude, greetings, and congratulations. They are very frequent in VK data, and the sentiment they express is overtly positive, but they are also very formulaic.
    2. We also defined the "skip" class for excluding the posts that were too noisy, unclear, or not in Russian (e.g., in Ukrainian). We also made the decision to exclude jokes, poems, song lyrics, and other such content that was not generated by the users themselves
    3. We prioritized the speed of annotation over detail, opting for a 3-point scale rather than e.g., the 5-point scale in SemEval Twitter datasets (Rosenthal et al., 2017). Thus, the task was to rate the prevailing sentiment in complete posts from VK on a three-point scale ("negative", "neutral", and "positive").
    4. The annotation was performed by six native speakers with backgrounds in linguistics over the course of 5 months. The average annotation speed was 250-350 posts per hour
    5. The datasets from the SentiRuEval 2015 and 2016 competitions are the largest resource that has been available to date (Loukachevitch and Rubtsova, 2016). The SentiRuEval 2016 dataset comprises 10,890 tweets from the telecom domain and 12,705 from the banking domain. The Linis project (Koltsova et al., 2016) reports to have crowdsourced annotation for 19,831 blog excerpts, but only 3,327 are currently available on the project website
    6. RuSentiLex, the largest sentiment lexicon for Russian (Loukachevitch and Levchik, 2016), currently contains 16,057 words
    7. The best results were achieved with a neural network model that made use of word embeddings trained on the VKontakte corpus, which we also release to enable a fair comparison with our baselines in future work. This model achieved an F1 score of 0.728 in a 5-class classification setup.
    8. The overall inter-annotator agreement in terms of Fleiss' kappa stands at 0.58. In total, 31,185 posts were annotated, 21,268 of which were selected randomly (including 2,967 for the test set). 6,950 posts were pre-selected with an active learning-style strategy in order to diversify the data.
    1. On the other hand, new vehicle concepts with stackable capabilities have been recently released or are under development, which can be stacked into a train (through a mechanical and electric coupling) and/or folded together.
    2. It is important to point out that the relocation process is intrinsically inefficient: as one driver per car is needed, to relocate several cars a large workforce or many willing customers are necessary
    3. One-way car sharing is not without drawbacks for the car sharing operators. With one-way car sharing, cars will follow the natural flows of people in a city, hence accumulating in commercial/business areas in the morning and in residential areas at night [3]
    4. One-way systems can be also classified into free-floating or station-based according to their parking restrictions.
    5. One-way car sharing, in which customers are not forced to return the vehicle at the starting point of their journey
    6. people do not own a car, they simply rent it from the car sharing operator when they need it (typically for short-range trips), effectively implementing the concept of Mobility-as-a-Service
    7. Car sharing can also act as a last-kilometre solution for connecting people with public transport hubs, hence becoming a feeder to traditional public transit [2].
    1. Graph network analysis is conducted on the learned DDGF, which shows the DDGF can capture similar information that is embedded in the SD, DE and DC matrices, and extra hidden heterogeneous pairwise correlations between stations
    2. Two architectures of the GCNN-DDGF model, GCNNreg-DDGF and GCNNrec-DDGF, are explored. Their prediction performances are compared with four GCNN models with pre-defined adjacency matrices and seven benchmark models. The proposed GCNNrec-DDGF outperforms all of these models.
    3. Proposing a novel GCNN-DDGF model that can automatically learn hidden heterogeneous pairwise correlations between stations to predict station-level hourly demand.
    4. As pointed out by many previous studies (Chen et al., 2016; Li et al., 2015; Lin, 2018; Zhou, 2015), it is common for BSSs with fixed stations that some stations are empty with no bikes to check out while others are full, precluding bikes from being returned at those locations.
    5. In general, distributed bike-sharing systems (BSSs) can be grouped into two types, dock-based BSS and non-dock BSS.
    1. Our proposed approach fully integrates rebalancing, request assignment and ride sharing, in a fully decentralized manner.
    2. Strategies range from those that use a short window of known future requests (e.g., 5 minutes in [11] and 30 seconds in [4]), to those based on historical demand (e.g., [8]) or those using prediction techniques to predict future demand (e.g., [14]).
    3. To denote the different areas of the system in order to map the demand to a geographical area, the network is generally divided into several zones [13], blocks [11] or hexagons [14].
    4. The relocation of empty vehicles in shared MoD systems has been widely studied in the literature, and can be divided into operator-based approaches [8], [9] (where employees of the car-sharing service relocate the vehicles), user-based approaches [10] (where users are financially incentivized to return the vehicles to high-demand areas) and, more recently, those in shared autonomous vehicle (SAV) systems [4], [11], [12] (where driverless vehicles are autonomously relocated).
    1. The bike-sharing system simulator proposed by Caggiani and Ottomanelli (2012 and 2013) has been used to represent and model the FFBSS under analysis, pretending that the centroids of each zone coincide with a hypothetical bike-sharing station
    2. We assume that in this area a free-floating bike-sharing system (FFBSS) is operating. A further assumption is that a typical user is willing to cover a maximum distance of about 630 meters on foot to reach the bicycle closest to the origin of his/her trip.
    3. we apply the suggested methodology to a study area of 1.2 km x 1.2 km of extension. This area is composed of 36 square zones, with a side length equal to 0.2 km (grid of 6x6 zones).
    4. the zero-vehicle-time (ZVT) (Kek et al., 2009). When ZVT occurs, a zone (or station, in a station-based system) is without any available vehicle; then, a customer requesting a vehicle at that moment in that zone will be rejected/unsatisfied.
    5. Every zone of this FFVSS could be seen as a station (in a station-based sharing system) that aggregates/contains (inside its borders) a number of vehicles.
    6. some authors have shown how cluster analysis is capable of revealing groups of stations with a similar trend of rental and return activities during the day (Vogel et al., 2011).
    7. It is worth mentioning Reiss and Bogenberger (2015), who, in order to apply their operator-based strategy to a bike-sharing system, divided the operating area of the free-floating system into a certain number of zones, which in a way could be interpreted as stations.
    8. all the approaches adopted to relocate the shared fleets can be grouped into two categories according to who actually performs the relocation: user-based and operator-based strategies
    9. These imbalances of supply and demand can be resolved/mitigated only with an appropriate reallocation strategy (Reiss and Bogenberger, 2015), namely a transfer of vehicles from zones with high accumulation to areas where the shortage is experienced (Boyacı et al., 2015).
    10. during the day significant fluctuations in travel demand (due to weather conditions, time of the day and holidays/weekends) can be observed. Sometimes there is vehicle overcrowding in certain zones, and a lack of available vehicles in others, at the time the users need them (Herrmann et al., 2014)
    1. The conversion to the binary scale was performed according to the following scheme: {1, 2} → negative, {4, 5} → positive. Reviews that have a score of 3 on an aspect were not considered for this aspect when assessing the quality of the algorithm
    2. As a result, for the positive sentiment, 342 terms were found (with the threshold of 0.2) and 1203 terms for the negative sentiment (with the threshold of 0.25).
    3. In the same way, sentiment terms were obtained. As the initial terms that set the overall sentiment, the words отличный (excellent) for the positive class and ужасный (terrible) for the negative class were chosen. For each newly generated term, the cosine similarity value with the initial term was found and was assigned to the term as the weight.

      the sentiment weight is the distance between a given word and one of the seed words: отличный (excellent) or ужасный (terrible)

    4. As a result, each of the three aspects has its own list of terms. The number of terms for each aspect is the following: 2550 for Room, 1317 for Location, and 1740 for Service
    5. Thus, for each term a list of 10 new terms closest to the original one was found. These lists were combined, with duplicate terms removed. This process continues, and the resulting list again generates a new one according to the same principle. Repeating this procedure for new term lists is an iterative process that generates aspect terms. To remove noise words which appear during term generation, an additional restriction was used: each newly generated term was stored in the resulting list of aspect terms only if the similarity value with at least three of the five terms in the initial list exceeded 0.3 for each aspect. For each term, the cosine similarity with initial terms is calculated and the maximum is assigned to it as the weight. The weight value will be used at the sentiment assignment step.

      how the aspect term lexicon is built
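
      The iterative expansion in item 5 might be sketched as follows; the vector file and the exact loop structure are assumptions beyond what the excerpt states:

      ```python
      from gensim.models import KeyedVectors

      wv = KeyedVectors.load("reviews_word2vec.kv")   # hypothetical vectors trained on the reviews
      seeds = ["номер", "ванная", "телевизор", "свет", "кровать"]   # the Room seed terms

      def expand_lexicon(seeds, iterations=3, topn=10, min_sim=0.3, min_hits=3):
          lexicon = {s: 1.0 for s in seeds}
          frontier = list(seeds)
          for _ in range(iterations):
              candidates = []
              for term in frontier:                   # 10 nearest neighbours per term
                  candidates += [w for w, _ in wv.most_similar(term, topn=topn)]
              frontier = []
              for cand in set(candidates) - set(lexicon):
                  sims = [wv.similarity(cand, s) for s in seeds]
                  # keep only terms close enough to at least three of the five seeds
                  if sum(sim > min_sim for sim in sims) >= min_hits:
                      lexicon[cand] = max(sims)       # weight reused at the sentiment step
                      frontier.append(cand)
          return lexicon
      ```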

    6. For the aspect Room the initial terms номер (room), ванная (bathroom), телевизор (TV), свет (light), кровать (bed) are selected. For the aspect Service the initial terms are сервис (service), персонал (staff), администратор (administrator), сотрудник (staff member), консьерж (concierge). For the aspect Location the words местоположение (location), достопримечательность (attraction), центр (center), транспорт (transport), месторасположение (location) were chosen
    7. The collocations with the adverb очень (very) were processed in the same way
    8. it was decided to add the prefix not_ to the first adjective, adverb or verb
    9. In total, 50,329 reviews were collected for the training corpus
    10. For the sentiment identification stage of the algorithm, only three aspects were chosen: Room, Location and Service, since they are the most popular ones.
    11. The following information was collected from the site: the text of the review, the overall rating of the hotel (on a 5-point scale), and an assessment of the hotel's characteristics, such as the price-quality ratio, location, room, cleanliness, service, quality of sleep
    12. the reviews were collected from the website TripAdvisor
    13. Another important note is that many methods often benefit from taking advantage of more data, i.e. additional reviews, even without annotated terms. This was well demonstrated by top performers in the SemEval-2014 aspect-based sentiment analysis task [Pontiki et al., 2014]
    14. Liu [2012] lists four main approaches to aspect extraction: (1) using frequent nouns and noun phrases; (2) using opinion and target relations; (3) supervised learning; (4) topic modeling.
    15. State-of-the-art models make use of topic modeling methods, such as Latent Dirichlet Allocation (LDA), and Conditional Random Fields (CRF).
    16. Traditional approaches are based on collecting the most frequent words and phrases which are contained in the manually constructed aspect or sentiment lexicon
    17. The task of aspect-based sentiment analysis [Liu, 2012; Pontiki et al., 2014; Pavlopoulos, 2014] is usually split into two subtasks: aspect term extraction and aspect term polarity estimation, which are treated separately and often use different techniques.
    1. Agreements for aspect expressions are 0.93, 0.94, 0.93.
    2. The Kappa Coefficient is calculated over aspect-sentiment pairs per each location. Pairwise inter-annotator agreement for aspect categories measured using Cohen's Kappa is 0.73, 0.78 and 0.70, which is deemed of sufficient quality
    3. However, this task assumes only the overall sentiment for each entity. Moreover, the existing corpora for this task have so far contained only a single target entity per unit of text.
    4. Another line of research in this field is targeted (a.k.a. target-dependent) sentiment analysis (Jiang et al., 2011; Vo and Zhang, 2015). Targeted sentiment analysis investigates the classification of opinion polarities towards certain target entity mentions in given sentences (often a tweet).
    5. Aspect-based sentiment analysis (ABSA) (Jo and Oh, 2011; Pontiki et al., 2015; Pontiki et al., 2016) relates to the task of extracting fine-grained information by identifying the polarity towards different aspects of an entity in the same unit of text, and recognizing the polarity associated with each aspect separately
    6. Targeted aspect-based sentiment analysis handles extracting the target entities as well as different aspects and their relevant sentiments.
    7. Entities in the dataset are locations or neighbourhoods.
    8. sentences containing one location mention — Single, and sentences containing two location mentions — Multi. This is to observe the difficulty of annotating two groups by human annotators and by the models
    9. In our annotation, however, we only provided "Positive" and "Negative" sentiment labels.
    10. we define the two following special labels. Sentences marked with one of these labels are removed from the dataset
    11. We use the BRAT annotation tool (Stenetorp et al., 2012) to simplify the annotation task.
    12. Aspect general refers to a generic opinion about a location, e.g. "I love Camden Town"
    13. a pre-defined list of aspects is provided for annotators to choose from. These aspects are: live, safety, price, quiet, dining, nightlife, transit-location, touristy, shopping, green-culture and multicultural
    1. In the Aspect Category Polarity (ACP) task the polarity of each expressed category is recognized, e.g. a positive category polarity is expressed in sentence 1.
    2. In the Aspect Category Detection (ACD) task the category evoked in a sentence is identified, e.g. the food category in sentence 1.
    3. In the Aspect Term Polarity (ATP) task the polarity evoked for each aspect is recognized, i.e. a positive polarity is expressed with respect to fried rice.
    4. The Aspect Term Extraction (ATE) subtask aims at finding words suggesting the presence of aspects on which an opinion is expressed, e.g. fried rice in sentence 1
    1. In practice UMAP uses a force directed graph layout algorithm in low dimensional space. A force directed graph layout utilizes a set of attractive forces applied along edges and a set of repulsive forces applied among vertices. Any force directed layout algorithm requires a description of both the attractive and repulsive forces. The algorithm proceeds by iteratively applying attractive and repulsive forces at each edge or vertex. Convergence is guaranteed by slowly decreasing the attractive and repulsive forces in a similar fashion to that used in simulated annealing
    2. In the first phase a particular weighted k-neighbour graph is constructed. In the second phase a low dimensional layout of this graph is computed
    3. The theoretical description of the algorithm works in terms of fuzzy simplicial sets. Computationally this is only tractable for the one skeleton which can ultimately be described as a weighted graph. This means that, from a practical computational perspective, UMAP can ultimately be described in terms of, construction of, and operations on weighted graphs. In particular this situates UMAP in the class of k-neighbour based graph learning algorithms such as Laplacian Eigenmaps, Isomap and t-SNE.
    4. At a high level, UMAP uses local manifold approximations and patches together their local fuzzy simplicial set representations to construct a topological representation of the high dimensional data. Given some low dimensional representation of the data, a similar process can be used to construct an equivalent topological representation. UMAP then optimizes the layout of the data representation in the low dimensional space, to minimize the cross-entropy between the two topological representations.
    5. Dimension reduction algorithms tend to fall into two categories; those that seek to preserve the distance structure within the data and those that favor the preservation of local distances over global distance. Algorithms such as PCA [22], MDS [23], and Sammon mapping [41] fall into the former category while t-SNE [50, 49], Isomap [47], LargeVis [45], Laplacian eigenmaps [5, 6] and diffusion maps [14] all fall into the latter category
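
      A minimal usage sketch of the two-phase pipeline described above, with the umap-learn package (the data and parameter values are illustrative):

      ```python
      import numpy as np
      import umap

      X = np.random.rand(500, 50)                # stand-in high-dimensional data
      reducer = umap.UMAP(n_neighbors=15,        # size of the k-neighbour graph (phase one)
                          min_dist=0.1,          # how tightly the layout packs points (phase two)
                          n_components=2)
      embedding = reducer.fit_transform(X)       # (500, 2) low-dimensional layout
      ```
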
    1. 5. The dissimilarity matrix of the data should be well represented by the clustering (i.e., by the ultrametric induced by a dendrogram, or by defining a binary metric "in same cluster/in different clusters").
       6. Clusters should be stable.
       7. Clusters should correspond to connected areas in data space with high density.
       8. The areas in data space corresponding to clusters should have certain characteristics (such as being convex or linear).
       9. It should be possible to characterize the clusters using a small number of variables.
       10. Clusters should correspond well to an externally given partition or values of one or more variables that were not used for computing the clustering.
       11. Features should be approximately independent within clusters.
       12. All clusters should have roughly the same size.
       13. The number of clusters should be low.
    2. 1. Within-cluster dissimilarities should be small.
       2. Between-cluster dissimilarities should be large.
       3. Clusters should be fitted well by certain homogeneous probability models such as the Gaussian or a uniform distribution on a convex set, or by linear, time series or spatial process models.
       4. Members of a cluster should be well represented by its centroid
  7. Apr 2019
    1. A natural extension of this idea is to use a Negative Binomial distribution, which is a gamma mixture of an infinite number of Poisson distributions. The probability density function of a Negative Binomial distribution is given below,
       $$P(k) = \binom{k+r-1}{r-1} p^r (1-p)^k, \qquad (4)$$
       where p and r are parameters of the distribution
    2. One of the distributions captures the rate of the word occurrence when the word occurs because it is topically relevant to the document. The second distribution captures the rate of the word occurrence when the word occurs without being topically relevant to the document. This mixture of two probability distributions has the probability density function:
       $$P(k) = \alpha \frac{\lambda_1^k e^{-\lambda_1}}{k!} + (1 - \alpha) \frac{\lambda_2^k e^{-\lambda_2}}{k!}$$
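
      Both densities can be evaluated with scipy, whose nbinom uses exactly the form in Eq. (4); the parameter values below are illustrative, not fitted:

      ```python
      import numpy as np
      from scipy.stats import nbinom, poisson

      k = np.arange(10)

      # Negative Binomial: C(k+r-1, r-1) * p**r * (1-p)**k, with r=5, p=0.4
      nb_pmf = nbinom.pmf(k, 5, 0.4)

      # Two-Poisson mixture: topical vs. incidental word occurrence rates
      alpha, lam1, lam2 = 0.3, 4.0, 0.5
      mix_pmf = alpha * poisson.pmf(k, lam1) + (1 - alpha) * poisson.pmf(k, lam2)
      ```
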
    1. from three different perspectives: from a statistically motivated point of view; with a computationally motivated mindset; and in a topologically motivated framework
    2. Finally HDBSCAN* resolves many of the difficulties in parameter selection by requiring only a small set of intuitive and fairly robust parameters.
    3. being a density based approach, DBSCAN only suffers from the difficulty of parameter selection.
    4. The archetypal clustering algorithm, K-Means, suffers from all three of the problems mentioned previously: requiring the selection of the number of clusters; partitioning the data, and hence assigning noise to clusters; and the implicit assumption that clusters have Gaussian distributions.
    5. Partitioning, on the other hand, requires that every data point be associated with a particular cluster. In the presence of noise the partitioning approach can be problematic.
    6. Methods to determine the number of clusters such as the elbow method and silhouette method are often subjective and can be hard to apply in practice.
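
      For reference, the silhouette method mentioned in item 6 in a few lines, assuming scikit-learn and synthetic data:

      ```python
      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.metrics import silhouette_score

      X = np.random.rand(300, 2)                 # stand-in data
      for k in range(2, 8):
          labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
          print(k, silhouette_score(X, labels))  # higher is better, but often ambiguous
      ```
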
    7. While clustering has many uses to many people, our particular focus is on clustering for the purpose of exploratory data analysis. By exploratory data analysis we mean the process of looking for "interesting patterns" in a data set, primarily with the goal of generating new hypotheses or research questions about the data set in question.
    8. Clustering is the attempt to group data in a way that meets with human intuition. Unfortunately, our intuitive ideas of what makes a 'cluster' are poorly defined and highly context sensitive [26].
    1. People who hold different views most often have different educational backgrounds and choose different places to live. In Moscow it is not ethnic differences that generate segregation processes, but social ones. Social stratification entails property stratification, and that, in turn, entails ethnic stratification.
    2. Moscow is splitting into four large worldview clusters, and they may take on a territorial dimension.
    3. In our cities the notion of a majority is beginning to erode. The higher the level of diversity, the lower the likelihood of a majority that forms a dominant position. A city like Moscow exists as a split community.
    4. You understand perfectly well that this is one of the challenges of democracy: democratic institutions can be used by any forces. And the more developed a democracy is, the greater the opportunity for differences to be represented.
    5. I believe that in today's Russia there are no ghettos; there are anti-ghettos.
    6. Kenneth Benjamin Clark, whom Didier often cites, gives a broader definition in his works of the 1960s. He writes that the ghetto is at once a paradox, a conflict, and a dilemma. It gives hope and it is hopelessness; it is the church and the tavern; cooperation and care in the ghetto are combined with suspicion, rivalry, and exclusion. Ghetto residents are characterized at once by a strong striving for assimilation and a rejection of it, by alienation and by seeking shelter.
    7. The ghetto is not a geographical phenomenon tied to barriers and borders, although those matter; it is a social one, a mode of dual organization of a community. For example, Loïc Wacquant emphasizes that the ghetto functions as a kind of ethno-racial prison.
    8. "The more intense globalization becomes, the more actively ghettos form." This is important in the context of our discussion, because we use concepts such as ghetto and segregation not in an analytical but in a metaphorical sense.
    9. The situation is different in medieval cities, where an estate-based society begins to form. Segregation arises not along the lines of poverty and wealth, but along the lines of corporation, craft affiliation, and so on. If we take medieval Moscow, it too was a segregated city, and this has survived in the names of streets and quarters.
    10. As for the cities of the ancient world, although society was organized hierarchically, it was not spatially segregated along the lines of "poverty" and "wealth". The very structure of the ancient house presupposed the shared life of the poor and the rich, of slaves and free citizens, of servants and masters.
    11. Segregation is not only a problem but also a solution that various social strata find for themselves in order to separate from other social strata. Here one can also observe the formation of what we call ghettos, the formation of ghettos in urban spaces.
    12. A hierarchization of districts, of different places in the city, is taking place. The system of distancing and the logic of division and selectivity lead people to display a certain distinction between their own group and others
    13. The more important the role of these flows becomes, the more of the population seeks places for localization and concentration. Here a certain social logic comes into play: a logic of division and distancing, a selective logic. We see that social groups in cities are becoming ever more divided. The privileges that each group has acquired for itself allow it to distance itself from other groups.
    14. What is happening now is sectorization, if you like. The city is organized no longer around a geographical principle but rather as a set of islands; it increasingly resembles a different structure, ever more varied and changing. It consists of different places: in some, commerce is more present; in others, culture; and in some places driving private vehicles is banned altogether.
    15. the completion of the ring structure of urban organization, of organization around a single center
    16. First of all, it must be said that cities in general have changed. There is more segregation in them; social, urban, and ethnic transformation is under way. The urban model that we saw in the 19th century, built as concentric circles, is now coming to an end.
    17. Can anything positive be found in the ghetto or in segregation? This question, it seems to me, is extremely important and interesting. At least with respect to Jewish ghettos there is a very interesting account in the sociological tradition. It was once proposed by Richard Sennett, the American sociologist. He has a brilliant book called "Flesh and Stone". In it he argues that it was thanks to the ghetto that Jewish culture survived.
    18. First, what is segregation in the city: a normal or an abnormal phenomenon? Perhaps, as civilization develops, we will abandon it? In Zygmunt Bauman's famous book "Globalization: The Human Consequences" there is a fairly clear formula; I won't quote it verbatim, but I'll try to convey it: "The ghetto is the flip side of globalization." The stronger globalization processes become and the more easily they proceed, the more often we will see such segregation and isolated quarters, especially in large, global cities.
    1. clusters similar documents into clusters, and then selects features as bursty events from the clusters. The related works include TDT [2, 3, 14, 18, 21, 26, 27], text mining [9, 13, 14, 17, 19, 20, 22], and visualization [7, 11, 24]. However, the main drawback of adapting these techniques for the new hot bursty events detection problem is that they require many parameters and it is very difficult to find an effective way to tune these parameters
    2. the emphasis of our problem is to identify sets of bursty features, whereas the emphasis of TDT is to find clusters of documents.
    3. TDT is an unsupervised learning task (clustering) that finds clusters of documents matching the real events (sets of documents identified by humans) by reducing the number of missing documents in the clusters found and reducing the possibility of false alarms.
    4. This is because the set of bursty features can be used as a set of features for positive examples, and therefore helps partially supervised text classification [10, 6], which is a text classification technique using positive examples only
    5. hot bursty events detection in a text stream, where a text stream is a sequence of chronologically ordered documents, and a hot bursty event is a minimal set of bursty features that occur together in certain time windows with strong support of documents in the text stream
    1. The rapid increase of a term's frequency of appearance defines a term burst in the text stream.
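
      An illustrative burst score (not the paper's method): flag a term in a time window when its count is well above its mean rate across the stream; the threshold is arbitrary:

      ```python
      from collections import Counter

      def bursty_terms(windows, threshold=3.0):
          """windows: list of token lists, one per time window."""
          totals = Counter(tok for w in windows for tok in w)
          n = len(windows)
          bursts = []
          for i, window in enumerate(windows):
              for term, c in Counter(window).items():
                  if c > threshold * totals[term] / n:   # well above the mean per-window rate
                      bursts.append((i, term))
          return bursts
      ```
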
    1. Adam Kilgarriff referred to this as a "whelk" problem [16]. If you have a text about whelks, no matter how infrequent this word is in the rest of your corpus, it's likely to be in nearly every sentence in this text.
    2. some words are less likely to experience frequency bursts, which puts them in inferior positions in the frequency lists in comparison to those which do.
    1. conflating semantically related words into one word type could improve model fit by intelligently reducing the space of possible models.
    2. stemmers approximate intuitive word equivalence classes, so language models based on stemmed corpora inherit that semantic similarity, which may improve interpretability as perceived by human evaluators
    3. stemmers could reduce the effect of small morphological differences on the stability of a learned model.
    4. However, stemmers have the potential to be confusing, unreliable, and possibly even harmful in language models
    1. For each corpus, we select a set of 20 relevant query words from high probability words from an LDA topic model (Blei et al., 2003) trained on that corpus with 200 topics. We calculate the cosine similarity of each query word to the other words in the vocabulary, creating a similarity ranking of all the words in the vocabulary. We calculate the mean and standard deviation of the cosine similarities for each pair of query word and vocabulary word across each set of 50 models.
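
      The evaluation above, sketched for a set of independently trained embedding models (treating each model as a word-to-vector mapping, e.g. gensim KeyedVectors, is an assumption):

      ```python
      import numpy as np

      def cos_sim(a, b):
          return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

      def similarity_stats(models, query, vocab):
          # rows: models (e.g. the 50 runs), columns: vocabulary words
          sims = np.array([[cos_sim(m[query], m[w]) for w in vocab] for m in models])
          return sims.mean(axis=0), sims.std(axis=0)   # per-word mean and std across runs
      ```
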
    2. Rankings of most similar words are not reliable, and both ordering and membership in such lists are liable to change significantly.
    3. the corpus-centered approach is based on direct human analysis of nearest neighbors to embedding vectors, and the training corpus is not simply an off-the-shelf convenience but rather the central object of study
    4. other researchers take a corpus-centered approach and use relationships between embeddings as direct evidence about the language and culture of the authors of a training corpus (Bolukbasi et al., 2016; Hamilton et al., 2016; Heuser, 2016)
    5. Although PPMI appears deterministic (due to its pre-computed word-context matrix), we find that this algorithm produced results under the FIXED ordering whose variability was closest to the BOOTSTRAP setting. We attribute this intrinsic variability to the use of token-level subsampling.
    6. In general, LSA, GloVe, SGNS, and PPMI are not sensitive to document order in the collections we evaluated
    7. the membership of the lists changes substantially between runs of the BOOTSTRAP setting
    8. The presence of specific documents has a significant effect on all four algorithms (lesser for PPMI), consistently increasing the standard deviations.
    9. We observe that the FIXED and SHUFFLED settings for GloVe and LSA produce the least variable cosine similarities, while PPMI produces the most variable cosine similarities for all settings
    10. We process each corpus by lowercasing all text, removing words that appear fewer than 20 times in the corpus, and removing all numbers and punctuation.
    11. GloVe is sensitive to the presence of specific documents
    12. GloVe is not sensitive to document order.
    13. the presence of specific documents in the corpus can significantly affect the cosine similarities between embedding vectors
    14. we also remove duplicate documents from each corpus
    15. NLP research in word embeddings has so far focused on a downstream-centered use case, where the end goal is not the embeddings themselves but performance on a more complicated task
    16. If users do not account for this variability, their conclusions are likely to be invalid. Fortunately, we also find that simply averaging over multiple bootstrap samples is sufficient to produce stable, reliable results in all cases tested
    17. Embedding algorithms are much more sensitive than they appear to factors such as the presence of specific documents, the size of the documents, the size of the corpus, and even seeds for random number generators