334 Matching Annotations
  1. Last 7 days
    1. define a set of rules to set the correct sentiment score to the opinion word

      sentiment "correction"

    2. the spaCy’s dependency parser is able to identify other dependency words linked to that particular opinion word. This allows you to extract the aspect term

      dependency tree-based aspect term extraction

    3. identify opinion words by cross referencing the opinion lexicon for negative and positive words

      lexicon-based opinion words extraction

    4. If the word fails to meet the threshold for the proximity of the two words in the vector space, the algorithm falls back on using the category of the entire sentence that was classified previously using ML-NB
    5. I first try to assign based on the similarity of the aspect term to the aspect category with word2vec’s n_similarity
    6. tag it with an aspect using a Multi-label Naive Bayes model

      multilabel classification

    7. segment the chunk of text into sentences

      sentence tokenize

    8. I first replace the pronouns in the sentence using a pre-trained neural coreference model;


    9. For example, let’s assume you’re trying to classify a single yelp restaurant review into one of five aspects: food, service, price, ambience, or simply anecdotal/miscellaneous.

      multilabel classification
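Steps 4–6 of the pipeline annotated above (pick the aspect category whose embedding is closest to the aspect term; fall back to the sentence-level ML-NB label when the similarity is too low) can be sketched as follows. The toy vectors, the threshold value, and the function names are illustrative assumptions, not the author's code; a real pipeline would use gensim word2vec similarities instead of the hand-made table.

```python
from math import sqrt

# Toy embeddings standing in for trained word2vec vectors (illustrative values).
EMB = {
    "pizza":   [0.9, 0.1],
    "food":    [0.8, 0.2],
    "waiter":  [0.1, 0.9],
    "service": [0.2, 0.8],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def assign_aspect(term, categories, sentence_category, threshold=0.7):
    """Assign an aspect category to an extracted aspect term.

    Pick the category whose embedding is most similar to the term; if the
    best similarity falls below the threshold, fall back to the sentence-level
    category predicted by the multilabel Naive Bayes classifier.
    """
    best_cat, best_sim = None, -1.0
    for cat in categories:
        sim = cosine(EMB[term], EMB[cat])
        if sim > best_sim:
            best_cat, best_sim = cat, sim
    if best_sim < threshold:
        return sentence_category  # fallback on the ML-NB sentence label
    return best_cat

print(assign_aspect("pizza", ["food", "service"], "misc"))  # food
```

The fallback mirrors the annotated design choice: the word-level similarity is trusted only when it is decisive, otherwise the coarser sentence-level classification wins.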

  2. May 2019
    1. Schmoller et al. [21] carry out their analysis using a dataset provided by the car sharing operator, which contains more information than what is generally available to the research community at large.
    1. There are various types of n-grams and syntactic n-grams according to types of elements they are built of: lexical units (words, stems, lemmas), POS tags, SR tags (names of syntactic relations), characters, etc.
    2. recently we have proposed a concept of syntactic n-grams, i.e., n-grams constructed by following paths in syntactic trees [19,21].
    3. The most widely used features are words and n-grams.
    4. by their nature the features have symbolic values; they are then mapped to numeric values in some manner.
    5. The most common manner to represent objects is the Vector Space Model (VSM) [17]. In this model, the objects are represented as vectors of values of features. The features characterize each object and have numeric values.
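The Vector Space Model quoted above can be illustrated with a minimal bag-of-words vectorizer; the documents and feature names are made up for the example, and a real system would use a library vectorizer rather than this sketch.

```python
def vectorize(documents):
    """Map symbolic features (words) to numeric values: each document
    becomes a vector of term counts over a shared, sorted vocabulary."""
    vocab = sorted({w for doc in documents for w in doc.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for doc in documents:
        vec = [0] * len(vocab)
        for w in doc.split():
            vec[index[w]] += 1
        vectors.append(vec)
    return vocab, vectors

vocab, vecs = vectorize(["cheap room", "nice room nice view"])
print(vocab)  # ['cheap', 'nice', 'room', 'view']
print(vecs)   # [[1, 0, 1, 0], [0, 2, 1, 1]]
```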
    1. Additionally, we defined a subcategory of positive posts that covers frequent speech acts, such as expressions of gratitude, greetings, and congratulations. They are very frequent in VK data, and the sentiment they express is overtly positive, but they are also very formulaic.
    2. We also defined the “skip” class for excluding the posts that were too noisy, unclear, or not in Russian (e.g., in Ukrainian). We also made the decision to exclude jokes, poems, song lyrics, and other such content that was not generated by the users themselves
    3. We prioritized the speed of annotation over detail, opting for a 3-point scale rather than e.g., the 5-point scale in SemEval Twitter datasets (Rosenthal et al., 2017). Thus, the task was to rate the prevailing sentiment in complete posts from VK on a three-point scale (“negative”, “neutral”, and “positive”).
    4. The annotation was performed by six native speakers with backgrounds in linguistics over the course of 5 months. The average annotation speed was 250-350 posts per hour
    5. The datasets from the SentiRuEval 2015 and 2016 competitions are the largest resource that has been available to date (Loukachevitch and Rubtsova, 2016). The SentiRuEval 2016 dataset is comprised by 10,890 tweets from the telecom domain and 12,705 from the banking domain. The Linis project (Koltsova et al., 2016) reports to have crowdsourced annotation for 19,831 blog excerpts, but only 3,327 are currently available on the project website
    6. RuSentiLex, the largest sentiment lexicon for Russian (Loukachevitch and Levchik, 2016), currently contains 16,057 words
    7. The best results were achieved with a neural network model that made use of word embeddings trained on the VKontakte corpus, which we also release to enable a fair comparison with our baselines in future work. This model achieved an F1 score of 0.728 in a 5-class classification setup.
    8. The overall inter-annotator agreement in terms of Fleiss’ kappa stands at 0.58. In total, 31,185 posts were annotated, 21,268 of which were selected randomly (including 2,967 for the test set). 6,950 posts were pre-selected with an active learning-style strategy in order to diversify the data.
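The Fleiss' kappa figure quoted above can be reproduced with the standard formula; this is a generic implementation (not the paper's code), and the two-item example matrix is purely illustrative.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a matrix ratings[i][j] = number of raters who
    assigned item i to category j; every item has the same rater count n."""
    N = len(ratings)                          # number of items
    n = sum(ratings[0])                       # raters per item
    k = len(ratings[0])                       # number of categories
    # Observed agreement: mean of the per-item agreement P_i.
    p_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings
    ) / N
    # Chance agreement from the marginal category proportions p_j.
    p_j = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Two items, two raters: perfect agreement with balanced categories.
print(fleiss_kappa([[2, 0], [0, 2]]))  # 1.0
```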
    1. On the other hand, new vehicle concepts with stackable capabilities have been recently released or are under development, which can be stacked into a train (through a mechanical and electric coupling) and/or folded together.
    2. It is important to point out that the relocation process is intrinsically inefficient: as one driver per car is needed, to relocate several cars a large workforce or many willing customers are necessary
    3. One-way car sharing is not without drawbacks for the car sharing operators. With one-way car sharing, cars will follow the natural flows of people in a city, hence accumulating in commercial/business areas in the morning and in residential areas at night [3]
    4. One-way systems can be also classified into free-floating or station-based according to their parking restrictions.
    5. One-way car sharing, in which customers are not forced to return the vehicle at the starting point of their journey
    6. people do not own a car, they simply rent it from the car sharing operator when they need it (typically for short-range trips), effectively implementing the concept of Mobility-as-a-Service
    7. Car sharing can also act as a last-kilometre solution for connecting people with public transport hubs, hence becoming a feeder to traditional public transit [2].
    1. Graph network analysis is conducted on the learned DDGF, which shows the DDGF can capture similar information that is embedded in the SD, DE and DC matrices, and extra hidden heterogeneous pairwise correlations between stations
    2. Two architectures of the GCNN-DDGF model, GCNNreg-DDGF and GCNNrec-DDGF, are explored. Their prediction performances are compared with four GCNN models with pre-defined adjacency matrices and seven benchmark models. The proposed GCNNrec-DDGF outperforms all of these models.
    3. Proposing a novel GCNN-DDGF model that can automatically learn hidden heterogeneous pairwise correlations between stations to predict station-level hourly demand.
    4. As pointed out by many previous studies (Chen et al., 2016; Li et al., 2015; Lin, 2018; Zhou, 2015), it is common for BSSs with fixed stations that some stations are empty with no bikes to check out while others are full, precluding bikes from being returned at those locations.
    5. In general, distributed bike-sharing systems (BSSs) can be grouped into two types, dock-based BSS and non-dock BSS.
    1. Our proposed approach fully integrates rebalancing, request assignment and ride sharing, in a fully decentralized manner.
    2. Strategies range from those that use a short window of known future requests (e.g., 5 minutes in [11] and 30 seconds in [4]), to those based on historical demand (e.g., [8]), to those using prediction techniques to predict future demand (e.g., [14]).
    3. To denote the different areas of the system in order to map the demand to a geographical area, the network is generally divided into several zones [13], blocks [11] or hexagons [14].
    4. The relocation of empty vehicles in shared MoD systems has been widely studied in the literature, and can be divided into operator-based approaches [8], [9] (where employees of the car-sharing service relocate the vehicles), user-based approaches [10] (where users are financially incentivized to return the vehicles to high-demand areas) and, more recently, those in shared autonomous vehicle (SAV) systems [4], [11], [12] (where driverless vehicles are autonomously relocated).
    1. The bike-sharing system simulator proposed by Caggiani and Ottomanelli (2012 and 2013) has been used to represent and model the FFBSS under analysis, pretending that the centroids of each zone coincide with a hypothetical bike-sharing station
    2. We assume that in this area a free-floating bike-sharing system (FFBSS) is operating. A further assumption is that a typical user is willing to cover a maximum distance of about 630 meters on foot to reach the bicycle closest to the origin of his/her trip.
    3. we apply the suggested methodology to a study area of 1.2 km x 1.2 km of extension. This area is composed of 36 square zones, with a side length equal to 0.2 km (grid of 6x6 zones).
    4. the zero-vehicle-time (ZVT) (Kek et al., 2009). When ZVT occurs, a zone (or station, in a station-based system) is without any available vehicle; then, a customer requesting vehicles at that moment in that zone will be rejected/unsatisfied.
    5. Every zone of this FFVSS could be seen as a station (in a station-based sharing system), that aggregates/contains (inside its borders) a number of vehicles.
    6. some authors have shown how cluster analysis is capable of revealing groups of stations with a similar trend of rental and return activities during the day (Vogel et al., 2011).
    7. It is worth mentioning Reiss and Bogenberger (2015), who, in order to apply their operator-based strategy to a bike-sharing system, divided the operating area of the free-floating system into a certain number of zones, which in a way could be interpreted as stations.
    8. all the approaches adopted to relocate the shared fleets can be grouped into two categories according to who actually performs the relocation: user-based and operator-based strategies
    9. These imbalances of supply and demand can be resolved/mitigated only with an appropriate reallocation strategy (Reiss and Bogenberger, 2015), namely a transfer of vehicles from zones with high accumulation to areas where the shortage is experienced (Boyacı et al., 2015).
    10. during the day significant fluctuations in travel demand (due to weather conditions, time of the day and holidays/weekends) can be observed. Sometimes there is a vehicle overcrowding in certain zones, and a lack of available vehicles in others, at the time the users need them (Herrmann et al., 2014)
    1. The conversion to the binary scale was performed according to the following scheme: {1, 2} → negative, {4, 5} → positive. Reviews that have a score of 3 on an aspect were not considered for this aspect when assessing the quality of the algorithm
    2. As a result, for the positive sentiment, 342 terms were found (with the threshold of 0.2) and 1203 terms for the negative sentiment (with the threshold of 0.25).
    3. In the same way, sentiment terms were obtained. As the initial terms that set the overall sentiment, the words отличный (excellent) for the positive class and ужасный (terrible) for the negative class were chosen. For each newly generated term, the cosine similarity value with the initial term was found and was assigned to the term as the weight.

      the sentiment weight is the distance between a specific word and one of the seed terms: «отличный» (excellent) or «ужасный» (terrible)

    4. As a result, each of the three aspects has its own list of terms. The number of terms for each aspect is the following: 2550 for Room, 1317 for Location, and 1740 for Service
    5. Thus, for each term a list of 10 new terms closest to the original one was found. These lists were combined, with duplicate terms removed. This process continues, and the resulting list again generates a new one according to the same principle. Repeating this procedure for new term lists is an iterative process that generates aspect terms. To remove noise words which appear during term generation, an additional restriction was used: each newly generated term was stored in the resulting list of aspect terms only if the similarity value with at least three of the five terms in the initial list exceeded 0.3 for each aspect. For each term, the cosine similarity with initial terms is calculated and the maximum is assigned to it as the weight. The weight value will be used at the sentiment assignment step.

      how the aspect term dictionary is built
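The noise filter in the procedure above (keep a generated term only if its similarity to at least three of the five seed terms exceeds 0.3) can be sketched like this. The toy embedding table and the word choices are assumptions standing in for a trained word2vec model; only the filtering rule follows the annotated text.

```python
from math import sqrt

# Toy 2-d "embeddings"; a real pipeline would query a trained word2vec model.
EMB = {
    "room":     [1.0, 0.1],
    "bed":      [0.9, 0.2],
    "bathroom": [0.95, 0.15],
    "tv":       [0.85, 0.1],
    "light":    [0.9, 0.05],
    "suite":    [0.92, 0.12],  # candidate that should pass the filter
    "street":   [0.1, 1.0],    # noise word that should be rejected
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def keep_candidate(term, seeds, min_sim=0.3, min_hits=3):
    """Keep a generated term only if its similarity to at least `min_hits`
    of the seed terms exceeds `min_sim` (the paper uses 3 of 5 seeds
    and a 0.3 threshold)."""
    hits = sum(1 for s in seeds if cosine(EMB[term], EMB[s]) > min_sim)
    return hits >= min_hits

seeds = ["room", "bathroom", "tv", "light", "bed"]
print(keep_candidate("suite", seeds))   # True
print(keep_candidate("street", seeds))  # False
```

Requiring agreement with several seeds, rather than just the nearest one, is what keeps single spurious neighbours from dragging noise words into the aspect dictionary.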

    6. For the aspect Room the initial terms номер (room), ванная (bathroom), телевизор (TV), свет (light), кровать (bed) are selected. For the aspect Service the initial terms are сервис (service), персонал (staff), администратор (administrator), сотрудник (staff member), консьерж (concierge). For the aspect Location the words местоположение (location), достопримечательность (attraction), центр (center), транспорт (transport), месторасположение (location) were chosen
    7. The collocations with the adverb очень (very) were processed in the same
    8. it was decided to add the prefix not_ to the first adjective, adverb or ve
    9. In total, 50 329 reviews were collected for the training corpus
    10. For the sentiment identification stage of the algorithm, only three aspects were chosen: Room, Location and Service, since they are the most popular ones.
    11. The following information was collected from the site: the text of the review, the overall rating of the hotel (on a 5-point scale), an assessment of the hotel's characteristics, such as the price-quality ratio, location, room, cleanliness, service, quality of sleep
    12. the reviews were collected from the website TripAdvisor
    13. Another important note is that many methods often benefit from taking advantage of more data, i.e. additional reviews, even without annotated terms. This was well demonstrated by top performers in the SemEval-2014 aspect-based sentiment analysis task [Pontiki et al., 2014]
    14. Liu [2012] lists four main approaches to aspect extraction: 1. Using frequent nouns and noun phrases. 2. Using opinion and target relations. 3. Supervised learning. 4. Topic modeling.
    15. State-of-the-art models make use of topic modeling methods, such as Latent Dirichlet Allocation (LDA), and Conditional Random Fields (CRF).
    16. Traditional approaches are based on collecting the most frequent words and phrases which are contained in the manually constructed aspect or sentiment lexic
    17. The task of aspect-based sentiment analysis [Liu, 2012; Pontiki et al., 2014; Pavlopoulos, 2014] is usually split into two subtasks: aspect term extraction and aspect term polarity estimation, which are considered separately and often use different techniques.
    1. Agreements for aspect expressions are 0.93, 0.94, 0.93.
    2. The Kappa Coefficient is calculated over aspect-sentiment pairs per each location. Pairwise inter-annotator agreement for aspect categories measured using Cohen’s Kappa is 0.73, 0.78 and 0.70, which is deemed of sufficient quality
    3. However this task assumes only the overall sentiment for each entity. Moreover, the existing corpora for this task so far have contained only a single target entity per unit of text.
    4. Another line of research in this field is targeted (a.k.a. target-dependent) sentiment analysis (Jiang et al., 2011; Vo and Zhang, 2015). Targeted sentiment analysis investigates the classification of opinion polarities towards certain target entity mentions in given sentences (often a tweet).
    5. Aspect-based sentiment analysis (ABSA) (Jo and Oh, 2011; Pontiki et al., 2015; Pontiki et al., 2016) relates to the task of extracting fine-grained information by identifying the polarity towards different aspects of an entity in the same unit of text, and recognizing the polarity associated with each aspect separately
    6. Targeted aspect-based sentiment analysis handles extracting the target entities as well as different aspects and their relevant sentiments.
    7. Entities in the dataset are locations or neighbourhoods.
    8. Sentences containing one location mention — Single, and sentences containing two location mentions — Multi. This is to observe the difficulty of annotating the two groups by human annotators and by the models
    9. In our annotation, however, we only provided “Positive” and “Negative” sentiment labels.
    10. we define the two following special labels. Sentences marked with one of these labels are removed from the dataset
    11. We use the BRAT annotation tool (Stenetorp et al., 2012) to simplify the annotation task.
    12. Aspect general refers to a generic opinion about a location, e.g. “I love Camden Town”
    13. a pre-defined list of aspects is provided for annotators to choose from. These aspects are: live, safety, price, quiet, dining, nightlife, transit-location, touristy, shopping, green-culture and multicultural
    1. In the Aspect Category Polarity (ACP) task the polarity of each expressed category is recognized, e.g. a positive category polarity is expressed in sentence 1.
    2. In the Aspect Category Detection (ACD) task the category evoked in a sentence is identified, e.g. the food category in sentence 1.
    3. In the Aspect Term Polarity (ATP) task the polarity evoked for each aspect is recognized, i.e. a positive polarity is expressed with respect to fried rice.
    4. The Aspect Term Extraction (ATE) subtask aims at finding words suggesting the presence of aspects on which an opinion is expressed, e.g. fried rice in sentence 1
    1. In practice UMAP uses a force directed graph layout algorithm in low dimensional space. A force directed graph layout utilizes a set of attractive forces applied along edges and a set of repulsive forces applied among vertices. Any force directed layout algorithm requires a description of both the attractive and repulsive forces. The algorithm proceeds by iteratively applying attractive and repulsive forces at each edge or vertex. Convergence is guaranteed by slowly decreasing the attractive and repulsive forces in a similar fashion to that used in simulated annealing
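The force-directed scheme described in this annotation can be illustrated with a toy spring layout. This is not UMAP's actual objective (UMAP optimizes a cross-entropy with stochastic gradient descent); the force definitions, strengths, and decay schedule here are arbitrary assumptions chosen only to show attraction along edges, pairwise repulsion, and annealing-style force decay.

```python
import math
import random

def force_layout(n, edges, iters=300, lr=0.05):
    """Toy force-directed layout: attractive forces along edges,
    repulsive forces between all vertex pairs, with the step size
    slowly decayed (annealing-style) so the layout settles."""
    random.seed(0)
    pos = [[random.random(), random.random()] for _ in range(n)]
    for t in range(iters):
        step = lr * (1 - t / iters)          # slowly decaying forces
        force = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):                   # pairwise repulsion ~ 1/d
            for j in range(n):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d2 = dx * dx + dy * dy + 1e-9
                force[i][0] += dx / d2
                force[i][1] += dy / d2
        for i, j in edges:                   # attraction along edges ~ d
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            force[i][0] += 3 * dx
            force[i][1] += 3 * dy
            force[j][0] -= 3 * dx
            force[j][1] -= 3 * dy
        for i in range(n):
            pos[i][0] += step * force[i][0]
            pos[i][1] += step * force[i][1]
    return pos

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Two connected points settle where attraction (3d) balances repulsion (1/d),
# i.e. at d = 1/sqrt(3) ≈ 0.58.
pos = force_layout(2, [(0, 1)])
print(round(dist(pos[0], pos[1]), 2))  # 0.58
```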
    2. In the first phase a particular weighted k-neighbour graph is constructed. In the second phase a low dimensional layout of this graph is computed
    3. The theoretical description of the algorithm works in terms of fuzzy simplicial sets. Computationally this is only tractable for the one skeleton which can ultimately be described as a weighted graph. This means that, from a practical computational perspective, UMAP can ultimately be described in terms of construction of, and operations on, weighted graphs. In particular this situates UMAP in the class of k-neighbour based graph learning algorithms such as Laplacian Eigenmaps, Isomap and t-SNE.
    4. At a high level, UMAP uses local manifold approximations and patches together their local fuzzy simplicial set representations to construct a topological representation of the high dimensional data. Given some low dimensional representation of the data, a similar process can be used to construct an equivalent topological representation. UMAP then optimizes the layout of the data representation in the low dimensional space, to minimize the cross-entropy between the two topological representations.
    5. Dimension reduction algorithms tend to fall into two categories: those that seek to preserve the distance structure within the data and those that favor the preservation of local distances over global distance. Algorithms such as PCA [22], MDS [23], and Sammon mapping [41] fall into the former category while t-SNE [50, 49], Isomap [47], LargeVis [45], Laplacian eigenmaps [5, 6] and diffusion maps [14] all fall into the latter category
    1. 5. The dissimilarity matrix of the data should be well represented by the clustering (i.e., by the ultrametric induced by a dendrogram, or by defining a binary metric “in same cluster/in different clusters”). 6. Clusters should be stable. 7. Clusters should correspond to connected areas in data space with high density. 8. The areas in data space corresponding to clusters should have certain characteristics (such as being convex or linear). 9. It should be possible to characterize the clusters using a small number of variables. 10. Clusters should correspond well to an externally given partition or values of one or more variables that were not used for computing the clustering. 11. Features should be approximately independent within clusters. 12. All clusters should have roughly the same size. 13. The number of clusters should be low.
    2. 1. Within-cluster dissimilarities should be small. 2. Between-cluster dissimilarities should be large. 3. Clusters should be fitted well by certain homogeneous probability models such as the Gaussian or a uniform distribution on a convex set, or by linear, time series or spatial process models. 4. Members of a cluster should be well represented by its centroid
  3. Apr 2019
    1. A natural extension of this idea is to use a Negative Binomial distribution, which is a gamma mixture of an infinite number of Poisson distributions. The probability density function of a Negative Binomial distribution is given below, P(k) = C(k+r−1, r−1) · p^r · (1−p)^k (4), where p and r are parameters of the distribution
    2. One of the distributions captures the rate of the word occurrence when the word occurs because it is topically relevant to the document. The second distribution captures the rate of the word occurrence when the word occurs without being topically relevant to the document. This mixture of two probability distributions has the probability density function: P(k) = α · λ1^k e^(−λ1)/k! + (1−α) · λ2^k e^(−λ2)/k!
    1. from three different perspectives: from a statistically motivated point of view; with a computationally motivated mindset; and in a topologically motivated framework
    2. Finally HDBSCAN* resolves many of the difficulties in parameter selection by requiring only a small set of intuitive and fairly robust parameters.
    3. being a density based approach, DBSCAN only suffers from the difficulty of parameter selection.
    4. The archetypal clustering algorithm, K-Means, suffers from all three of the problems mentioned previously: requiring the selection of the number of clusters; partitioning the data, and hence assigning noise to clusters; and the implicit assumption that clusters have Gaussian distributions.
    5. Partitioning, on the other hand, requires that every data point be associated with a particular cluster. In the presence of noise the partitioning approach can be problematic.
    6. Methods to determine the number of clusters such as the elbow method and silhouette method are often subjective and can be hard to apply in practice.
    7. While clustering has many uses to many people, our particular focus is on clustering for the purpose of exploratory data analysis. By exploratory data analysis we mean the process of looking for “interesting patterns” in a data set, primarily with the goal of generating new hypotheses or research questions about the data set in question.
    8. Clustering is the attempt to group data in a way that meets with human intuition. Unfortunately, our intuitive ideas of what makes a ‘cluster’ are poorly defined and highly context sensitive [26].
    1. people who hold different views most often have different educations and choose different places of residence. In Moscow it is not ethnic differences that generate segregation processes, but social ones. Social stratification entails property stratification, which in turn entails ethnic stratification
    2. Moscow is breaking apart into four large worldview clusters, and they may have a territorial dimension.
    3. in our cities the notion of a majority is beginning to blur. The higher the level of diversity, the lower the probability of a majority that forms a dominant position. A city like Moscow exists as a split community.
    4. You understand perfectly well that this is one of the challenges of democracy: democratic institutions can be used by any forces. And the more developed a democracy is, the more opportunity differences have to be represented.
    5. I believe that in contemporary Russia there are no ghettos; there are anti-ghettos.
    6. Kenneth Benjamin Clark, whom Didier often cites, gives a broader definition in his works of the 1960s. He writes that the ghetto is at once a paradox, a conflict, and a dilemma. It gives hope and it is hopelessness; it is the church and the tavern; cooperation and care in the ghetto are combined with suspicion, rivalry, and exclusion. Ghetto residents are characterized simultaneously by a strong striving for assimilation and a rejection of it, by alienation and by refuge.
    7. The ghetto is not a geographical phenomenon tied to barriers and borders, although that is significant, but a social one. It is a way of doubly organizing a community. For example, Loïc Wacquant emphasizes that the ghetto functions as a kind of ethno-racial prison.
    8. “the more intense globalization is, the more actively ghettos form.” This is important in the context of our discussion, because we use notions such as ghetto and segregation not in an analytical but in a metaphorical sense.
    9. The situation is different in medieval cities, where an estate society begins to form. Segregation arises not along the lines of poverty and wealth, but along the lines of corporation, craft affiliation, and so on. If we take medieval Moscow, it too would be a segregated city, and this has survived in the names of streets and quarters.
    10. As for the cities of the ancient world, although society was organized hierarchically, it was not spatially segregated along the lines of “poverty” and “wealth”. The very structure of the ancient house presupposed the shared life of poor and rich, slaves and free citizens, servants and masters.
    11. Segregation is not only a problem but also a solution that various social strata find for themselves in order to separate from other social strata. Here we can also observe the formation of what we call ghettos, the formation of ghettos in urban spaces.
    12. A hierarchization of districts, of different places in the city, is taking place. The system of distancing and the logic of separation and selectivity lead people to display a certain difference between their group and others
    13. the more important the role of these flows becomes, the more of the population looks for places of localization and concentration. Here a certain social logic comes into play, a logic of separation and distancing, a selective logic. We see that social groups in cities are increasingly separated. The privileges that each group has acquired for itself allow it to distance itself from other groups.
    14. What is happening now is a sectorization, if you like. The city is no longer organized along a graphic principle but rather as islands, resembling a different structure, ever more varied and changing. It consists of different places: in some, commerce is more present; in others, culture; and in some places private transport is banned altogether.
    15. the end of the ring structure of urban organization, the end of organization around a single center
    16. First of all, it must be said that cities in general have changed. There is more segregation in them; social, urban, and ethnic transformation is taking place. The urban model we saw in the 19th century, built as concentric circles, is now coming to an end.
    17. Can something positive be found in the ghetto or in segregation? This question seems to me extremely important and interesting. At least with respect to Jewish ghettos there is a very interesting account in the sociological tradition. It was once proposed by Richard Sennett, the American sociologist. He has a brilliant book called “Flesh and Stone”. In it he argues that thanks to the ghetto, Jewish culture was preserved.
    18. First of all, is segregation in the city a normal or an anomalous phenomenon? Perhaps as civilization develops we will abandon it? Zygmunt Bauman's famous book “Globalization: The Human Consequences” contains a fairly clear formula; I will not quote it verbatim, but will try to convey it: “The ghetto is the flip side of globalization.” The more globalization processes intensify and the easier they become, the more often we will see such segregation and separate quarters, especially in big, global cities.
    1. clusters similar documents into clusters, and then selects features as bursty events from the clusters. The related works include TDT [2, 3, 14, 18, 21, 26, 27], text mining [9, 13, 14, 17, 19, 20, 22], and visualization [7, 11, 24]. However, the main drawback of adapting these techniques for the new hot bursty events detection problem is that they require many parameters and it is very difficult to find an effective way to tune these parameters
    2. the emphasis of our problem is to identify sets of bursty features, whereas the emphasis of TDT is to find clusters of documents.
    3. TDT is an unsupervised learning task (clustering) that finds clusters of documents matching the real events (sets of documents identified by human) by reducing the number of missing documents in the clusters found and reducing the possibility of false alarms.
    4. This is because the set of bursty features can be used as a set of features for positive examples, and therefore helps partially supervised text classification [10, 6], which is a text classification technique using positive examples only
    5. hot bursty events detection in a text stream, where a text stream is a sequence of chronologically ordered documents, and a hot bursty event is a minimal set of bursty features that occur together in certain time windows with strong support of documents in the text stream
    1. The rapid increase of a term's frequency of appearance defines a term burst in the text stream.
    1. Adam Kilgarriff referred to this as a “whelk” problem [16]. If you have a text about whelks, no matter how infrequent this word is in the rest of your corpus, it's likely to be in nearly every sentence in this text.
    2. some words are less likely to experience frequency bursts, which puts them in inferior positions in the frequency lists in comparison to those which do.
    1. conflating semantically related words into one word type could improve model fit by intelligently reducing the space of possible models.
    2. stemmers approximate intuitive word equivalence classes, so language models based on stemmed corpora inherit that semantic similarity, which may improve interpretability as perceived by human evaluators
    3. stemmers could reduce the effect of small morphological differences on the stability of a learned model.
    4. However, stemmers have the potential to be confusing, unreliable, and possibly even harmful in language models
    1. For each corpus, we select a set of 20 relevant query words from high probability words from an LDA topic model (Blei et al., 2003) trained on that corpus with 200 topics. We calculate the cosine similarity of each query word to the other words in the vocabulary, creating a similarity ranking of all the words in the vocabulary. We calculate the mean and standard deviation of the cosine similarities for each pair of query word and vocabulary word across each set of 50 models.
    2. Rankings of most similar words are not reliable, and both ordering and membership in such lists are liable to change significantly.
    3. the corpus-centered approach is based on direct human analysis of nearest neighbors to embedding vectors, and the training corpus is not simply an off-the-shelf convenience but rather the central object of study
    4. other researchers take a corpus-centered approach and use relationships between embeddings as direct evidence about the language and culture of the authors of a training corpus (Bolukbasi et al., 2016; Hamilton et al., 2016; Heuser, 2016)
    5. Although PPMI appears deterministic (due to its pre-computed word-context matrix), we find that this algorithm produced results under the FIXED ordering whose variability was closest to the BOOTSTRAP setting. We attribute this intrinsic variability to the use of token-level subsampling.
    6. In general, LSA, GloVe, SGNS, and PPMI are not sensitive to document order in the collections we evaluated
    7. the membership of the lists changes substantially between runs of the BOOTSTRAP setting
    8. The presence of specific documents has a significant effect on all four algorithms (lesser for PPMI), consistently increasing the standard deviations.
    9. We observe that the FIXED and SHUFFLED settings for GloVe and LSA produce the least variable cosine similarities, while PPMI produces the most variable cosine similarities for all settings
    10. We process each corpus by lowercasing all text, removing words that appear fewer than 20 times in the corpus, and removing all numbers and punctuation.
    11. GloVe is sensitive to the presence ofspecific documents
    12. GloVe is not sensitive to document order.
    13. he pres-ence of specific documents in the corpus can signifi-cantly affect the cosine similarities between embed-ding vectors
    14. we also removeduplicate documents from each corpus
    15. NLP research in word embeddings has so far fo-cused on adownstream-centereduse case, wherethe end goal is not the embeddings themselves butperformance on a more complicated task
    16. f usersdo not account for this variability, their conclusionsare likely to be invalid.Fortunately, we also find thatsimply averaging over multiple bootstrap samplesis sufficient to produce stable, reliable results in allcases tested
    17. Embedding algo-rithms are much more sensitive than they appear tofactors such as the presence of specific documents,the size of the documents, the size of the corpus, andeven seeds for random number generators
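The mean/standard-deviation protocol quoted above can be sketched as follows. The embedding matrices here are random stand-ins for the 50 models trained per setting; the sizes and the noise level are illustrative, not the paper's:

```python
import numpy as np

def cosine_to_all(emb, query_idx):
    """Cosine similarity of one query word to every word in the vocabulary."""
    q = emb[query_idx]
    return emb @ q / (np.linalg.norm(emb, axis=1) * np.linalg.norm(q))

rng = np.random.default_rng(0)
n_models, vocab, dim, query = 50, 100, 20, 0

# Stand-ins for 50 training runs of one setting: a shared signal plus
# per-run noise (real runs differ via bootstrap samples, seeds, etc.).
base = rng.normal(size=(vocab, dim))
runs = [base + 0.3 * rng.normal(size=(vocab, dim)) for _ in range(n_models)]

# Similarity of the query word to each vocabulary word, one row per run.
sims = np.array([cosine_to_all(e, query) for e in runs])  # shape (50, 100)

# Mean and standard deviation across the set of models, per word pair.
mean_sim, std_sim = sims.mean(axis=0), sims.std(axis=0)

# Averaging over runs yields a much more stable neighbor ranking than
# the ranking produced by any single run.
stable_ranking = np.argsort(-mean_sim)
```

The per-word `std_sim` is exactly the quantity the annotations report as inflated by the BOOTSTRAP setting, and `stable_ranking` is the averaged result that item 16 recommends over single-run rankings.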
    1. The monograph under review comprises 47 chapters devoted to different branches of onomastics, written by 43 authors from 13 countries. The author team was led by Carole Hough, professor at the University of Glasgow and former president of the International Council of Onomastic Sciences
    1. precedent text, precedent utterance, precedent name, precedent situation
    2. In the researcher's view, precedent texts are those "(1) significant for a given individual in cognitive and emotional terms, (2) supra-personal in character, that is, well known to that individual's milieu, including both predecessors and contemporaries, and, finally, (3) repeatedly invoked in the discourse of the given linguistic personality" [1, p. 216].
    3. Appellativization, that is, the transition of proper names (onyms) into common nouns (appellatives), is quite widespread in Russian.
    4. The appellativized onym Отелло (Othello) formed the synonym pair ревнивец 'jealous man' – Отелло (Othello is the hero of Shakespeare's (1564–1616) tragedy of the same name, who killed his wife out of jealousy).
    5. The appellativized onym Печкин (Pechkin) joined the synonym series with the dominant почтальон 'postman' (Pechkin is a postman, a character in the children's stories of the contemporary Russian writer E. Uspensky and in the animated films based on them)
    6. Considering the reasons for the predominant use of the appellativized onyms Шерлок Холмс (Sherlock Holmes; Холмс) and Пинкертон (Pinkerton) as compared with the appellativized onyms Мегрэ (Maigret) and Пуаро (Poirot), we concluded that this fact is not related to the degree of fame of a given literary hero. Thus, the stories and novellas about Sherlock Holmes and the films based on them enjoy great popularity, while the Pinkerton novels have not been reprinted for many years. Yet in frequency of use the units Шерлок Холмс (Холмс) and Пинкертон differ only slightly
    7. A precedent name is "an individual name associated either (1) with a widely known text, usually a precedent one (Anna Karenina, Oblomov), or (2) with a situation widely known to speakers of the language and functioning as a precedent (Ivan Susanin); precedent names also include (3) symbol-names that point to a reference set of particular qualities (Napoleon, Salieri)" [2]
    8. The appellativized onym Митрофанушка (Mitrofanushka; Митрофан) joined the synonym series with the dominant невежда 'ignoramus' (Mitrofanushka is the ignorant hero of D.I. Fonvizin's (1745–1792) comedy "The Minor", who refused to study)
    9. pronomination: "the replacement of a common noun by a proper name (or vice versa), e.g. Отелло (Othello) instead of ревнивец 'jealous man'"
    10. Thus, names of literary heroes that have undergone appellativization enter a synonym series and form synonym pairs with the corresponding appellatives. In the overwhelming majority of cases the appellativized unit functions as a stylistic synonym relative to the dominant.
    11. The appellativized onym Плюшкин (Plyushkin) joined the synonym series with the dominant скупец 'miser' (Plyushkin is a landowner of extraordinary stinginess, a character in N.V. Gogol's (1809–1852) poem "Dead Souls").
    1. Distant reading offers new ways to challenge assumptions about genre, narrative and other aspects of literature, by facilitating the analysis of large-scale collections of literary works. Numerous approaches have been proposed and tested for this purpose, including those based on statistical topic models [10], character profiling [6], character frequency analysis [5, 22], and sentiment analysis [4].
    1. DeepWalk was introduced by Perozzi et al. in 2014 [28]. The authors brought the idea of using recent work in word embedding. They emphasize interesting similarities between NLP and SNA (as seen in II.1). The main idea is to give sentences of nodes (instead of words) as input for word embedding algorithms. They used random walks to produce such sentences out of an unweighted graph (the overall architecture of DeepWalk can be seen in figure III.5).
    2. Negative sampling (NS) is a simplified version of Noise-contrastive estimation (NCE) [35]. They both are sampling approaches: instead of calculating a cross product for each word in the vocabulary, they just use samples of the vocabulary set. Unlike the hierarchical softmax, sampling approaches do not exhibit a softmax layer. Sampling approaches are only useful during training time. A full softmax must be computed during evaluation (that is not a problem since we do not use the model after training, we just get the feature vectors)
    3. According to Mikolov et al., it seems that the non-linear hidden layer brings too much complexity [26]. In this paper, the authors propose two shallow neural networks without a hidden layer, the Continuous Bag of Words (CBOW) and Skip-Gram models.
    4. As stated before, Social Network Analysis often deals with millions of nodes, so the size of the adjacency matrix is the square of the number of nodes. Not to mention obvious performance issues, this embedding would be hardly exploitable because of the curse of dimensionality
    5. The simplest graph embedding we can consider is the adjacency matrix of the network
    6. Producing vector representations of nodes builds a bridge between social network analysis, data analytics and statistics. The vectors can then be used by machine learning algorithms that take vectors as input for prediction, distance computation or clustering.
    7. Node role identification aims to infer from a social graph the role a node plays in the network.
    8. Community detection aims to identify groups of highly interconnected nodes
    9. Link prediction aims to predict the connections that will appear between the actors of a social network.
    10. Node classification aims to retrieve individual data about an actor (age, gender, interests, etc.)
    11. Information diffusion study aims to understand how information is diffused through the network
    12. Network modeling aims to simulate the behavior of the network with a simple model.
    13. This idea is summed up by Degenne and Forsé [1]: “Instead of thinking reality in terms of relations between actors, many of those who analyze empirical data limit themselves to thinking of it in categories (for example: young people, women, executives, developing countries, etc.). These categories are built by aggregating people with similar features that are, a priori, relevant for the current issue.”
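The random-walk-to-sentence step at the heart of DeepWalk can be sketched as below. The graph and parameters are toy choices; in the real pipeline the resulting sentences are fed to a word2vec-style model:

```python
import random

def random_walks(adj, walk_len=5, walks_per_node=2, seed=0):
    """Generate DeepWalk-style 'sentences': truncated random walks whose
    'words' are node ids, produced from an unweighted graph given as an
    adjacency list. A word-embedding model then learns node vectors
    from these sentences."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_len:
                neighbors = adj[walk[-1]]
                if not neighbors:
                    break  # dead end: truncate the walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Tiny unweighted graph: two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
sentences = random_walks(adj)
```

Because walks rarely cross the bridge edge, nodes of the same triangle co-occur often in the same "sentence", which is what lets the downstream embedding place them close together.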
    1. Downsides of NMF: can only be applied to non-negative data; interpretability is hit or miss; non-convex optimization, requires initialization; not orthogonal
    2. Why NMF? Meaningful signs; positive weights; no “cancellation” as in PCA; can learn an over-complete representation
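A minimal NMF sketch using the classic Lee–Seung multiplicative updates makes the "positive weights, no cancellation" point concrete; the data and dimensions here are arbitrary toy values:

```python
import numpy as np

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (m x n) into W (m x k) @ H (k x n),
    both non-negative, via Lee-Seung multiplicative updates. Because the
    updates only multiply by non-negative ratios, W and H stay >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.random.default_rng(1).random((8, 6))   # non-negative toy data
W, H = nmf(V, k=3)
err = np.linalg.norm(V - W @ H)
```

Note the listed downsides also show up here: the objective is non-convex, so the result depends on the random initialization (`seed`), and nothing enforces orthogonality between the factors.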
    1. We extend the fact that NMF is similar to the pLSI and LDA generative models and model the outliers using the ℓ1,2-norm. This particular formulation of NMF is non-standard, and requires careful design of optimization methods to solve the problem. We solve the resulting optimization problem using a block coordinate descent technique.
    2. One advantage of matrix factorization methods is that they decompose the term-document structure of the underlying corpus into a set of semantic term clusters and document clusters. The semantic nature of this decomposition provides the context in which a document may be interpreted for outlier analysis.
    3. there are surprisingly few methods which are specifically focused on this domain, even though many generic methods such as distance-based methods can be easily adapted to this domain [13, 20], and are often used for text outlier analysis
    1. The notion of an event differs from a broader category of events both in spatial/temporal localization and in specificity. For example, the eruption of Mount Pinatubo on June 15th, 1991 is considered to be an event, whereas volcanic eruption in general is considered to be a class of events
    2. Events might be unexpected, such as the eruption of a volcano, or expected, such as a political election
    3. During the first portion of this study, the notion of a “topic” was modified and sharpened to be an “event”, meaning some unique thing that happens at some point in time
    4. The tracking task is defined to be the task of associating incoming stories with events known to the system. An event is defined (“known”) by its association with stories that discuss the event. Thus each target event is defined by a list of stories that discuss it
    5. This study corpus spans the period from July 1, 1994 to June 30, 1995 and includes nearly 16,000 stories, with about half taken from Reuters newswire and half from CNN broadcast news transcripts.
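The tracking setup described in item 4 can be sketched as nearest-centroid assignment over bags of words; the event names, stories and the centroid/cosine choices here are illustrative, not the systems evaluated in the study:

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def track(story, events):
    """Associate an incoming story with the known event whose defining
    stories are most similar (each event = centroid of its story counts)."""
    centroids = {name: sum(map(Counter, stories), Counter())
                 for name, stories in events.items()}
    return max(centroids, key=lambda name: cosine(Counter(story), centroids[name]))

# Each known event is defined by a list of stories that discuss it.
events = {
    "pinatubo_eruption": [["volcano", "eruption", "pinatubo", "ash"],
                          ["pinatubo", "lava", "evacuation"]],
    "election_1994":     [["election", "vote", "campaign"],
                          ["ballot", "vote", "candidate"]],
}
print(track(["ash", "volcano", "evacuation"], events))  # -> pinatubo_eruption
```

A real tracker would also threshold the best similarity so that stories about genuinely new events are not forced onto a known one.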
    1. Relational Topic Models (RTM) are another extension: RTM is a hierarchical model of networks and per-node attribute data. First, each document is generated from topics as in LDA. Then the links between documents are modeled as binary variables, one for each pair of documents, distributed according to a distribution that depends on the topics used to generate each of the constituent documents. In this way the content of the documents is statistically linked to the link structure between them, so the model can be used to summarize a network of documents [69]
    2. MedLDA: the maximum entropy discrimination latent Dirichlet allocation (MedLDA) model integrates the mechanism behind hierarchical Bayesian models (such as LDA) with max-margin learning (such as SVMs) in a unified constrained optimization framework.
    3. LLDA is a supervised algorithm that builds topics from manually assigned labels. LLDA can therefore obtain meaningful topics, with words that map well to the applied labels. As a disadvantage, Labeled LDA cannot model latent subtopics within a given label, nor any global topics. To overcome this problem, partially labeled LDA (PLDA) was proposed
    4. DTM: the Dynamic Topic Model (DTM) was introduced by Blei and Lafferty as an extension of LDA that captures the evolution of topics over time in a sequentially organized corpus of documents and exhibits the evolution of the word-topic distribution, which makes it easy to visualize topic trends [73]. As an advantage, DTM is very effective at extracting topics from collections that change slowly over time.
    5. The Author-Topic model [75] is a popular and simple probabilistic topic model for finding relationships among authors, topics, words and documents. This model provides a distribution over topics for each author and a distribution over words for each topic. For evaluation, the authors used 1,700 papers from the NIPS conference and 160,000 CiteSeer abstracts. Gibbs sampling was applied to estimate the topic and author distributions. Their results showed that this approach is significantly predictive of authors' interests under the perplexity measure.
    6. Undeniably, this period (2003 to 2009) is very important because key baseline approaches were introduced, such as CorrLDA, the Author-Topic Model, DTM, RTM, etc.
    7. In summary, this paper makes four main contributions:
       – We investigate scholarly articles (from 2003 to 2016) related to topic modeling based on LDA to discover the research development, current trends and intellectual structure of topic modeling based on LDA.
       – We investigate topic modeling applications in various sciences.
       – We summarize challenges in topic modeling, such as image processing, visualizing topic models, group discovery, user behavior modeling, etc.
       – We introduce some of the most famous datasets and tools in topic modeling
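Since every model in this group extends plain LDA, a minimal collapsed Gibbs sampler for LDA itself may help fix ideas. This is a pedagogical sketch on a toy corpus, not any of the surveyed systems; the hyperparameters and corpus are arbitrary:

```python
import numpy as np

def lda_gibbs(docs, n_words, n_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for plain LDA. docs is a list of
    token-id lists; returns the topic-word distributions phi (K x V)."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))   # doc-topic counts
    nkw = np.zeros((n_topics, n_words))     # topic-word counts
    nk = np.zeros(n_topics)                 # tokens assigned to each topic
    z = [rng.integers(0, n_topics, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):          # counts for random init
        for w, t in zip(doc, z[d]):
            ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                 # remove the current assignment
                ndk[d, t] -= 1; nkw[t, w] -= 1; nk[t] -= 1
                # conditional p(topic | everything else), up to a constant
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_words * beta)
                t = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = t                 # resample and restore counts
                ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1
    return (nkw + beta) / (nk[:, None] + n_words * beta)

# Toy corpus with two latent themes over a 6-word vocabulary (ids 0-2 vs 3-5).
docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]] * 3
phi = lda_gibbs(docs, n_words=6, n_topics=2)
```

The extensions in this group change different pieces of this scheme: LLDA restricts the topics a document may sample from, DTM chains the `phi` matrices over time slices, and the Author-Topic model samples an author along with each topic assignment.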