- Jun 2021
-
dhsi2022.jonreeve.com
-
Our approach builds on these studies by interpreting the dimensions of embedding models as representative of meaningful cultural categories rather than simply biases, distortions, or deficits in the semantic system. We then use these dimensions as tools to illuminate complex cultural relations associated with class in a given social context, across contexts, and over time. More broadly, this article is the first to specifically demonstrate the utility of word embedding models for sociological and cultural inquiry.
This whole section offers a compelling approach to push this method further into the field of humanistic questions.
-
Word embedding algorithms input large collections of digitized text and output a high-dimensional vector-space model in which each unique word is represented as a vector in the space (Mikolov, Yih, and Zweig 2013; Pennington, Socher, and Manning 2014). This means each word appearing in the analyzed documents is ascribed a set of coordinates that fix its location in a geometric space in relation to every other word. Words are positioned in this space based on their surrounding “context” words in the text, such that words sharing many contexts are positioned near one another, and words that inhabit different linguistic contexts are located farther apart. Previous work with word embeddings in computational linguistics shows that words frequently sharing contexts, and thus located nearby in the vector space, tend to share similar meanings
Clear methods statement. Useful for thinking through how to use and present this work in our own research.
-
-
web.stanford.edu 6.pdf
-
Either the PPMI model or the tf-idf model can be used to compute word similarity, for tasks like finding word paraphrases, tracking changes in word meaning, or automatically discovering meanings of words in different corpora. For example, we can find the 10 most similar words to any target word w by computing the cosines between w and each of the V−1 other words, sorting, and looking at the top 10.
This begins to address the question I posed earlier about the usefulness of tf-idf.
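To make the top-10 procedure in the passage above concrete, here is a minimal sketch in plain Python. The function names (`cosine`, `most_similar`) and the toy count vectors are invented for illustration and are not taken from the source text; a real vocabulary would have V words and far more dimensions.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)

def most_similar(target, vectors, k=10):
    """Rank every other word by cosine with the target and keep the top k."""
    scores = [
        (word, cosine(vectors[target], vec))
        for word, vec in vectors.items()
        if word != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

# Hypothetical counts, assumed for this sketch only.
# Dimensions: (co-occurrences with "computer", co-occurrences with "pie").
toy_vectors = {
    "cherry": [2, 442],
    "digital": [1670, 5],
    "information": [3325, 5],
    "strawberry": [0, 60],
}

neighbors = most_similar("information", toy_vectors, k=2)
for word, score in neighbors:
    print(word, round(score, 4))
```

With these assumed counts, "digital" ranks first for "information" because both vectors point almost entirely along the "computer" dimension. The same ranking step would apply whether the vectors hold tf-idf weights, PPMI weights, or dense embeddings.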
-
In summary, the vector semantics model we’ve described so far represents a target word as a vector with dimensions corresponding either to the documents in a large collection (the term-document matrix) or to the counts of words in some neighboring window (the term-term matrix). The values in each dimension are counts, weighted by tf-idf (for term-document matrices) or PPMI (for term-term matrices), and the vectors are sparse (since most values are zero)
Really useful summary.
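As a rough illustration of the PPMI weighting mentioned in the summary above, the sketch below converts a small term-term count matrix into positive pointwise mutual information values. The function name `ppmi` and the counts are assumptions made for this note, and the sketch leaves out refinements such as smoothing.

```python
import math

def ppmi(counts):
    """Turn a term-term count matrix (list of rows) into PPMI weights.

    PMI(w, c) = log2( P(w, c) / (P(w) * P(c)) ); negative values and
    zero counts are mapped to 0.
    """
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    weighted = []
    for i, row in enumerate(counts):
        out = []
        for j, c in enumerate(row):
            if c == 0:
                out.append(0.0)  # word and context never co-occur
                continue
            pmi = math.log2((c * total) / (row_sums[i] * col_sums[j]))
            out.append(max(pmi, 0.0))
        weighted.append(out)
    return weighted

# Hypothetical counts, assumed for this sketch only.
# Rows: cherry, digital. Columns: computer, pie.
toy_counts = [
    [2, 8],
    [9, 1],
]

weights = ppmi(toy_counts)
for word, row in zip(["cherry", "digital"], weights):
    print(word, [round(x, 3) for x in row])
```

In this toy matrix the cherry/computer cell drops to 0 because the pair co-occurs less often than chance would predict, while cherry/pie keeps a positive weight. In a realistic vocabulary most cells are already zero counts, which is the sparsity the summary describes.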
-
A (rough) graphical demonstration of cosine similarity, showing vectors for three words (cherry, digital, and information) in the two-dimensional space defined by counts of the words computer and pie nearby. Note that the angle between digital and information is smaller than the angle between cherry and information. When two vectors are more similar, the cosine is larger but the angle is smaller; the cosine has its maximum (1) when the angle between two vectors is smallest (0°); the cosine of all other angles is less than 1
This is really helpful for understanding how to read these representations.
-
We will see that this method results in very long vectors that are sparse, i.e. mostly zeros (since most words simply never occur in the context of others).
It seems that the tf-idf model would be less useful than word2vec. Should we understand it as a foundation for more complex analysis (like word2vec), or is there an application for this simpler function that I'm missing?
-
- Jul 2020
-
www.gutenberg.org
-
a thin blue column of smoke rose.
compare moments of smoke throughout; along these lines: also light, fragrance
-
The house itself was of timbers Hewn from the cypress-tree, and carefully fitted together. Large and low was the roof;
could embed "Acadian" home here--juxtapose with previous visions of Acadian houses
-
On the banks of the Têche are the towns of St. Maur and St. Martin. There the long-wandering bride shall be given again to her bridegroom, There the long-absent pastor regain his flock and his sheepfold. Beautiful is the land, with its prairies and forests of fruit-trees; Under the feet a garden of flowers, and the bluest of heavens Bending above, and resting its dome on the walls of the forest. They who dwell there have named it the Eden of Louisiana."
"Eden of Louisiana"--religious imagery throughout; speaks to religious persecution
-
Silent at times, then singing familiar Canadian boat-songs, Such as they sang of old on their own Acadian rivers, And through the night were heard the mysterious sounds of the desert, Far off,—indistinct,—as of wave or wind in the forest, Mixed with the whoop of the crane and the roar of the grim alligator.
layered sonic landscape
-
we never have sworn them allegiance!
could expound on the question of allegiance
-
Michael the fiddler was placed, with the gayest of hearts and of waistcoats.
a place to perhaps insert sound
-
Merrily, merrily whirled the wheels of the dizzying dances Under the orchard-trees and down the path to the meadows; Old folk and young together, and children mingled among them. Fairest of all the maids was Evangeline, Benedict's daughter! Noblest of all the youths was Gabriel, son of the blacksmith!
Evangeline and Gabriel betrothed with song and dance
-
"Safer are we unarmed, in the midst of our flocks and our cornfields, Safer within these peaceful dikes, besieged by the ocean, Than our fathers in forts, besieged by the enemy's cannon.
juxtaposing the natural landscape with the forged industrial might of the British invaders
-
-
www.nytimes.com
-
There are, one should note, many of the latter, but they always seem about to drown in the shrill orthographical chaos surrounding them, complaints often written by those who look forward to the demise of critics — and editors — with a populist glee.
Interesting in the context of public annotation--what are the benefits/limitations of this kind of "populist glee" and must one replace the other?
-
- Sep 2019
-
journalofdigitalhumanities.org
-
Each of these approaches (data as text, artifact, and processable information) allows one to produce or uncover evidence that can support particular claims and arguments. Data is not in and of itself a kind of evidence but a multifaceted object which can be mobilized as evidence in support of an argument.
Data can be mobilized as evidence, but is not evidence on its own.
-
- Jul 2019
-
jessicadoeshistory.com
-
Cavalcade of the American Negro
Hey Alyssa!
-