289 Matching Annotations
  1. Jun 2019
    1. The most convincing historical argument revolves around preferential attachment, or rich-get-richer. In a friendship network, this means that those who have a lot of friends are at a higher risk of making even more friends; they have a larger circle that can, in turn, introduce them to even more people. This positive feedback loop continues until a few people dominate the landscape in terms of connectivity. This is also true of citation networks, in that articles that already are highly cited are more likely to be seen by other scholars, and cited again in turn.

      helpful examples

    2. A historian may wish to see the evolution of transitivity across a social network to find the relative importance of introductions in forming social bonds.

      what sort of historical analysis would need this kind of evidence?

    3. Unsurprisingly, in the early modern Republic of Letters, we see this as well. The probability that a person will correspond with another in the network increases if they have a history of previous contact.

      helpful example

    4. For the historian, it is unlikely that the diameter will be more useful than the average path length on most occasions.

      helpful hint
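
      A quick sketch of these measures, assuming Python and the networkx library (my choice of tool, not the chapter's). The Barabási–Albert model is a standard preferential-attachment ("rich-get-richer") generator, and transitivity, average path length, and diameter come straight from networkx:

      ```python
      import networkx as nx

      # Grow a network by preferential attachment: new nodes prefer
      # to link to nodes that already have many connections.
      G = nx.barabasi_albert_graph(n=200, m=2, seed=42)

      print("transitivity:", nx.transitivity(G))                        # how often friends-of-friends are also friends
      print("average path length:", nx.average_shortest_path_length(G))
      print("diameter:", nx.diameter(G))                                # the longest shortest path in the network
      ```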

    1. the vital connective tissue

      structures of networks to focus on here

    2. There is a tendency when using graphs to become smitten with one’s own data. Even though a graph of a few hundred nodes quickly becomes unreadable, it is often satisfying for the creator because the resulting figure is elegant and complex and may be subjectively beautiful, and the notion that the creator’s data is ‘complex’ fits just fine with the creator’s own interpretation of it. Graphs have a tendency of making a data set look sophisticated and important, without having solved the problem of enlightening the viewer.[13]

      yup you can get a "gee whiz" #dataviz pretty easily, but persuading someone of a historical argument with one is much harder

    1. Besides this special case of two opposing colors, it is best to avoid using hue to represent quantitative data.

      So many moving parts in crafting these ... do Digital Historians have the option to receive feedback on their visualizations? Do peer reviewers make comments regarding colors employed and/or choice of visualization?

    1. While my description of the chart does describe the trends accurately, it does not convey the sheer magnitude of difference between earlier and later years as covered by dissertations, nor does it mention the sudden drop in dissertations covering periods after 1970.

      Visualizations present data in a readable way and they can drive home points more so than a text description.

    2. an additional layer in the hermeneutic process of hypothesis formation

      which means what?

    3. exactly what data are available and how they interconnect.

      refresh your memory? What is data? How does "history" get translated into "data"?

    1. Networks, blunt methods for considering connections, do not provide a way to reflect on the quality of

      This study further stresses the importance of "knowing the history" of the subject matter you're researching. The author knew that Brown's Manifesto did not compare to more influential/significant feminist manifestos. However, the large number of mentions in Deepwell's anthology could lead someone less knowledgeable about this subject matter to speculate about the importance of Manifesto for the Feminist Artist.

    2. trans-generational and transnational exchange

      Revealed through this textual analysis; it challenges preconceived ideas of feminist art.

    3. For example, Correspondence Course includes no letters to or from Austrian performance artist and filmmaker, Valie Export, although the two artists met in 1970, but Export is mentioned several times by Schneemann.25

      Textual analysis further broadens Schneemann's social network, revealing her connections and influences.

    1. What we really want here of course is a visualization that combines all the things, but I’ve resisted creating one for now. The complex historical questions of who gets counted when we count in histories of women’s liberation exist because data reduces people’s lived experiences to columns on a spreadsheet.

      Hypothetically speaking, if you were to create a visualization... how would you identify a group like the Poor Black Women's Study Group if they use different names?

    2. I share this not to reveal my own sloppy data, but to highlight the difficulties of doing this kind of visualization.

      Goes back to the importance of picking the right visualization that we discussed in class.

    3. I appreciate how Digital History scholarship links you to website pages to gain more knowledge on a proposed topic

    1. We cannot rely only on the computer-driven groups to use in analyzing texts.  The next step is to look at the texts that contain repeating word patterns and conduct a close reading to see what we can learn about the topic. Plotting the topic over time enables us to locate trends in how important the topic was to the author, or when we compare them with other authors, we can investigate differences in the ways that two authors valued these topics or the different ways that they expressed themselves.

      need for humans and computers to analyze text

    2. What topic modeling can offer a historian is an objective snapshot of the content of the collection.

      objective and maybe random without context?

    1. Even more significantly, topic modeling allows us a glimpse not only into Martha’s tangible world (such as weather or housework topics), but also into her abstract world.

      This was an issue with AntConc; interesting to see it handled here.

    2. Yet this pattern bolsters the argument made by Ulrich in A Midwife’s Tale, in which she points out that the first half of the diary was “written when her family’s productive power was at its height.” (285) As her children married and moved into different households, and her own husband experienced mounting legal and financial troubles, her daily burdens around the house increased. Topic modeling allows us to quantify and visualize this pattern, a pattern not immediately visible to a human reader.

      Interesting match with current scholarship. Historians need to be well versed in historiography to interpret topic modeling visualizations.

    3. In essence, topic modeling accurately recognized, in a mere 55 words (many abbreviated into a jumbled shorthand), the dominant theme of that entry:

      Does the medium of the text analyzed have anything to do with this? Presumably the diary entries are not long and may not have had several topics per entry.

    4. Instead, the program is only concerned with how the words are used in the text, and specifically what words tend to be used similarly.

      does that then help standardize and catch words that might be misspelled or have alternative spellings?

    5. MALLET generated a list of thirty topics comprised of twenty words each, which I then labeled with a descriptive title.

      So the author had to assign the topic labels, and the software just made the word clusters? Ask in class.

    6. it worked. Beautifully

      How many topics did the author ask the computer to generate, then? It seems like topic modeling needs some refinement first.

    7. topic modeling, a method of computational linguistics that attempts to find words that frequently appear together within a text and then group them into clusters.

      definition
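
      The readings use MALLET; as a rough sketch of the same idea in Python (scikit-learn's LDA, with an invented toy corpus standing in for diary entries, not the author's actual workflow), the mechanics look something like this:

      ```python
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.decomposition import LatentDirichletAllocation

      # Invented stand-ins for diary entries; not Ballard's actual text.
      docs = [
          "rain cold snow frost garden seeds",
          "wove cloth spun flax knit stockings yarn",
          "delivered safe born daughter birth labor",
          "snow cloudy cold wind rain storm",
      ]

      vec = CountVectorizer(stop_words="english")
      X = vec.fit_transform(docs)

      # We must tell the model how many topics to look for.
      lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

      words = vec.get_feature_names_out()
      for i, topic in enumerate(lda.components_):
          top = [words[j] for j in topic.argsort()[-5:][::-1]]
          print(f"topic {i}:", top)   # the human still supplies the descriptive label
      ```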

    8. EMOTION words

      Crossroads of tech, language, and humanities.

    9. When its entry scores are aggregated into months of the year, it shows exactly what one would expect over the course of a typical year:

      Human intervention coupled with the results of topic modeling.
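
      A sketch of that aggregation step, assuming one row per diary entry with a date and that entry's score for the EMOTION topic; the file and column names here are hypothetical:

      ```python
      import pandas as pd

      # Hypothetical export: one row per diary entry, with its EMOTION topic weight.
      df = pd.read_csv("entry_topic_scores.csv", parse_dates=["date"])

      monthly = df.groupby(df["date"].dt.month)["emotion"].mean()
      print(monthly)              # average EMOTION weight for Jan..Dec, across all years
      monthly.plot(kind="bar")    # quick look at the seasonal pattern
      ```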

    1. When you encounter someone else’s topic model, do not accept at first glance. Rather, to understand the potentials and pitfalls, you must be aware of how the tools work and their limitations.

      important to note! topic models should not just be accepted

    2. As you read that essay, consider for yourself the choices we have made in how we perform the topic model, and in how we visualize the results. How justified are we in those choices?

      This is key.

    1. We mention this here to highlight the speed with which the digital landscape of tools can change. When we initially wrote about Paper Machines, we were able to topic model and visualize John Adams diaries, scraping the page itself using Zotero. When we revisited that workflow a few months later, given changes that we had made to our own machines (updating software, moving folders around and so on, and changes to the underlying html of the John Adams Diaries website), it – our workflow – no longer worked! Working with digital tools can sometimes make it necessary to not update your software! Rather, keep in mind which tools work with what versions of other supporting pieces.

      important note

    1. Different tools give different results even if they are all still ‘topic modeling’ writ large. This is a useful way to further understand how these algorithms shape our research, and is a useful reminder to always be up front about the tools that you are using in your research.

      the method matters!!

    2. Because the STMT is most useful for us as historians when we have documents with dates, let us consider the workflow for getting a digitized diary off a website and into the STMT. You may wish to skip this section; you can obtain the scraped data csv from our website.

      see why I wanted the Davis Diary by date?

    1. Available in a Google Code repository at https://code.google.com/p/topic-modeling-tool/, the GTMT provides quick and easy topic model generation and navigation.

      link didn't work for me

    2. our suspicions are confirmed:

      is this a necessary step?

    3. remove stopwords, normalize text by standardizing case, and tweak your iterations, size of topic descriptors, and the threshold at which you want to cut topics off

      what does all this mean? If you don't know, don't do it.

    1. In fact there is a danger in using topic models as historical evidence; they are configurable and ambiguous enough that no matter what you are looking for, you just might find it. Remember, a topic model is in essence a statistical model that describes the way that topics are formed. It might not be the right model for your corpus. It is however a starting point, and the topics that it finds (or fails to find) should become a lens through which you look at your material, reading closely to understand this productive failure. Ideally, you would then re-run the model, tweaking it so that it better describes the kind of structure you believe exists.

      Topic models are not the best piece of evidence, as it seems they can argue anything. Makes sense when you consider that the computer does not find/know the topic the way a historian would when analyzing a source.

    2. There is a fundamental difficulty however. When we began looking at the Gettysburg Address, Hollis was instructed to look for two topics that we had already named ‘war’ and ‘governance’. When the computer looks for two topics, it does not know beforehand that there are two topics present, let alone what they might mean in human terms. In fact, we as the investigators have to tell the computer ‘look for two topics in this corpus of material’, at which point the machine will duly find two topics. At the moment of writing, there is no easily-instantiated method to automatically determine the ‘best’ number of topics in a corpus, although this will no doubt be resolved. For the time being, the investigator has to try out a number of different scenarios to find out what’s best. This is not a bad thing, as it forces the investigator continually to confront (or even, close-read) the data, the model, and the patterns that might be emerging.

      the computer needs to go through the process several times to find topics and refine those words into the topic categories
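
      One way to "try out a number of different scenarios" is simply to refit the model at several topic counts and compare a rough fit measure such as perplexity (lower is usually better, though it is no substitute for close reading). This sketch reuses the document-term matrix X from the scikit-learn example a few annotations above:

      ```python
      from sklearn.decomposition import LatentDirichletAllocation

      # X is a document-term matrix built earlier with CountVectorizer.
      for k in (2, 5, 10, 20):
          lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(X)
          print(f"{k:>2} topics -> perplexity: {lda.perplexity(X):.1f}")
      ```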

    3. In fact there is a danger in using topic models as historical evidence; they are configurable and ambiguous enough that no matter what you are looking for, you just might find it.

      A starting point for research. Not the be-all, end-all. The historian shapes their topic model to fit the historical questions they are posing.

    1. it is possible to decompose from the entire collection of words the original distributions held in those bags and buckets

      But isn't it possible, then, that there could be a lot of overlap, with topics that are related but not what the author intended?

    2. If you are a literary scholar, you will understand what a ‘topic’ might mean perhaps rather differently than how a librarian might understand it,

      what does "topic" mean to a historian?

    3. it is possible to decompose from the entire collection of words the original distributions held in those bags and buckets.

      so how is this different than what we did with Antconc?

    4. What is a topic, anyway?

      This paragraph highlights the ways in which the smallest thing, such as the meaning of 'topic', can look different to programmers than to historians, and how a better understanding of tech and the humanities together can make a big difference in digital history projects.

    1. Topic modeling can offer us some new groupings of documents that we might have overlooked, and it will give us the capacity to analyze Sanger’s rhetoric over time, looking for key changes. An example might be the belief among women’s historians that Sanger abandoned her feminist rationales for birth control in the late 1910s and early 1920s as she sought support from experts in the fields of medicine, social work and eugenics.

      Tech, patterns, change over time, and language. Identifying key changes.

    2. The most descriptive label I could assign this topic would be EMOTION – a tricky and elusive concept for humans to analyze, much less computers. Yet MALLET did a largely impressive job in identifying when Ballard was discussing her emotional state. How does this topic appear over the course of the diary?

      "Objective" translation of human emotion into tech,?

    3. What topic modeling can offer a historian is an objective snapshot of the content of the collection.  Rather than relying on our own readings of documents to combine them together into subject categories, we look instead to the words that appear together most frequently and then label those words in ways that make sense to us. 

      If our data is based on our individual inquiries and interests, can it really be objective? What does this really mean in this context?

    1. Sometimes it might be necessary to break one of these csv files into separate text files, in order to do the next stage of your analysis. For instance, imagine that you had scraped John Adams’ diaries into a single csv file, but what you really wanted to know was the way his views on governance changed over time. This might be a question for which topic modeling (see chapter four) could be well suited; many topic modeling tools require each document (here, a diary entry) to be in its own text file.

      Isn't this what we discussed last week? Give it a go!

    1. Breaking a CSV file into separate txt files

      oh hai isn't this what we discussed in the last class? GIVE IT A GO!
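
      A minimal sketch of that step in Python: splitting a scraped CSV (one diary entry per row) into one .txt file per entry, so that topic modeling tools can read them. The file name and column names are hypothetical:

      ```python
      import csv
      from pathlib import Path

      outdir = Path("entries")
      outdir.mkdir(exist_ok=True)

      # Hypothetical CSV with "date" and "text" columns, one diary entry per row.
      with open("john_adams_diary.csv", newline="", encoding="utf-8") as f:
          for row in csv.DictReader(f):
              (outdir / f"{row['date']}.txt").write_text(row["text"], encoding="utf-8")
      ```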

    1. Yet we believe that for all the importance of Big Data, it does not offer any change to the fundamental questions of historical knowing facing historians.

      Only changes how we get answers?

    1. Yet with such a visualization the main downside becomes clear: we lose context. Who are the protagonists? Who are the villains? As adjectives are separated from other concepts, we lose the ability to derive meaning. For example, a politician speaks of “taxes” frequently: but from a word cloud, it is difficult to learn whether they are positive or negative references. With these shortcomings in mind, however, historians can find utility in word clouds.

      I had this idea in mind the entire time I read this section on word clouds. I think this is a really important consideration, and it does show the limitations of word clouds.

    2. But the changing words are useful.

      Comparative word clouds might then be useful to show change over time, with context.

    3. they are a very useful entryway into the world of basic text mining.

      So are word clouds the text mining tool here, then?

    4. Having large datasets

      What do we mean by data sets for word clouds? Is it information that's been scraped from a web page, or just lines of text that are then analyzed?

    5. word cloud. In brief, they are generated through the following process. First, a computer program takes a text and counts how frequent each word is. In many cases, it will normalize the text to some degree, or at least give the user options: if “racing” appears 80 times and “Racing

      Word cloud has its benefits, but it is hard to see past the cons. I think these visualizations are only beneficial to an audience that has context.

    6. It also represents the inversion of the traditional historical process: rather than looking at documents that we think may be important to our project and pre-existing thesis, we are looking at documents more generally to see what they might be about. With Big Data, it is sometimes important to let the sources speak to you, rather than looking at them with pre-conceptions of what you might find.

      I found this statement striking since it serves the purpose of telling historians how to make sense of the data they are working with.

    7. In brief, they are generated through the following process. First, a computer program takes a text and counts how frequent each word is. In many cases, it will normalize the text to some degree, or at least give the user options: if “racing” appears 80 times and “Racing” appears 5 times, you may want it to register as a total of 85 times to that term.

      I really like the way this is all broken down. I found this page very explanatory in a user friendly way for someone with a limited knowledge of digital functions.
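
      The counting-and-normalizing step described above boils down to something like this (Python, with a made-up snippet of text):

      ```python
      import re
      from collections import Counter

      text = "Racing fans love racing. Racing season starts soon."

      # Normalize case so "Racing" and "racing" count as the same word,
      # then split on anything that is not a letter or apostrophe.
      words = re.findall(r"[a-z']+", text.lower())
      counts = Counter(words)

      print(counts.most_common(5))   # these frequencies determine word sizes in the cloud
      ```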

    1. One in particular that Stray mentions is called ‘Tabula’, which can be used to extract tables of information from PDFs, such as may be found in census documents.

      useful to know

    1. The vocabulary of regular expressions is pretty large, but there are many cheat sheets for regex online (one that we sometimes use is http://regexlib.com/CheatSheet.aspx. Another good one is at http://docs.activestate.com/komodo/4.4/regex-intro.html)

      regex online cheat sheets

    2. Regular expressions can be mixed, so if you wanted to find words only matching “cat”, no matter where in the sentence, you’d search for \bcat\b which would find every instance. And, because all regular expressions can be mixed, if you searched for \bcat|dog\b

      the workaround for the program's need to take everything literally

    3. The astute reader will have noticed a problem with the instructions above; simply replacing every instance of “dog” or “cat” with “animal” is bound to create problems. Simple searches don’t differentiate between letters and spaces, so every time “cat” or “dog” appear within words, they’ll also be replaced with “animal”. “catch” will become “animalch”; “dogma” will become “animalma”; “certificate” will become “certifianimale”. In this case, the solution appears simple; put a space before and after your search query, so now it reads “ dog | cat ”. With the spaces, “animal” replaces “dog” or “cat” only in those instances where they’re definitely complete words; that is, when they’re separated by spaces.

      program works very literally

    4. When you type the vertical bar on your keyboard (it looks like |, shift+backslash on windows keyboards), which means ‘or’ in regular expressions. So, if your query is dog|cat and you press ‘find’, it will show you the first time either dog or cat appears in your text. Open up a new file in your editor and write some words that include ‘dog’ and ‘cat’ and try it out.

      helpful how to

    5. In addition to the basics provided here, you will also be able to simply search regular expression libraries online: for example, if you want to find all postal codes, you can search “regular expression Canadian postal code” and learn what ‘formula’ to search for to find them

      a way to learn the lexicon=good

    6. Regular expressions can often be used right inside the ‘Find and Replace’ box in many text and document editors, such as Notepad++ on Windows, or TextWrangler on OS X. You cannot use regex with Microsoft Word, however!

      important to note

    7. a regular expression is just a way of looking through texts to locate patterns. A regular expression can help you find every line that begins with a number, or every instance of an email address, or whenever a word is used even if there are slight variations in how it’s spelled. As long as you can describe the pattern you’re looking for, regular expressions can help you find it. Once you’ve found your patterns, they can then help you manipulate your text so that it fits just what you need.

      definition
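
      A sketch of the word-boundary point using Python's re module, grouping the alternation as \b(cat|dog)\b so the boundaries apply to both words; the sample sentence is my own:

      ```python
      import re

      text = "The cat chased a dog; the dogma in the certificate didn't catch on."

      # A bare alternation also matches inside longer words...
      print(re.sub("cat|dog", "animal", text))
      # ... the animalma in the certifianimale didn't animalch on.

      # ...while word boundaries only match whole words.
      print(re.sub(r"\b(cat|dog)\b", "animal", text))
      # The animal chased a animal; the dogma in the certificate didn't catch on.
      ```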

    1. McGill University servers

      so is it tied to this university?

    2. load text or pdf files into the system

      That's helpful to know, that it accepts both text and PDF files.

  2. www.themacroscope.org
    1. The other possibilities are even more exciting. The Concordance Plot traces where various keywords appear in files, which can be useful to see the overall density of a certain term. For example, in the below visualization of newspaper articles, we trace when frequent media references to ‘community’ in the old Internet website GeoCities declined (figure 3.6)

      So does each of those bars represent one year of newspaper articles, and are the black lines then where the word 'community' is used, in chronological order?

    2. import files

      do they have to be text files only?

    1. visible data work and what remains as invisible labor.

      Social consideration: Many people forget human labor goes into digital production and services.

    2. The history and development of how we became data subjects in cultures of data are just now beginning to be told by historians and sociologists.

      Collection of data is a social practice with a history of its own.

    3. "era of data" with sensor networks and 5thgeneration mobile networks, as personal computing devices and internet saturation become tighter and tighter in our homes, institutions, and public spaces.

      Data can depend on devices and technology, and therefore on class, education, race, family, etc.

    4. But as the following contributions show, data is often characterized as a natural resource out in the wild to be discovered and sourced, with all the requisite taming of nature, including the frontierism metaphors of cowboys, lumberjacks, and gold rush miners.

      Data is created and categorized by human actions, processes, and ideology.

    5. Creators' agency over data that has been created, collected, and managed is thus central to understanding the datafication of culture. Indeed, calls for agency over our individual data footprints and practices of collection exist, but more methods for understanding how individuals are interpolated through sociomaterial practices of data collection as subjects are still needed.17

      Feminist Research Practices and Digital Archives stresses the importance of maintaining the trace of labor inherent in digitized items. Acker and Clement call for the trace of labor in data collection, to better comprehend the datafication of culture.

    6. But we know that data is not just given, it is taken up by people and given forms, standards, names, putting it into relationships with cultural practices.4 Data is not nature, waiting to be tamed; it is always already a cultural product.5

      Interesting to see how the authors attribute so much agency to data. It further solidifies how humans shape, and are shaped by, data.

    1. it provides a quite quick and relatively painless way to get a broad sense of what is going on within your documents.
    2. Thus, you could use a spreadsheet program to create bar charts of the count of documents with an ‘architecture’ tag or a ‘union history’ tag, or ‘children’, ‘women’, ‘agriculture,’ etc. We might wonder how plaques concerned with ‘children’, ‘women’, ‘agriculture’, ‘industry’, etc might be grouped, so we could use Overview’s search function to identify these plaques by search for a word or phrase, and applying that word or phrase as a tag to everything that is found. One could then visually explore the way various tags correspond with particular folders of similar documents.[5]
    3. Why does this division exist? That would be an interesting question to explore.

      Importance of categorization and choices made with organization, programming, etc.
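
      A sketch of that tag-counting step, assuming the tags have been exported to a CSV with one row per document-tag pair (the file and column names are hypothetical):

      ```python
      import pandas as pd

      df = pd.read_csv("plaques_tags.csv")       # hypothetical columns: "document", "tag"

      tag_counts = df["tag"].value_counts()
      print(tag_counts)                          # how many plaques carry each tag
      tag_counts.plot(kind="bar")                # the bar chart described above
      ```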

    1. The decision by Mario Gonzalez illustrates how individuals create their own life-narrative

      Re-writing their own history, so the (alleged) "negative" parts are forgotten and erased.

    2. However, having the opportunity to build up a greater understanding of an individual’s lifestyle and their social networks brings with it ethical concerns. These concerns are increased by their involvement in a criminal activity. Disclosing the identity of a ‘forgotten’ individual and their connection to crime has the potential to cause harm. Therefore, careful thought is needed about how this evidence is presented when sharing research findings, and in this digital age this may not just be in the form of a printed academic journal.

      Which would lead to researchers policing themselves and questioning if their work would harm an individual and/or their descendants.

    3. This study wanted to consider the period from a new perspective – that of the consumer. Of interest was:

      The "hidden population" may have wanted to maintain their anonymity - or at least that's what the Mario Gonzalez case suggests.

    1. Graphic designers’ intricate two-page spreads are chopped in half when digitisation occurs at the page level. Layouts that carefully juxtaposed articles, poems, and photographs on a page are eradicated when individual items are digitised as separate files. An artist’s delicate work loses nuances when contrast levels are set to prioritise text legibility.

      Much is lost when information is taken out of its initial context. I appreciate the Emilie Davis diaries because they juxtapose the diary page scans next to the transcriptions.

    2. I adopt instead the more specific term, ‘digital archival environment’, to describe accessing online digitised surrogates of materials taken from archives.

      Brick and mortar archives and digital archives need to abide by ethical standards. Relevant (alive) individuals and their descendants should be contacted before publishing this information on the internet.

    3. How can a researcher determine whether individuals who appear in a digital archival environment have consented to this and what controls should they look for to mitigate this increased exposure? Researchers can check for copyright information, investigate searchability settings, and look for procedures that enable individuals to have materials taken down.

      Is this a firm rule in the Digital History community or is it more of a suggestion?

  3. May 2019
    1. comes to questions of privacy or evolving identities

      digital history ethics

    2. In this chapter we explore how history-making practices of radical and lesbian feminists offer a model of cultural history preservation and transmission for those of us who create digital resources.

      "who create digital records" note they do NOT say "digital archive"

    1. the macroscope makes it easier to grasp the incredibly large. It does so through a process of compression, by selectively reducing complexity until once-obscure patterns and relationships become clear. Often, macroscopes produce textual abstractions or data visualizations in lieu of direct images.[1]

      working definition of macroscope

    1. we believe that we are now on the cusp of a third revolution in computational history.

      What do the authors mean by the “third wave” of computational engagement? How does it differ from the prior two?

  4. www.themacroscope.org
    1. big is in the eye of the beholder

      What is Big Data and why does it matter for historians?