98 Matching Annotations
  1. Jun 2019
    1. This article will demonstrate how the mathematical tools employed by network scientists offer valuable ways of understanding the development of underground religious communities in the sixteenth century, as well as providing different approaches for historians and literary scholars working in archives.

      need for proposal

    1. The strength of weak ties has been hypothesized as a driving force behind the flourishing of science in 17th century Europe.[3] Political exile, religious diaspora, and the habit of young scholars to travel extensively, combined with a relatively inexpensive and fast postal system, created an environment where every local community had weak ties extending widely across political, religious, and intellectual boundaries. This put each community, and every individual, at higher risk for encountering just the right serendipitous idea or bit of data they needed to set them on their way. Weak ties are what make the small community part of the global network.

      history example

    2. Careful readers will have noted that the definition of a weak tie is curiously similar to that of a bridge. This dichotomy, the weakness of a connection alongside the importance of a bridge, has profound effects on network dynamics.

      interesting to note

    3. a hub is a node without which the path between its neighbours would be much larger, and a bridge is an edge which connects two otherwise unconnected communities.

      reference definition

    4. The further back in history one goes, the less the globe looks like a small world network. This is because travel and distance constraints prevented short connections between disparate areas.

      technology (for communication) can affect what a historic network looks like

    5. A historian may wish to see the evolution of transitivity across a social network to find the relative importance of introductions in forming social bonds.

      final project: a useful way to look at Burrough's letters of introductions to Morton

    6. These global metrics are most useful when measured in comparison to other networks; early modern and present day social networks both exhibit scale-free properties, but the useful information is in how those properties differ from one another.

      important to note

    7. As opposed to bibliographic coupling networks, co-citation networks connect articles not by the choices their authors make, but by the choices future authors make about them.

      so this pairing is less about content and more about use. ex- you can frequently cite opposing articles about a particular topic together

    8. In an evolving network of correspondents, if Alice writes to Bob, and Bob to Carol, we can ask what the likelihood is that Alice will eventually write to Carol (thus again closing the triangle). This tendency, called triadic closure, can help measure the importance of introductions and knowing the right people in a letter-writing community.

      final project: example of Morton, Burroughs, and Maclure
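
      a minimal sketch of how triadic closure could be measured with the networkx library (the graph below is made up; only Alice, Bob, and Carol come from the passage):

        import networkx as nx

        # hypothetical correspondence network: an edge means "wrote to"
        G = nx.Graph()
        G.add_edges_from([
            ("Alice", "Bob"),
            ("Bob", "Carol"),
            ("Alice", "Carol"),  # the closed triangle from the passage
            ("Carol", "Dave"),   # an open triad (Bob-Carol-Dave) that never closes
        ])

        # transitivity: the fraction of open triads that are closed into triangles
        print(nx.transitivity(G))

        # clustering per correspondent: how often a person's contacts also write to each other
        print(nx.clustering(G))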

    9. The smallest unit of meaningful analysis in a network is a dyad, or a pair of nodes connected by an edge between them (figure 6.4). Without any dyads, there would be no network, only a series of disconnected isolates. Dyads are, unsurprisingly, generally discussed in terms of the nature of the relationship between two nodes. The dyad of two medieval manuscripts may be strong or weak depending on the similarity of their content, for example. Study of dyads in large networks most often revolves around one of two concepts: reciprocity, whether a connection in one direction is reciprocated in the other direction, or assortativity, whether similar nodes tend to have edges between them.

      reference definitions

    10. Create a sliding window of time, e.g. a week or five years, and analyze successive snapshots over that period. Each snapshot then only contains data from that time window. The network changes drastically in form over time.

      reference for final project: this might be a useful way to look at the Morton letters (to examine those by year)
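
      a rough sketch of how such snapshots could be built in Python with networkx; the letters list (names and dates) is a made-up placeholder for whatever the Morton corpus actually records:

        from datetime import date, timedelta
        import networkx as nx

        # hypothetical letter records: (sender, recipient, date sent)
        letters = [
            ("Morton", "Burrough", date(1830, 3, 1)),
            ("Burrough", "Maclure", date(1831, 7, 12)),
            # ... one tuple per letter
        ]

        def snapshot(letters, start, window):
            """Build a network from only the letters sent inside one time window."""
            end = start + window
            G = nx.Graph()
            G.add_edges_from((s, r) for s, r, d in letters if start <= d < end)
            return G

        window = timedelta(days=365)  # e.g. one-year snapshots
        start = min(d for _, _, d in letters)
        last = max(d for _, _, d in letters)
        while start <= last:
            G = snapshot(letters, start, window)
            print(start.year, G.number_of_nodes(), G.number_of_edges())
            start += window  # step forward; smaller steps would give overlapping windows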

    11. of individual attributes

      customization features for your fixed points of study (other aspects of data to include)

    12. Edges connect nodes

      reference: nodes are fixed points and edges connect them

    13. Entities are called nodes (figure 6.1) and the relationships between them are called edges (figure 6.2), and everything about a network pivots on these two building blocks.

      definitions!

    14. Despite their name, networks are dichotomous, in that they only give you two sorts of things to work with: entities and relationships

      important to note

    1. The following chapter, beyond teaching the basics of what networks are and how to use them, will also cover some of the many situations where networks are completely inappropriate solutions to a problem.

      remember sometimes a dataviz is not what you actually need

    2. The entities being connected can be articles, people, social groups, political parties, archaeological artefacts, stories, and cities; citations, friendships, people, affiliations, locations, keywords, and ship’s routes can connect them. The results of a network study can be used as an illustration, a research aid, evidence, a narrative, a classification scheme, and a tool for navigation or understanding.

      variety of use for this method/result

    3. In this case, networks were the subject of study rather than used as evidence, in an effort to see the effects of political change on power structures.

      important to note network analysis is not always just the result.

    4. Studies of this sort pave the way for more exploratory network analyses; if the analysis corroborates the consensus, then it is more likely to be trustworthy in situations where there is not yet a consensus.

      hypothesis before network analysis! not the other way around!

    5. Their study looks at 280 letters written by Cicero; the network generated was not that of whom Cicero corresponded with, but of information generated from reading the letters themselves.

      interesting idea. the body of data is the content of letters not just the sending information

    6. Network approaches can be particularly useful at disentangling the balance of power, either in a single period or over time. A network, however, is only as useful as its data are relevant or complete. We need to be extremely careful when analyzing networks not to read power relationships into data that may simply be imbalanced.

      need to know what the source base is and how it could be limited before making claims about the data results. can't just blindly accept a dataviz/ network analysis

    7. A citation analysis by White and McCann looking at an eighteenth-century chemistry controversy took into account the hierarchical structure of scientific specialties.[2] The authors began with an assumption that if two authors both contributed to a field, the less prominent author would always get cited alongside the more prominent author, while the more prominent author would frequently be cited alone. One scientist is linked to another if they tend to be subordinate to (only cited alongside of) that other author.

      need some contextual or background knowledge when looking at the network analysis of citations to make sense of this hierarchy

    8. We are currently enjoying one such resurgence, not incidentally co-developing along with the popularity of the Internet, a network backbone connecting much of the world to one system.

      new popularity in network analysis tied to 3rd wave of DH

    1. What we really want here of course is a visualization that combines all the things, but I’ve resisted creating one for now. The complex historical questions of who gets counted when we count in histories of women’s liberation exist because data reduces people’s lived experiences to columns on a spreadsheet.

      ethical considerations of digital history

    2. I took some important titles, starting in 1975 running up to 1981, digitized the acknowledgements, and pulled the names by hand into a spreadsheet. Of the 435 names, the common network reduces to this.

      method

    3. Turning to a print expression of the movement that is far more grass roots than mass market anthologies, I decided to look for Robinson in the many periodicals that developed out of women’s liberation. Reveal Digital has provided me with uncorrected OCR, machine-readable corpora consisting of over 6000 issues of periodicals from the Left. Here I searched all variations of Robinson’s name. I visualized this metadata and then for comparison repeated the process with the names of the other two black women who overlapped anthologies to visualize the spread of their writing.

      method

    4. Back to my spreadsheet I went where I discovered that variations in the title of the essay and in the author credits obfuscated the connections, as well as those for the other two essays by black women, Frances M. Beal and Maryanne Weathers. I share this not to reveal my own sloppy data, but to highlight the difficulties of doing this kind of visualization.

      visualization helped reveal issues/flaw in data set

    5. visualization below.

      is this referring to the network plot further below or is it missing?

    6. Using the BYU Corpus Interface for Google Books I scraped the metadata for any references to “Redstockings Manifesto” or “A Historical and Critical Essay for Black Women”  to create the visualization below.

      describing method and software- important

    7. Scraping from online catalogues and then digitizing when I had to, I took the table of contents, separated titles and authors, and put them into a spreadsheet that I then pulled into Palladio where I explored the relationships between both authors and essays as they overlapped.

      outlining method

    1. chartjunk

      love the name and how true it is to what this actually means

    2. Most visualization software do not automatically create legends, and so they become a neglected afterthought.

      legends are necessary for deciphering visualizations; it's a shame the software often does not include a legend automatically

    3. If choosing the data to go into a visualization is the first step, picking a general form the second, and selecting appropriate visual encoding the third, the final step for putting together an effective information visualization is in following proper aesthetic design principles.

      general template

    1. Your audience should influence your choice of color palette, as readers will always come to a visualization with preconceived notions of what your graphic variables imply.

      audience important to design decisions, cannot assume familiarity

    1. These three variables should be used to represent different variable types. Except in one circumstance, discussed below, hue should only ever be used to represent nominal, qualitative data. People are not well-equipped to understand the quantitative difference between e.g. red and green. In a bar chart showing the average salary of faculty from different departments, hue can be used to differentiate the departments. Saturation and value, on the other hand, can be used to represent quantitative data. On a map, saturation might represent population density; in a scatterplot, saturation of the individual data points might represent somebody’s age or wealth. The one time hue may be used to represent quantitative values is when you have binary diverging data. For example, a map may show increasingly saturated blues for states that lean more Democratic, and increasingly saturated reds for states that lean more Republican. Besides this special case of two opposing colors, it is best to avoid using hue to represent quantitative data.

      important to note
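
      a minimal matplotlib sketch of the three uses of color described here (all numbers are invented for illustration):

        import matplotlib.pyplot as plt

        # hue for nominal data: one distinct color per department
        depts, salaries = ["History", "English", "Physics"], [72, 68, 95]
        plt.bar(depts, salaries, color=["tab:blue", "tab:orange", "tab:green"])
        plt.ylabel("Average salary (thousands)")
        plt.show()

        # saturation/value for quantitative data: a sequential colormap
        ages = [25, 40, 60, 75]
        plt.scatter([1, 2, 3, 4], [3, 1, 4, 2], c=ages, cmap="Blues")
        plt.colorbar(label="Age")
        plt.show()

        # diverging hues for binary, two-directional data (e.g. vote margin)
        margins = [-0.3, -0.1, 0.05, 0.4]  # negative = one party, positive = the other
        plt.scatter([1, 2, 3, 4], [1, 2, 3, 4], c=margins, cmap="RdBu", vmin=-0.5, vmax=0.5)
        plt.colorbar(label="Vote margin")
        plt.show()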

    2. The nature of each of these data types will dictate which graphic variables may be used to visually represent them. The following section discusses several possible graphic variables, and how they relate to the various scales of measure.

      guidelines: certain types of data call for certain types of visualizations

    3. The art of visual encoding is in the ability to match data variables and graphic variables appropriately. Graphic variables include the color, shape, or position of objects in the visualization, whereas data variables include what is attempting to be visualized (e.g. temperature, height, age, country name, etc.)

      visual encoding

    1. There is no right visualization. A visualization is a decision you make based on what you want your audience to learn. That said, there are a great many wrong visualizations. Using a scatterplot to show average rainfall by country is a wrong decision; using a bar chart is a better one. Ultimately, your choice of which type of visualization to use is determined by how many variables you are using, whether they are qualitative or quantitative, how you are trying to compare them, and how you would like to present them. Creating an effective visualization begins by choosing from one of the many appropriate types for the task at hand, and discarding inappropriate types as necessary.

      guidelines

    2. The reasons behind visualizing a network can differ, but in general, visualizations of small networks are best at allowing the reader to understand individual connections, whereas visualizations of large networks are best for revealing global structure.

      !

    3. It is important to remember that stylistic choices can deeply influence the message taken from a visualization. Horizontal and radial trees can represent the same information, but the former emphasizes change over time, whereas the latter emphasizes the centrality of the highest rung on the hierarchy. Both are equally valid, but they send very different messages to the reader.

      medium matters

    4. Whereas the previous types of visualizations dealt with data that were some combination of categorical, quantitative, and geographic, some data are inherently relational, and do not lend themselves to these sorts of visualizations. Hierarchical and nested data are a variety of network data, but they are a common enough variety that many visualizations have been designed with them in mind specifically. Examples of this type of data include family lineages, organizational hierarchies, computer subdirectories, and the evolutionary branching of species.

      hierarchical data visualizations for showing relational values

    5. In the humanities, map visualizations will often need to be of historical or imagined spaces. While there are many convenient pipelines to create custom data overlays of maps, creating new maps entirely can be a gruelling process with few easy tools to support it. It is never as simple as taking a picture of an old map and scanning it into the computer; the aspiring cartographer will need to painstakingly match points on an old scanned map to their modern latitude and longitude, or to create new map tiles entirely.

      good point. physical space and locations of towns or landmarks do not go unchanged over time. look at the sand creek example. the old map had the creek in a completely different location as nature had shifted the path of the creek over several decades.

    6. Keep in mind that often, even if you plan on representing geographic information, the best visualizations may not be on a map. In this case, unless you are trying to show that the higher density of populous areas is in the Eastern U.S., you may be better served by a bar chart, with bar heights representative of population size. That is, the latitude and longitude of the cities is not particularly important in conveying the information we are trying to get across.

      it seems like there are always alternative ways to visualize data

    7. Even these seemingly straightforward representations are loaded with significant choices, as laying two-dimensional coordinates onto a 3D world means making complicated choices around what map projection to use.

      maps still need interpretation and cannot be accepted as absolute

    8. Statistical charts are likely those that will be most familiar to any audience. When visualizing for communication purposes, it is important to keep in mind which types of visualizations your audience will find legible. Sometimes the most appropriate visualization for the job is the one that is most easily understood, rather than the one that most accurately portrays the data at hand. This is particularly true when representing many abstract variables at once: it is possible to create a visualization with color, size, angle, position, and shape all representing different aspects of the data, but it may become so complex as to be illegible.

      audience is very important when choosing a medium

    9. Our taxonomy is influenced by visualizing.org, a website dedicated to cataloguing interesting visualizations, but we take examples from many other sources as well

      reference site

    10. Often, because of change blindness, dynamic visualizations may be confusing and less informative than sequential static visualizations. Interactive visualizations have the potential to overload an audience, especially if the controls are varied and unintuitive. The key is striking a balance between clarity and flexibility.

      one model does not fit every audience. need to consider the audience when constructing a visualization!

    11. interactive visualizations allow the user to manipulate the graphical variables themselves in real-time

      interactive visualization definition. also sounds interesting with real time manipulation

    12. dynamic visualizations are short animations which show change, either over time or across some other variable

      dynamic visualization definition

    13. Static visualizations are those which do not move and cannot be manipulated

      static visualization definition

    14. A truly “objective” visualization, where the data speak for themselves, is impossible.

      need text and context! a visualization needs some explanation

    15. An information visualization differs from a scientific visualization in the data it aims to represent, and in how that representation is instantiated. Scientific visualizations maintain a specific spatial reference system, whereas information visualizations do not.

      !! important to note

    16. information visualization is the mapping of abstract data to graphic variables in order to make a visual representation.

      information visualization definition

    1. In a public world that values quantification so highly, visualizations may lend an air of legitimacy to a piece of research which it may or may not deserve.

      means that like topic models without context or key explanatory features, this can be misleading.

    2. The right visualization can replace pages of text with a single graph and still convey the same amount of information.

      really?

    3. Uses of information visualization generally fall into two categories: exploration and communication.

      important reference 2 types

    4. This approach to distant reading– that is, seeing where in a text the object of inquiry is densest– has since become so common as to no longer feel like a visualization. Amazon’s Kindle has a search function called X-Ray (figure 5.2) which allows the reader to search for a series of words, and see the frequency with which those words appear in a text over the course of its pages.

      AntConc also has a feature somewhat like this, the bar graph of frequency throughout a document or documents. I wonder how different the results are or if AntConc is more limited.
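
      a homemade stand-in (not Amazon's or AntConc's actual implementation) for this kind of density-of-occurrence view; the filename and search word are placeholders:

        import re
        import matplotlib.pyplot as plt

        # hypothetical input: one plain-text file and one search term
        text = open("novel.txt", encoding="utf-8").read().lower()
        word = "community"

        # relative offset of every whole-word match, from 0 (start) to 1 (end)
        positions = [m.start() / len(text) for m in re.finditer(r"\b" + word + r"\b", text)]

        # one tick mark per occurrence, like a concordance plot bar
        plt.eventplot(positions, orientation="horizontal")
        plt.xlim(0, 1)
        plt.yticks([])
        plt.xlabel("Relative position in text (0 = start, 1 = end)")
        plt.title("Occurrences of '" + word + "'")
        plt.show()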

    5. Visualizations can also lie, confuse, or otherwise misrepresent if used poorly.

      just like topic models!

    6. Visualization is a method of deforming, compressing, or otherwise manipulating data in order to see it in new and enlightening ways

      visualization definition

    1. We cannot rely only on the computer-driven groups to use in analyzing texts.  The next step is to look at the texts that contain repeating word patterns and conduct a close reading to see what we can learn about the topic. Plotting the topic over time enables us to locate trends in how important the topic was to the author, or when we compare them with other authors, we can investigate differences in the ways that two authors valued these topics or the different ways that they expressed themselves.

      need for humans and computers to analyze text

    2. What topic modeling can offer a historian is an objective snapshot of the content of the collection.

      objective and maybe random without context?

    1. Even more significantly, topic modeling allows us a glimpse not only into Martha’s tangible world (such as weather or housework topics), but also into her abstract world.

      this was an issue with AntConc; interesting to see it used here.

    2. Yet this pattern bolsters the argument made by Ulrich in A Midwife’s Tale, in which she points out that the first half of the diary was “written when her family’s productive power was at its height.” (285) As her children married and moved into different households, and her own husband experienced mounting legal and financial troubles, her daily burdens around the house increased. Topic modeling allows us to quantify and visualize this pattern, a pattern not immediately visible to a human reader.

      interesting match with current scholarship. historians need to be well versed in historiography to interpret topic modeling visualizations.

    3. In essence, topic modeling accurately recognized, in a mere 55 words (many abbreviated into a jumbled shorthand), the dominant theme of that entry:

      does the medium of text analyzed have anything to do with this? presumably the diary entries are not long and may not have had several topics per entry

    4. Instead, the program is only concerned with how the words are used in the text, and specifically what words tend to be used similarly.

      does that then help standardize and catch words that might be misspelled or have alternative spellings?

    5. MALLET generated a list of thirty topics comprised of twenty words each, which I then labeled with a descriptive title.

      so the author had to assign the topic then and the software just made the word clusters? ask in class
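
      roughly what that workflow looks like in code, using gensim's LDA as a stand-in for MALLET (the tokenized diary entries are placeholders); the software only produces the word clusters, and both the number of topics and the descriptive labels still come from the researcher:

        from gensim import corpora
        from gensim.models import LdaModel

        # hypothetical, already-tokenized diary entries
        entries = [
            ["clear", "cold", "snow", "wind"],
            ["spun", "wool", "baked", "bread"],
            # ... one token list per entry
        ]

        dictionary = corpora.Dictionary(entries)
        bows = [dictionary.doc2bow(e) for e in entries]

        # the researcher chooses the number of topics up front
        lda = LdaModel(corpus=bows, id2word=dictionary, num_topics=30, passes=20, random_state=1)

        for topic_id in range(30):
            top_words = [w for w, _ in lda.show_topic(topic_id, topn=20)]
            print(topic_id, top_words)  # labeling these clusters is still a human judgment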

    6. it worked. Beautifully

      how many topics did the author ask for the computer to generate then? it seems like topic modeling needs some refinement first

    7. topic modeling, a method of computational linguistics that attempts to find words that frequently appear together within a text and then group them into clusters.

      definition

    1. When you encounter someone else’s topic model, do not accept at first glance. Rather, to understand the potentials and pitfalls, you must be aware of how the tools work and their limitations.

      important to note! topic models should not just be accepted

    1. We mention this here to highlight the speed with which the digital landscape of tools can change. When we initially wrote about Paper Machines, we were able to topic model and visualize John Adams diaries, scraping the page itself using Zotero. When we revisited that workflow a few months later, given changes that we had made to our own machines (updating software, moving folders around and so on, and changes to the underlying html of the John Adams Diaries website), it – our workflow – no longer worked! Working with digital tools can sometimes make it necessary to not update your software! Rather, keep in mind which tools work with what versions of other supporting pieces.

      important note

    1. Different tools give different results even if they are all still ‘topic modeling’ writ large. This is a useful way to further understand how these algorithms shape our research, and is a useful reminder to always be up front about the tools that you are using in your research.

      the method matters!!

    1. Available in a Google Code repository at https://code.google.com/p/topic-modeling-tool/, the GTMT provides quick and easy topic model generation and navigation.

      link didn't work for me

    1. In fact there is a danger in using topic models as historical evidence; they are configurable and ambiguous enough that no matter what you are looking for, you just might find it. Remember, a topic model is in essence a statistical model that describes the way that topics are formed. It might not be the right model for your corpus. It is however a starting point, and the topics that it finds (or fails to find) should become a lens through which you look at your material, reading closely to understand this productive failure. Ideally, you would then re-run the model, tweaking it so that it better describes the kind of structure you believe exists.

      topic models are not the best piece of evidence as they can argue anything it seems. makes sense when you consider that the computer does not find/know the topic like a historian would when analyzing a source

    2. There is a fundamental difficulty however. When we began looking at the Gettysburg Address, Hollis was instructed to look for two topics that we had already named ‘war’ and ‘governance’. When the computer looks for two topics, it does not know beforehand that there are two topics present, let alone what they might mean in human terms. In fact, we as the investigators have to tell the computer ‘look for two topics in this corpus of material’, at which point the machine will duly find two topics. At the moment of writing, there is no easily-instantiated method to automatically determine the ‘best’ number of topics in a corpus, although this will no doubt be resolved. For the time being, the investigator has to try out a number of different scenarios to find out what’s best. This is not a bad thing, as it forces the investigator continually to confront (or even, close-read) the data, the model, and the patterns that might be emerging.

      the investigator has to run the model several times with different topic counts to see which number of topics fits the corpus best
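
      one hedged way to do that trial and error in code: re-run the model with several topic counts and compare a coherence score (gensim shown here as a stand-in; the score is only a rough guide, not the automatic answer the passage says does not yet exist):

        from gensim import corpora
        from gensim.models import LdaModel, CoherenceModel

        texts = [
            ["clear", "cold", "snow"],
            ["spun", "wool", "baked"],
            # ... tokenized documents
        ]
        dictionary = corpora.Dictionary(texts)
        bows = [dictionary.doc2bow(t) for t in texts]

        # try several topic counts; picking among them is still the investigator's call
        for k in (10, 20, 30, 40):
            lda = LdaModel(corpus=bows, id2word=dictionary, num_topics=k, random_state=1)
            score = CoherenceModel(model=lda, texts=texts, dictionary=dictionary,
                                   coherence="c_v").get_coherence()
            print(k, round(score, 3))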

    3. We should point out that while ‘document’ in everyday use means a diary entry, a single speech, an entire book, for the purpose of data mining, a document could be just every paragraph within that book, or every 1000 words

      important

    1. The essence of a topic model is in its input and its output: a corpus, a collection, of text goes in, and a list of topics that comprise the text comes out the other side.

      what is it basically

    2. it is possible to decompose from the entire collection of words the original distributions held in those bags and buckets

      but isn't it possible then that there could be a lot of overlap with topics that are related but not what the author intended to use.

    1. One in particular that Stray mentions is called ‘Tabula’, which can be used to extract tables of information from PDFs, such as may be found in census documents.

      useful to know

    1. While the changes appear insignificant, it will allow us to turn this index into something that a network analysis program (for instance) could read and make visual sense of. It is, in fact, turning an OCR’d page of text into a csv file!

      !!
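
      a small sketch of what that kind of transformation can look like in Python; the index format here (a name followed by page numbers) is invented, since the actual OCR'd lines aren't reproduced in the annotation:

        import re

        # hypothetical OCR'd index lines: "Surname, Forename  12, 45, 130"
        raw = """Abbot, Charles  12, 45, 130
        Adams, John  3, 77"""

        rows = ["name,page"]
        for line in raw.splitlines():
            match = re.match(r"^\s*(.+?)\s{2,}([\d,\s]+)$", line)
            if not match:
                continue
            name, pages = match.groups()
            for page in re.findall(r"\d+", pages):
                rows.append('"' + name + '",' + page)  # quote the name since it contains a comma

        print("\n".join(rows))  # a csv that a network or visualization tool could read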

    2. The vocabulary of regular expressions is pretty large, but there are many cheat sheets for regex online (one that we sometimes use is http://regexlib.com/CheatSheet.aspx. Another good one is at http://docs.activestate.com/komodo/4.4/regex-intro.html)

      regex online cheat sheets

    3. Regular expressions can be mixed, so if you wanted to find words only matching “cat”, no matter where in the sentence, you’d search for \bcat\b which would find every instance. And, because all regular expressions can be mixed, if you searched for \bcat|dog\b

      the workaround for the program's need to take everything literally
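
      a minimal Python sketch of the same idea (any editor with regex find-and-replace behaves the same way); the sentence is made up, and the parentheses group the alternation so the word boundaries apply to both words:

        import re

        text = "The dog chased the cat, but dogma and certificates were not involved."

        # \b marks a word boundary, so only whole words match
        print(re.findall(r"\bcat\b", text))          # ['cat']  (not the 'cat' inside 'certificates')
        print(re.findall(r"\b(?:cat|dog)\b", text))  # ['dog', 'cat']  (not 'dogma')

        # the find-and-replace from the passage, without the 'animalch'/'animalma' problem
        print(re.sub(r"\b(?:cat|dog)\b", "animal", text))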

    4. The astute reader will have noticed a problem with the instructions above; simply replacing every instance of “dog” or “cat” with “animal” is bound to create problems. Simple searches don’t differentiate between letters and spaces, so every time “cat” or “dog” appear within words, they’ll also be replaced with “animal”. “catch” will become “animalch”; “dogma” will become “animalma”; “certificate” will become “certifianimale”. In this case, the solution appears simple; put a space before and after your search query, so now it reads “ dog | cat ”. With the spaces, “animal” replaces “dog” or “cat” only in those instances where they’re definitely complete words; that is, when they’re separated by spaces.

      program works very literally

    5. The vertical bar on your keyboard (it looks like |, and is typed with shift+backslash on Windows keyboards) means ‘or’ in regular expressions. So, if your query is dog|cat and you press ‘find’, it will show you the first time either dog or cat appears in your text. Open up a new file in your editor and write some words that include ‘dog’ and ‘cat’ and try it out.

      helpful how to

    6. In addition to the basics provided here, you will also be able to simply search regular expression libraries online: for example, if you want to find all postal codes, you can search “regular expression Canadian postal code” and learn what ‘formula’ to search for to find them

      a way to learn the lexicon=good

    7. Regular expressions can often be used right inside the ‘Find and Replace’ box in many text and document editors, such as Notepad++ on Windows, or TextWrangler on OS X. You cannot use regex with Microsoft Word, however!

      important to note

    8. a regular expression is just a way of looking through texts to locate patterns. A regular expression can help you find every line that begins with a number, or every instance of an email address, or whenever a word is used even if there are slight variations in how it’s spelled. As long as you can describe the pattern you’re looking for, regular expressions can help you find it. Once you’ve found your patterns, they can then help you manipulate your text so that it fits just what you need.

      definition

    1. McGill University servers

      so is it tied to this university?

    2. track words that rise and fall

      nice

    3. load text or pdf files into the system

      that's helpful to know that it accepts both text and PDF files

  2. www.themacroscope.org
    1. The other possibilities are even more exciting. The Concordance Plot traces where various keywords appear in files, which can be useful to see the overall density of a certain term. For example, in the below visualization of newspaper articles, we trace when frequent media references to ‘community’ in the old Internet website GeoCities declined (figure 3.6)

      so does each of those bars represent one year of newspaper articles and the black lines are then where the word community is used in chronological time?

    2. import files

      do they have to be text files only?

    3. If you have somewhere in the ballpark of 500 or even 1,000 newspaper-length articles you should be able to crunch data and receive tangible results

      that's super useful!

    1. But the changing words are useful.

      comparative word clouds might be useful to show change over time for context then

    2. they are a very useful entryway into the world of basic text mining.

      then are the word clouds the tool of text mining here?

    3. Having large datasets

      what do we mean by data sets for word clouds? is it information that's been scraped from a web page or just lines of text that's then analyzed?

    1. Yet we do need to realize that these tools shape our research: they can occasionally occlude context, or mislead us

      why we need to heavily document our process then #DH8900

  3. May 2019
    1. links to material culture readings on how digitization can lead to a loss of physicality for print sources as objects (need to consider their physical features as deliberate choices, just like the text)