55 Matching Annotations
  1. Aug 2017
    1. Experts in the area have argued that the most powerful visualizations are static images with clear legends and a clear point,

      I think it's interesting that there is any sort of consensus at all about the "most powerful visualizations." How on earth would one measure that? Do they judge it by the emotional response of readers? Or understanding? Or further applications of the data? Are visualizations about communication or understanding, or a bit of both?

    1. Keyness reveals that "women" is a statistically significant negative key word, which means male authors used it less frequently than female authors.

      Not exactly a surprise! But useful to see that the data analysis confirms it.
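
      For anyone curious how "statistically significant" gets decided: corpus tools often compute keyness with Dunning's log-likelihood (an assumption on my part; the chapter doesn't name the measure it uses). With observed counts O_1, O_2 of a word in two corpora of sizes N_1, N_2:

      ```latex
      % expected counts if the word were used at the same rate in both corpora
      E_i = \frac{N_i\,(O_1 + O_2)}{N_1 + N_2}
      % log-likelihood keyness score; large values are significant,
      % and O_1 < E_1 makes the word a negative keyword for corpus 1
      G^2 = 2 \sum_{i=1}^{2} O_i \ln\frac{O_i}{E_i}
      ```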

    2. I'd also note where "women" is not present at very high frequency.  What are those items talking about then in a volume about woman suffrage?  

      Definitely an interesting question! Maybe it's the nature of language in the period: the noun "women" wouldn't be mentioned frequently because legal/flowery/expressive language was used throughout? Just a guess.

    1. word breaks (such as “pre-diction” for “prediction”)

      I remember in a previous module the word breaks drove me crazy! If they were used so often in old newspapers I can definitely see a challenge for OCR... searching for hits of "Lincoln" might not be so accurate if half of the hits were missed because they were typed as "Lin-coln".
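
      A rough command-line sketch of how one might soften that problem before searching (the filename is hypothetical, and GNU sed is assumed):

      ```bash
      # rejoin words split with a trailing hyphen at a line break,
      # then count case-insensitive hits for "lincoln"
      sed -z 's/-\n//g' issue.txt | grep -o -i "lincoln" | wc -l
      ```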

    2. optical character recognition (OCR). OCR is, at base, a process by which a computer program scans these images and attempts to identify alpha-numeric symbols (letters and numbers) so they can be translated into electronic text. So, for example, in doing an OCR scan of an image of the word “the,” an effective OCR program should be able to recognize the individual “t,” “h,” and “e” letters, and then save those as “the” in text form. Various versions of this process have been around since the late 1920s, although the technology has improved drastically in recent years. Today most OCR systems achieve a high level of recognition accuracy when used on printed texts and calibrated correctly for specific fonts.

      We have encountered issues with OCR in our previous module though! If the paper was blurred, smudged, wrinkled, had complex fonts, or was faded, OCR can make errors such as changing "rn" to "m".
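
      For anyone who wants to try the process the article describes, the open-source Tesseract engine exposes it as a one-line command (my example; the article doesn't name a specific OCR program, and page.png is hypothetical):

      ```bash
      # scan the image, recognize characters using English training data,
      # and save the recognized text as page.txt
      tesseract page.png page -l eng
      ```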

    3. it would be difficult for a researcher to know whether a search that produced a small number of search results would indicate few discussions of Lincoln from that era or simply that few relevant resources were...

      I didn't think of this problem but it certainly leads to the conclusion that no matter how large the data set is, the researcher has to understand the nature of the data to be able to accurately analyze data mining results such as these.

    4. in order to enable users of digitized historical newspapers to make more informed choices about what sort of research questions could, indeed, be answered by the available sources.

      And I see that this is indeed what they did! Preliminary data mining to create useful questions that can be answered by the data mining proper.

    5. represented rural or urban spaces, and whether there was enough quantity and quality of the data from both regions to undertake a meaningful comparison

      Very important to actually understand the nature and content of your data before questions can be posed. I can imagine this is quite difficult; with a quarter of a million documents you somehow need to recognize some patterns BEFORE you even begin data mining and spatial analysis... there must be some preliminary tools to scan the data before beginning the true work?

  2. Jul 2017
    1. Torn pages, blurred text, or divergent typefaces can all lower recognition rates during the OCR translation process. (See figure 1.) A smudged word might cause the computer to translate “Texas” as “Toxas,” for instance.

      I did not realize these were the reasons OCR failed to accurately recognize the correct spellings of words! I can see how this happened in this week's exercise.

    2. My research builds on earlier spatial histories through its use of technology to analyze sources on a much larger scale.

      In a way, the author is constructing a view of Houston's "moyenne durée" (per Braudel) by looking at changing patterns across the geographic and economic history of the region.

    3. After all, a reader looking for a new pair of gloves may have been more interested in a back-page advertisement from a Dallas merchant than a front-page editorial by a Dallas mayor. Flattening the text helped me understand the multifaceted ways a newspaper produced space.

      I guess this answers my above query on the definition of significance! The author seems to view significant news as what is important to the reader, which can vary greatly. Analyzing a more egalitarian production of space was thus his aim.

    4. the "Dallas" in a front-page headline was given the same weight as the "Dallas" in a retail advertisement

      This brings up the issue: is frequency the most important marker of "significance"? How does one measure "significance"?

    5. I shaped the project around the availability of digitized and machine-readable sources.

      I understand this is a necessity, but it also brings to mind the idea that perhaps there were better source materials for the author's project that weren't available! There might have been reasons why the Houston Daily Post mentioned a city a lot (a sponsor, or a background link to the editor or owner), and this bias wouldn't be clear without a comparison to another paper from the same period (which the author did not make because of the availability of sources).

    6. Atlanta, Memphis, and Nashville were dwarfed by references to urban centers outside of the South

      I was a bit surprised by this; I would think these important hubs would be of interest to the readers and thus mentioned a fair bit! The author is suggesting that cities in Texas and large cities in the North dominated the pages while the American South was largely ignored.

    7. quantify how late nineteenth-century newspapers crafted a view of the world for their readers.

      Interesting to view the construct of space in a less literal way, not the physical construction of space but the "sense" of space created by a pattern of continuous affirmation of ideas and values.

    8. Instead of space serving as a neutral backdrop for the march of historical events, societies dynamically produce space over time.

      I really agree with this (wonderfully written) statement. You can see these divisions on a small scale (a bedroom, or a house layout) or on a grander scale (highways in Canada for example). We create and navigate spaces that often have arbitrary lines of distinction for reasons of comfort, necessity and organization.

    1. XML is a formal model that is based on an ordered hierarchy, or, in technical informatic terms, a tree. It consists of a root (which contains everything else), the components under the root, which contain their own subcomponents, etc. These components and subcomponents are called nodes. In the case of our book example, the root node is the book, it might contain a title-page node, followed by a table-of-contents node

      I love this extended metaphor for XML! As a complete novice to XML I can start to grasp what it is by thinking of it as an "ordered hierarchy" organized with comparable parts to a book.
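
      To make the metaphor concrete, here is a minimal sketch of that book-as-tree in XML, using the nodes the passage names (the element names are illustrative, not a real schema):

      ```xml
      <book>  <!-- the root node contains everything else -->
        <title-page>
          <title>An Example Book</title>
        </title-page>
        <table-of-contents>
          <entry>Chapter 1</entry>
        </table-of-contents>
        <chapter n="1">
          <paragraph>Text lives inside the nested nodes.</paragraph>
        </chapter>
      </book>
      ```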

    2. If a book were merely a stream of words, with none of the layout and formatting that lets us recognize its constituent parts, the reader would have much more difficulty determining where logical sequences of thoughts (instantiated as chapters or paragraphs, described by titles, etc.) begin and end

      This is very true, yet I also think that today's young readers are adapting to less-structured textual displays. Social media and the rise of "amateur" authors and online books have shown us that structured text (with chapters and clear arguments) isn't always necessary and a "stream" of text can be just as effective.

    1. This is a shame, as any contribution to Transcribe Bentham is beneficial to the project;

      This hardly seems true if the volunteer quoted could not even read Bentham's handwriting!

    2. Ninety-seven per cent of survey respondents had been educated to at least undergraduate level, and almost a quarter achieved a doctorate.

      I wonder about the types of undergraduates represented in this number! It was much higher than expected...I thought that many truly amateur laypeople would attempt to volunteer their time

    3. A community of volunteers engaged in the latter requires, Haythornthwaite suggests, qualitative recognition, feedback, and a peer support system.

      I like this distinction between the types of volunteers possible for a project! Which is preferable depends on the nature of the work, but the structured nature of a "community of volunteers" addresses issues of shoddy workmanship and inaccuracy.

    4. task is well facilitated, and the institution or project leaders are able to build up a cohort of willing volunteers.

      I agree... I think that the planning and execution of crowdsourcing need a coherent strategy to be viable: not only willing volunteers but also clear instructions, so that the volunteers are producing useful and accurate work! I think that unorganized crowdsourcing projects could get very sloppy and end up taking more time to sort and correct results. Enthusiasm is not enough!

    5. Crowdsourcing aims to raise the profile of academic research, by allowing volunteers to play a part in its generation and dissemination.

      I think this phrase "raise the profile of academic research" can be interpreted two ways. The first is that crowdsourcing could allow the general public to get involved in academic research in new and exciting ways that bring further attention to the research in question (media, etc.). But it could also mean raising the "quality" of academic research, and I'm not sure crowdsourcing always does this: crowdsourcing can also lead to a lack of accountability for accuracy and sourcing, tainting results.

    1. Add your command to your history file, and lodge it in your repository.

      How do you do this? I have tried going back to last week's exercises and I am still just as lost.
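
      In case anyone else is stuck, here is roughly the workflow I believe the exercise intends (the file name and the example command are guesses based on last week's setup):

      ```bash
      # append the command you ran to your history file,
      # then stage, commit, and push it to your repository
      echo "wget http://example.com/records.csv" >> history.txt
      git add history.txt
      git commit -m "log this week's command"
      git push origin master
      ```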

    2. Lodge a copy of this record in your repository.

      Does this mean to save the Excel document of the downloaded war graves to GitHub, or to somehow export the nano file from DHBox to GitHub...?
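
      If it is the second, a sketch of what that might look like on DHBox (every file, folder, and repo name here is hypothetical):

      ```bash
      # copy the downloaded record into your cloned repository,
      # then commit and push it to GitHub
      cp ~/war-graves.csv ~/my-repo/
      cd ~/my-repo
      git add war-graves.csv
      git commit -m "lodge war graves record"
      git push origin master
      ```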

    1. Make a new branch from your second last commit (don't use < or >).

      I tried using git checkout -b branchname <commit> (without the < >) and adding my own unique branch name, and got errors. Anyone else get this?
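
      For reference, a sketch of the intended usage (the short hash abc1234 and the branch name are placeholders; yours will differ):

      ```bash
      # list recent commits to find the hash of your second-last commit
      git log --oneline -5
      # create and switch to a new branch rooted at that commit
      git checkout -b experiment abc1234
      ```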

    2. Go ahead and make some more changes to your repository. Add some new files.

      Anyone know how to do this?
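
      A minimal sketch of one way to do it (the file name and commit message are just examples):

      ```bash
      # create a new file, stage it, and commit the change
      echo "notes on this week's reading" > notes.md
      git add notes.md
      git commit -m "add a notes file"
      ```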

    3. Open up your readme.md file again

      How do you open the readme.md file? I can't seem to open it no matter what command I type into DHBox...
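
      Assuming you have changed into the repository folder first, either of these should work (nano was the editor used in the earlier exercise):

      ```bash
      # edit the file in the nano editor
      nano readme.md
      # or just print its contents to the terminal
      cat readme.md
      ```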

    4. Google 'show hidden files and folders' for your operating system when that time comes.

      Bookmarking this for when I want to change the default behaviour of the hidden .git folder on DHBox.
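
      On DHBox itself no setting is needed; in any Linux shell the -a flag lists hidden dotfiles:

      ```bash
      # show all files, including hidden ones such as the .git folder
      ls -a
      ```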

    1. Those of us whose work focuses on During’s “core humanities” are often understandably queasy about our fields’ development out of the projects of nationalism and cultural dominance,

      This reminds me of the discussions historians had about Harper's focus on Canada's military history being used to further a nationalistic agenda. The amount of money put into the remembrance of the War of 1812 and the renaming of the Museum of Civilization to the Museum of History are two such examples.

    1. The value of our work is too wrapped up in the scarcity of sources themselves, rather than just the narratives that we weave with them.

      I think that until this mindset changes among historians (and other academic fields) open-source research will not take off. If we value the sources over how we interpret and synthesize them, then we will continue to meet incentives to hoard academic research rather than open it up.

    1. Digital history should embrace the impermanence of the medium, use it to convey the changing nature of the past and of how we understand it.

      I think this is a noble idea but I still get anxious at the idea of embracing a changing medium where previous work in an out-of-date format may be lost to future digital historians. I know that the same could be said of paper sources, but digital sources are only lost because someone does not know where to look online, rather than the multitude of ways paper sources can be lost or destroyed.

    2. scrutiny from more potential fault-finders, it is hard to see its attraction.

      I love this quotation; it is both humorous and accurate! Academics don't want more pressure on their work (though critics and readers certainly would like it!).

    1. you are only a click away from scans of many of the declassified primary sources Suri used to develop his argument. This gives the reader a radically transparent view into the source material supporting the case Suri argues.

      I've actually seen this method used in online articles and on social media (Facebook/Tumblr), and I think it's becoming more standard for readers of our generation to expect links to proof (or "receipts" as it is often termed on social media). What I'm trying to say is that linking to sources directly is no longer seen as "radically transparent".

    1. They weren't asking me for a peer review. They were asking me to associate my name with the journal, so they could point to me.

      I was astonished and disgusted at the behaviour Melissa Terras was exposed to with this "Frontiers" journal. This article exposed a whole other side of online academic publishing that I did not know existed!

    2. When 46% of the 500+ attendees to DH2015 audience were women?

      I love this author's honest critique of the DH academic community and the sexism present. I was surprised at the number of female attendees at DH2015 and admit that I would have assumed a much smaller proportion of females in the community.

    1. By not explicitly pointing out tools and approaches that embrace feminist values and diverse outlooks, we risk perpetuating incongruities, barriers, and biases in DH research

      This is a problem in any new frontier of academic research, and it becomes clearer if Melissa Terras' blog post (footnote 1) is read! Academics must make a substantive effort to open up their fields to include women and people of colour, not only out of human decency but also to tackle biases and break research barriers.

    1. Paper Machines, a digital toolkit designed to help scholars parse the massive amounts of paper involved in any comprehensive, international look at the over-documented twentieth century.

      I appreciate how honest the author is about the "over-documented twentieth century" which I understand as addressing the explosion of texts and sources created by an expanding literate population and bureaucracies.

    2. Are we content, as historians, to leave the ostensible solutions to those crises in the hands of our colleagues in other academic departments? Or do we want to try to write good, honest history that would shake citizens, policy-makers, and the powerful out of their complacency, history that will, in Simon Schama’s words, ‘keep people awake at night’?

      I think this discussion on the purpose of history is far from over and that many historians today shy away from using history as activism. In Historical Theory we touched on these ideas, but I think that historians (and history students) still romanticize the idea that their research alone will inspire others, rather than making an effort to collaborate with other academics in emerging activism efforts.

    3. possibility of choosing and curating multiple futures itself seems to disappear

      I have definitely seen this teleological way of viewing history in my economics courses. It is even worse when discussing the inevitability of market forces acting in the world's future economy; this lack of flexibility is what makes me thankful for learning about historians' skepticism.

    4. Scepticism towards universal rules of preferment is one vital tool for thinking about the past and the future.

      This is really important for historians and something that I think Carleton could focus on more for first- and second-year history students! Instead of discussing "bias" it is skepticism that should be emphasized!

    5. We are no longer in the age of information overload; we are in an era when new tools and sources are beginning to sketch in the immense stretches of time that were previously passed over in silence.

      I disagree... I almost feel like we are in an "information *tool* overload", where there are so many ways to compile and access data that only those well versed in technology can keep up with advances in big data mining.

    6. By crowd-sourcing the rejection of requests for Freedom of Information Acts, Connelly’s Declassification Engine was able to show the decades-long silencing of the archives

      This is really smart; I've learned quite a bit about Canada's access to information laws while working for the federal government and have seen what information falls through the cracks.

    7. Digitally structured reading means giving more time to counterfactuals and suppressed voices

      I think this is a bit idealistic... perhaps the history of our time can be categorized in this way, but digitization of the past can only mine data from sources that were created at the time, and these were often created by powerful actors and institutions.

    8. a challenge that much longue-durée work has yet to take up

      I definitely agree that this is the problem with Braudel's longue durée... how can data over such a long period of time be consolidated into theories, theses, and relevant historical questions?

    9. index, the encyclopaedia, and the bibliography – came from the first era of information overload, when societies were feeling overwhelmed about their abilities to synthesise the past and peer into the future

      This is really fascinating to me and makes me hopeful for new, exciting ways we can revolutionize data organization moving forward. I wonder in particular how bibliographies will evolve in the future, as we are already experiencing the issue of deleted online sources as new webpages replace the older ones.

    1. Network visualisations

      I'd be curious if anyone has a definition for this, as I'm not quite following what this analysis is...

    1. Adding them just didn’t feel right because I don’t make an argument within them

      I still think that the author should have included some of these facts of "soft history" even though he does not think they contributed to his argument. His data on the changing meaning of the named job of "stationer," for example, is in my opinion important to present to the reader.

    1. This reminds me of our discussion of micro-histories in Historical Theory. We read "The Return of Martin Guerre," which I think is a particularly good example of how history can be viewed as narrative storytelling and that the author of the historical piece and their background can greatly influence history's presentation.

    1. associated with quantitative studies.

      I admit that this is very much how I imagine computational history, so I look forward to seeing how digital history goes beyond numerical data sets!

    1. and fall of cultural ideas and phenomena through targeted keyword and phrase searches and their frequency over time

      I would love to use some of this information in the final project, if that's possible!

    2. Université de Montréal are reconstructing the European population of Quebec in the seventeenth and eighteenth centuries, drawing heavily on parish registers.

      As someone who is particularly interested in immigration history, this seems like a wonderful project! I think that this type of data mining could be useful in other cities and could be used to show the general population the changing nationalities that have lived there.

  3. www.themacroscope.org
    1. I'm fascinated by the idea of quantifying historical facts and records into equally-weighted "data" that can be analysed. That is part of the reason I signed up for this course: to see how the analytical tools I have learned while studying history and economics can be turned on their head by computational methods. Does anyone have some examples of this translation from historical fact to numerical data?
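
      One small example in the spirit of the newspaper project above: turning "which cities does this paper mention?" into numbers is, at its simplest, counting matches across OCR'd text (the folder of .txt files is hypothetical):

      ```bash
      # count every case-insensitive mention of a place name
      # across a folder of digitized issues
      grep -o -i "dallas" *.txt | wc -l
      ```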

    1. This is a good distinction from Fernand Braudel's division of time periods (longue, moyenne, and courte durée) for historical analysis, which I studied last year.