    1. to hold it in thought as sacred, holy, consecrated to the highest of all functions, that of procreation. Recognize that, conserved and controlled, it becomes a source of energy to the individual.

      This is again a bourgeois feminist belief of the early 20th century. Sex for procreation, within marriage, was a positive force. This undercuts the belief that sex was universally seen as sinful in the late Victorian/early Edwardian era; however, it must not be mistaken for modern second-wave feminist arguments for more radical sexual freedom (though there was a small minority of radical feminists at the time who advocated for that as well).

    2. arouse in the reader a thrill through her own sexual organism that tends to increase its activity and derange its normal state

      This implies that women are naturally sexual creatures; this reflects the view of early feminists, which eventually became subsumed into the project of marriage.

    3. She can endow them with mental power by not frittering away her own powers of mind in foolish reading or careless methods of study. By her own self-respecting conduct she helps to give them the reverence for self which will insure their acting wisely.

      According to this text, the most important goal of female life is childbearing (passing on good genetics) and childrearing (passing on good behaviours). Female worth is equated with motherhood, both biological and as a practice.

    4. Are the family tendencies such that you would be willing to see them repeated in your children?

      This is a mild example of eugenicist thinking: the belief that people should make matches with the goal of passing on desirable traits to children, thus improving "the race."

    5. What is their worth? I do not mean in money,

      "Worth" here seems to be intended to refer to intrinsic/genetic qualities, but these qualities were often related to class - and, as such, often related to money.

    6. and see whether it is wiser to pass the border line, or to remain only friends.

      This puts the decision to get married and to have sex firmly in the hands of individual women - not in the hands of their families or of society. However, as we will see, women are expected to consider social and familial realities when making their choice.

    7. We all believe it very important that mothers should know how to direct and govern their children

      This emphasizes the importance of traditional gender roles (mother as nurturer, mother in the private sphere) in relation to marriage.

    8. criminality, which may disgrace your home through the paternal inheritance that you chose for them.

      The belief that criminal behaviour was genetically passed down was a common eugenicist claim.

    1. any visualization we create is imbued with the narrative and purpose we give it

      It's so important to keep this in mind: who are we helping/hurting in the way we arrange our data? Sometimes it's easy to forget that a picture has an underlying narrative as much as a text. Data, in whatever form, is ammunition; we need to aim carefully.

    1. “borrowing” the methodology becomes even more dangerous.

      I feel like I'm doing this with a lot of my project - right now, I'm dabbling in methodologies I don't understand. In part, it's because this is my first project and I'm busy ~failing productively~. It's important to remind myself, though, that I have a lot of hard work to do. I don't need to become an expert, but I should get familiar with the methodologies I choose to utilize in my final project so that I can speak confidently about my findings.

    1. At this point in the eighteenth century, a 254x254 matrix is what we call Bigge Data. I have an upcoming EDWARDx talk about it. You should come.

      This is my kind of historical argumentation. Can I write my final paper from the perspective of a disgruntled turn of the century Anglophone Quebecker?

    1. I’m suggesting that some texts use irony and dark humor for more extended periods than you suggest in that footnote—an assumption that can be tested by comparing human-annotated texts with the Syuzhet package.

      I definitely agree with Swafford here, but I like Jockers' determination to try something with sentiment analysis, even if the technology isn't perfect. Swafford has a point, though: Jockers is reaching pretty far with his something when he claims to be able to read literature for emotion out of algorithms designed to detect positive and negative wording.

    2. they were developed for analyzing modern documents like product reviews and tweets.

      !!!!!! very important. We historians are so used to analyzing documents for intended audience - we need to analyze our tools for it as well! If our tools are an extension of scholarly work, then tools can be just as biased as any scholar or group of scholars, and only by getting more opinions in the mix can we improve the results.

    3. Since each word is scored in isolation, it can’t process modifiers. This means firstly that intensifiers have no effect, so that adding “very” or “extremely” won’t change the valence, and secondly (and more worryingly) that negations have no effect.

      Yeah... this is a problem. I'm reminded of Milligan's discussion of the magic black box. We trust algorithms when they are presented to us through a pleasing interface. Swafford has done well to go "under the hood," as it were; but we can't all rely on some other dedicated scholar to go under the hood of every algorithm we apply to our own work. Digital historians have a responsibility to be aware of the limitations of their own tools.

    4. I communicated privately with him about some of these issues last month, and I hope these problems will be addressed in the next version of the package.

      Neat to know how the whole DH community is connected both privately (through e-mail - okay, semi-privately, big brother is watching, etc etc) and publicly. Many forms of dialogue are available to digital scholars, and more accessible than ever before, as we've discussed in previous weeks. Thus, Swafford was able to engage with Jockers' project on multiple levels - as observer, as user, as private adviser, and as public critic. More roles means more perspectives means more fruitful dialogue about these tools and methods.
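
Swafford's point in item 3 above (each word scored in isolation, so intensifiers and negations have no effect) is easier to see with a toy example. This is a minimal sketch under stated assumptions: the lexicon, its scores, and the function name `score_sentence` are all made up for illustration, and this is not how the Syuzhet package itself is implemented.

```python
# Toy lexicon: the words and scores are invented for illustration only.
TOY_LEXICON = {"good": 1.0, "happy": 1.0, "bad": -1.0, "sad": -1.0}


def score_sentence(sentence: str) -> float:
    """Sum per-word scores, looking at each word in isolation."""
    words = sentence.lower().replace(".", "").replace(",", "").split()
    # Words missing from the lexicon ("not", "extremely") contribute nothing.
    return sum(TOY_LEXICON.get(word, 0.0) for word in words)


plain = score_sentence("I am happy.")
intensified = score_sentence("I am extremely happy.")
negated = score_sentence("I am not happy.")

print(plain, intensified, negated)
```

All three sentences get the same score here, because "extremely" and "not" are simply unknown words to a scorer that never looks at neighbouring words.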

    1. the mean emotional valence

      What does "emotional valence" mean? He probably explains it in his earlier post, but a quick rundown of his methods would have made this blog post stronger & more accessible.

    2. Vonnegut draws the plot of Cinderella for us on his chalk board, and we can imagine a handful of similar plot shapes. He describes another plot and names it “man in hole,” and we can imagine a few similar stories. But our imaginations are limited.

      Fascinating - he seems to be using digital methods to test and build off of anecdotal scholarship, i.e. unreliable scholarship. It must be noted that close-reading (like what Vonnegut does to Cinderella) was the way to do literary studies when Vonnegut developed his theory. Vonnegut wasn't lacking data at the time; data on the scope of Jockers' corpus of digitized classic novels just did not exist. This suggests the great possibilities of digital methods. It also serves as a reminder not to dismiss past (or current) scholarship just because of its contemporary limitations. We must build on the past, not discard it.

    3. These core conflicts will be familiar to students of literature: such constructions were once taught to us under the headings of “man vs. man,” “man against nature,” “man vs. society,”

      Flashbacks to high school English class. These have always seemed reductive to me - and not just because they're so implicitly masculine! The fact is that most great plots contain elements of all sorts of conflicts. Even if there is an overarching conflict, calling it "man vs society" rarely conveys the essence of the conflict. It just gives it a convenient label.

    4. syuzhet (the organization of the narrative) as opposed to its fabula (raw elements of the story).

      I would be interested in reading Propp's work one day, because I fear this could create a false dichotomy between "details" and "big picture." One can't exist without the other, so historians and literary scholars shouldn't focus on only one.

    5. a video of Kurt Vonnegut describing plot in precisely these terms.

      As a writer and a huge Vonnegut fan, I'd heard of Vonnegut's plot structures many times before. They came to mind immediately when I read the intro to this post, actually! I had no idea there was a video of him explaining it, though, nor that Vonnegut imagined computers could analyze plot. Neat.

    6. the relationship between sentiment and plot shape in fiction

      My inner literature nerd is swooning. This has so much potential, and it's the kind of study that would be very difficult without some digital tools (to extract the raw data from entire novels and to visualize the resulting plot structures in a way that allows for easy comparison).

    1. In the long term, it may mean allying with like-minded historians, social scientists, statistical physicists, and complexity scientists to build a new framework of legitimacy that recognizes the forms of knowledge we produce which don’t always align with historiographic standards.

      This is fascinating! The obsession with "interdisciplinary approaches" to history may lead to a complete redefinition of historical legitimacy. I think that such expansion of the field could be good. After all, there was a time when "history" meant "stories written about great white men, usually at the head of governments". Social history expanded the scope of history to include stories of diverse peoples, cultures, and practices... could we expand the scope even further?

    1. the most legible result to historians will necessarily involve a narrative reconfiguration.

      I like that he acknowledges this. I'm most comfortable writing in a narrative form. If I had to write purely about methods to write "good" or "real" digital history, I wouldn't have an easy time. Though I want to challenge myself in this class to document my methods more thoroughly, I like that I can use these methods to inform and enhance my favourite methods of doing history.

    2. To convincingly make arguments from a historical data description, you must back it up using triangulation–approaching the problem from many angles. That triangulation may be computational, archival, archaeological,

      Love this! It really puts digital history into the context of the wider discipline. We're not just messing around with new toys. We're doing important work that builds on old work and can be built on in the future.

    3. The above boils down into two possible points of further research: deviations from expectation, or deviations from internal consistency.

      Aha! This is giving me an idea for my project. Funny how digital history is almost like relearning how to do history from the ground up. New methodologies make us question old methodologies, but sometimes we need to remember where we started.

    4. You have a big dataset and don’t know what to do with it


    5. Knowing that a community exists between history & philosophy of science is not particularly interesting; knowing why it exists, what it changes, or whether it is less tenuous than any other disciplinary borderland are more interesting and historiographically recognizable questions.

      This has definitely been a barrier to my interest in digital history: I prefer the "why" and "how" questions more than the raw facts. I like that this is being addressed (and that I could potentially add to the discussion).

    1. .+,.+,.+,

      This doesn't return anything for me when I run it in grep on my index.txt file, but other searches of the file do return things, and I have lines with more than 3 commas. What's going on?
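
One possible explanation for the grep question above, offered as an assumption since the exact command isn't shown: plain grep uses basic regular expressions, where an unescaped `+` is an ordinary character rather than "one or more". Under that reading the pattern only matches lines that contain literal plus signs; `grep -E` (extended syntax) gives the intended meaning. The sketch below imitates both readings in Python; the sample line is made up.

```python
import re

line = "Smith,John,Ottawa,1901,servant"  # made-up sample line

# Extended-regex reading: "+" means "one or more of the preceding character".
extended = re.compile(r".+,.+,.+,")

# Basic-regex reading: "+" is an ordinary character, so a literal "+" must appear.
basic_literal_plus = re.compile(r".\+,.\+,.\+,")

matches_extended = extended.search(line) is not None
matches_basic = basic_literal_plus.search(line) is not None

print(matches_extended, matches_basic)
```

If this is what happened, the file and the pattern are both fine; only the regex dialect differs.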

    1. . Descriptive markup describes what the elements in a document mean, but not how they look, and CSS is intended to let the designer specify the rendering separately from the XML, so that meaning and appearance do not become conflated or confused. JavaScript JavaScript is a client-side programming language for manipulating, among other things, the appearance of web pages in the browser. Client-side means that JavaScript runs in the user’s browser, so that, for example, the user can change what is rendered in the

      Oh my God. I literally never understood what JavaScript was until I read this. I'm not sure yet how I'd use it, but I guess like everything else in this course I'll just have to play around with it one day.

    2. be used without reference to the Web or the Internet. For example, one could write XSLT to genera

      Aha! No wonder XML seems familiar to me.

    3. nd mark up in your documents is dictated by your research agenda, it is important to conduct your document analysis and develop your schema with your g

      This is a good point. I just see myself getting halfway into a text then realizing that my markup won't provide the results I need. Maximize planning, minimize hours wasted, etc.

    4. an index of place names for the document, or cause a map to appear when the reader mouses over a place name while reading, or make it possible to search for the string “London” when it refers to the place, b

      This blew. My. Freakin. Mind. The possibilities are endless - literally, because XML definitions can be anything, and then transformed into any kind of display.

    5. ning: a list is still a list, no matter how it is presented. In XML the person who creates the document

      Hence the TEI, so that we have some standards to build a communal collection of data that can be used by scholars. I wonder how widespread TEI is? Are there other common XML schemes used by scholars in the humanities? What about in other disciplines?

    6. The markup used in digital humanities projects is descriptive, which means that it describes what a textual subcomponent is. Descriptive markup differs from presentational markup, which describes what text looks like. For example, presentational markup might say that a sequence of words is rendered in italics, without any explanation of whether that’s because they’re a book title, a foreign phrase, something intended to be emphasized, etc. Descriptive markup also differs from procedural markup, which describes what to do with text (e.g., an instruction to a word processor to switch fonts in a particular place).

      This was illuminating for me. As a member of the Neopets/Proboards/Geocities generation, I learned basic presentational markup from a relatively young age. I had literally never considered the possibility of descriptive markup until this class. I think I love it so much because it combines my interest in textual analysis with the comforting tags I spent so many years perfecting in my messages and webpages. The tag-based structure of XML is almost second-nature to me.

    7. XML is a formal model designed to represent an ordered hierarchy, and to the extent that human documents are logically ordered and hierarchical, they can be formalized and represented easily as XML documents.

      I had never considered this before, but it makes a lot of sense. Most texts have underlying structure; making it visible with XML helps make us conscious, as researchers, of the structures underlying our texts. It could also help us understand better the connections between different texts - the Macroscope readings we did in the first week discussed doing so with Old Bailey records.
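
To make the descriptive markup idea from items 4 and 6 above concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The sample letter is invented, and the element names (`placeName`, `persName`, `title`) are only loosely modelled on TEI-style names; this is not a valid TEI document.

```python
import xml.etree.ElementTree as ET

# Invented sample: "London" appears once as a place and once as a surname.
SAMPLE = """
<letter>
  <p>I read <title>Bleak House</title> on the train to
  <placeName>London</placeName>, then wrote to
  <persName>Jack London</persName> from <placeName>Ottawa</placeName>.</p>
</letter>
"""

root = ET.fromstring(SAMPLE)

# Because the tags say what each string *is*, we can index places
# without also catching the person whose surname happens to be London.
places = [el.text for el in root.iter("placeName")]
people = [el.text for el in root.iter("persName")]

print(places, people)
```

A presentational version would just have put all of these in italics, and a plain-text search for "London" could not tell the two uses apart.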

    1. Who does your data impact? (How?) What data might be worth trying to create? (Why?) How can you develop a plan to work with your data? How will you advocate for and about this data?

      Wonderful and concise list of questions. I can't believe I've never asked some of these (developing a plan, advocating for data) questions before. They aren't simple questions, but just considering them opens up a lot of potential avenues for growth and new research.

    2. I know that mess is often used as a word that has bad associations – people may tell you that something that’s messy has to be cleaned up before it’s worth anything. And I want to push back hard against that assumption

      This is something I want to keep in mind as a scholar. Reading over some of my old papers (yes - I am both a nerd and a narcissist) I'm shocked at how simplistic and reductive my conclusions used to be. I still have a long way to go, and part of my growth over these past three years at university has been learning to love the mess.

    3. Working with data can get messy really quickly. And that mess isn’t just something to be cleaned up – it’s people’s lives. People’s lives can be genuinely complicated in so many different ways, and figuring out how to handle that mess is a vital part of working with data. When people don’t consider the mess; or when they try to shove it out of the way so that it doesn’t complicate their analysis, then there’s a good chance that they’re not representing the reality that people are living.

      This perfectly captures one of my concerns with working with data. It sounds curmudgeonly, but sometimes hard data can feel static or stuffy - not as amenable to fluid interpretation as traditional sources. Of course, through this class, I've come to understand that many digital historians, like Morgan, appreciate the "mess" of digital data and handle data accordingly.

    1. If we do this for a variety of TLDs and GeoCities neighbourhoods, what patterns emerge? Could we use this as part of a finding aid to learn about a neighbourhood by ‘distantly reading’ the images?

      I want to see this done on screengrabs of Geocities sites by different 1990s fandoms or interest groups. I bet the websites of stamp collectors looked really different from websites of punk fans and Riot Grrls!

    1. . It is merely one way of formalising those processes that should be part and parcel of any analysis—namely making darn sure I know the precise individual or place a word is referring to, and the possible implication that this has for the source at large, even if it is only mentioned in passing.

      Better digital methods make for better scholarship, period. There are analog methods of recording this information, of course, but this class is all about harnessing the new opportunities afforded by good digital practice.

    2. at least three very important (argument-shaping) articles were only identified during routine tagging of my database backlog—a number I expect will rise as I endeavour to encode my remaining 400 transcriptions.

      Wow! Data management may not be sexy, but apparently it's useful. I think this article (and Beals' other articles on her workflow) is super useful to anyone doing history in a digital age - not just digital historians. She outlines clearly how she made digital tools work for her rather than against her.

    1. Acknowledging digitized historical texts as new editions is an important step, I would argue, to developing media-specific approaches to the digital that more effectively exploit its affordances; more responsibly represent the material, social, and economic circumstances of its production; and more carefully delineate with its limitations.

      This is a very succinct argument for looking at digital texts as editions. It covers many of the topics we've been discussing in class regarding the availability and reliability of digital sources. I think the most important part to remember here is that the circumstances under which a digital text is created can let a researcher know certain limitations of that text. It's important to know, for example, if an OCR-based search of the text will be effective given the quality of digitization; or if related texts are missing due to inequalities in funding.

    2. Over their three rounds of funding, then, Penn State sought to digitize newspapers from as many counties as possible, meaning they prioritized breadth of geographic coverage over digitizing the “most influential” newspapers in the state, which might have produced a corpus skewed in another way: toward Philadelphia over more rural areas in the state.

      Without any consensus on how to decide between digitizing "influential" and inclusive materials (or even how to define "influential" and "inclusive"), digital archives are going to be skewed for a long time yet. For example, Penn State may have decided to go for geographic coverage, but another state university might focus on one or two cities. Even raw numbers of newspapers digitized, then, cannot express the state of the digital archive.

    3. $393,650

      DIGITIZATION IS EXPENSIVE!!! This is an incredible number to keep in mind. No wonder so few newspapers are digitized, and that fewer are digitized well.

    4. The details gleaned from these files, however, are only one part of a full bibliographic account, which should also concern itself with the institutional, financial, social, and governmental structures that lead one historical textual object to be digitized, while another is not. In Ian Milligan’s study of newspapers cited in Canadian dissertations, he demonstrates quantitatively that overall citations of newspapers have increased in “the post-database period,” but also that those citations draw ever more disproportionately from those papers which have been digitized over those which have not

      This is exactly the point I was trying to make in my blog posts for module 2! I think it's so important to consider why and how texts get digitized, and to consider the ideological impact that digitization can have on our projects.

    5. I am not particularly bothered by the fact that OCR is an “automatic” process, while composing type is a “human” process. To maintain such a dichotomy we must both overestimate the autonomy of human compositors in print shops and underestimate the role of computer scientists in OCR. Both movable type and optical character recognition, along with a host of textual technologies in between, attempt to automate laborious aspects of textual production. Indeed, we can only speak of editions as such, whether printed or digital, within an industrialized framework.

      This makes a lot of sense to me. I've never thought about textual reproduction "within an industrialized framework" before, but it has heavy implications for text-based scholarship. One question that it raises in my mind: why do we care so much if machines are transcribing texts if we never really cared about the people who typeset the originals back in the 18th, 19th, 20th century? If human error is different from machine error (and I believe it is), why don't we talk more about the conditions that produced human errors in text? Cordell implies that asking questions about human textual production helps us better understand the implications of mechanical production.

    6. in which the digital archive can be only a transparent window into the “actual,” material objects of study

      This is an interesting point: what potential are we missing when we want only perfect OCR, and ignore the variety of things that can be done with digital sources?

    7. we require more robust methods for describing digital artifacts bibliographically:

      I've always had trouble, even as an undergrad student, citing digital books. When do I cite the original publisher, and where do I acknowledge the digital publisher? Should I date it to the day the website was uploaded, or the date of the print release? Should I note how the source was digitized (scan, transcription, other)? This is definitely a conversation that needs to be had.

    8. What is this text—this digital artifact I access in 2016?

      Hmm, interesting. It's easy to look at a poorly OCR'd text and dismiss it - just say "that's wrong, throw it out." But that isn't failing productively, that's just giving up. By asking these sorts of questions - what is an OCR'd document, really, and what can it be used for - we can wrangle with important questions about big data and digital methodology even if, character for character, the OCR "failed."

    1. What would be worse, however, would be to abandon our past rather than learn from it.


    2. Data management is an important part of any research project and should always, if possible, be done at the start of the project. This allows for consistency, repeatability, and reuse of your material in the future

      Yeah... still learning this. I'm grateful that the blogosphere has allowed for scholars in similar fields to share these kinds of thoughts. It's not something I've ever been taught in class - until now, of course - it's something that every scholar seems to have to learn the hard way. Collaborative online learning spaces that allow for posts like this can make this process less necessary, or at least less painful.

    3. It could also include complicated formatting and typographical information in a way that could be used to recreate the original presentation of the text but be easily disregarded when irrelevant

      This is fascinating! I love the idea of XML containing "layers" that can be added or removed as necessary, letting us manipulate data on-the-fly. It kind of reminds me of the iterative nature of github. Flexibility is a digital historian's best friend.

    4. determining the relevant keywords for any given work was a highly subjective process, as was the creation of new keywords to describe new, or at least newly noticed, themes and topics.  This again added a layer of inconsistency to my database. 

      This is the case with most long projects, I'd assume. I think a level of uncertainty in tagging is almost inevitable. Projects are going to evolve, and human understanding will always be adding new layers of understanding to previously-categorized texts. Beals gives some good examples of how digital methods can cut down on some of the mess.

    5. However, my naive lack of documented search parameters, and indeed the incompleteness of my transcriptions made their reuse in other contexts dubious.

      This sounds like... everything I've ever done. I can't tell you how many times I've pulled up an old file full of citations and notes and gone, "what?" It's interesting to note that not all digital history practices have to have some grand, open-access-world-changing goal. Best practices are useful for the individual scholar, too.

    1. $ sudo pip install twarc

      Type $ pip install twarc (take out the sudo) to make this work.

    2. 755

      Why does this make the file okay to run?

    3. something concerning social and household history in early Ottawa

      This is a big part of my summer project on domestic servants at Rideau Hall - finding out what the conditions of servants were like in the average Ottawa home at the time. I'm still a bit lost on how to find data for that project, so hopefully these exercises will be useful to me!
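
On the 755 question in item 2 above: the three octal digits are permission bits for the file's owner, its group, and everyone else, and an odd digit means the execute bit is set. A minimal sketch using Python's standard `stat` module; the 644 comparison value is my own addition for contrast, not something from the exercise.

```python
import stat


def describe(octal_mode: int) -> str:
    """Render permission bits for a regular file the way `ls -l` would."""
    return stat.filemode(stat.S_IFREG | octal_mode)


executable = describe(0o755)      # 7 = read+write+execute, 5 = read+execute
not_executable = describe(0o644)  # 6 = read+write, 4 = read only (assumed comparison)
owner_can_execute = bool(0o755 & stat.S_IXUSR)

print(executable, not_executable, owner_can_execute)
```

So setting 755 lets the owner read, write, and run the file, while everyone else can read and run it but not change it.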

    1. So here's the CND.xml, transformed into a csv: http://shawngraham.github.io/exercise/cnd.xml .

      What's the CND, and what is a .csv?

    2. Any claims, assertions or arguments made Now that you have highlighted these, you are going to put proper code around them.

      Small formatting error here!

    1. It’s time to start talking in the present tense.

      I feel like I need to make a sparkly animated .gif that says "FAILING PRODUCTIVELY" so I can link it every time a reading suggests doing just that. Seriously, though, this is so important for us students of digital history. We need to DO it before we can theorize about it. We need to know the limitations AND the abilities of our tools before we start planning what we can (or can't, or should, or shouldn't) build.

    2. By publishing it in the Journal of American History, with all of the limitations of a traditional print journal, I was trying to reach a different audience from the one who read my blog post on topic modeling and Martha Ballard. I wanted to show a broader swath of historians that digital history was more than simply using technology for the sake of technology.

      What kind of audience are we reaching with our blogs? I'm consciously writing my blog posts to my classmates, though I know Dr. Graham shares some of our stuff on Twitter. Should I be writing for a wider audience? How will I write about digital history after this class? Can I use these methodologies in other classes, and if so, how should I introduce them? We should all be considering these questions.

    1. Our guesses about search terms may well project contemporary associations and occlude unfamiliar patterns of thought.

      The phrasing of this really drove home the dangers of full-text search for me. I tend to project quite a bit onto the texts I read; it's a habit of close reading, and something I know to be aware of and counteract. I haven't ever applied such a careful corrective measure to search terms.

    2. Instead, the algorithm has to sort them according to some measure of relevance. Relevance metrics are often mathematically complex; researchers don’t generally know which metric they’re using

      Again, digital tech seems to blind us to confirmation bias, because if it's shiny and new and exciting, how can it be biased? Do we even know enough about some of these search algorithms to determine their internal biases?

    3. The search terms I have chosen encode a tacit hypothesis about the literary significance of a symbol, and I feel my hypothesis is confirmed when I get enough hits. I

      It's important to be aware of confirmation bias in all forms - digital (encoded into the algorithm itself) and human (in the ways we use these algorithms). There's bias in regular scholarship too, but this is a good point about how "big data" can be particularly convincing, tempting us into being less discerning as scholars.

    4. “search” is a deceptively modest name for a complex technology that has come to play an evidentiary role in scholarship.

      I can't believe I'd never even considered this before. This has massive implications not only for "digital history" like we do in this class, but for ALL my academic work. I remember being told in first year that MacOdrum's Summon Search is an unreliable and inefficient search (it searches too many things with too few options for parameters). By second year, I abandoned that advice and Summon Searched without caring what I missed. I wound up spending a lot of time looking at articles that weren't quite what I needed. Given that digital history relies even more heavily on the results from such search technology, this kind of laziness could be fatal to a digital history project.

    1. Indeed, I made an android-only game out of it.

      I think this is cool. It's a good way to mobilize volunteer work, and I think the transcriptions that result would be as accurate as any (as long as they are checked for quality, which is another issue discussed well in this article about the Bentham Project). What do the rest of you think of "gamifying" historical contribution? Does it improve the amount or quality of sources being digitized? Does it impact how scholars should look at the resulting source?

    2. Other kinds of data - census data, for instance - were compelled: folks had to answer the questions, on pain of punishment. This is all to suggest that there is a moral dimension to what we do with big data in history.

      Fascinating. I would love to read more about this dilemma from a general philosophy (digital philosophy???) point of view. Of course, I've considered the ethics of history before, but the additional computational power of digital history (or, in the case of the census, the additional coercive power of the state) adds a new dimension to that question. Can we write good history with information the writers did not intend for us to know? Or is it the job of the historian to look beyond the text, and is computation just another tool to achieve that?

    3. Yahoo's closure of Geocities represented a terrible blow to social history.

      Moment of silence for all those animated .gifs.

    1. Thirty-eight million pages and millions of images were about to get lost forever as Yahoo! did not facilitate user export. Ian Milligan is working with web archives and he examines the digital ruins of GeoCities.

      I love this metaphor and Milligan expands on it brilliantly in the interview. It's good practice to contextualize digital history as a new form of history, rather than getting overwhelmed by how very big big data can be. It's not a one-to-one comparison, obviously - the methodologies of digital archaeology are different from the methodologies of a physical dig site - but it helps emphasize the value of the data that was lost at the demise of Geocities. If we historians cry over the library of Alexandria, we should cry over Geocities all the more.

    1. an ongoing series

      YESSSSSSSSSSSS. (This is convincing me to finally pick up A Midwife's Tale.)

    2. I would not have thought that the words “informed” or “hear” would cluster so strongly into the DEATH topic. But they do, and not only that, they do so more strongly within that topic than the words dead, expired, or departed.

      It's important to remember that these topics are constructed by the historian. They are, like all historical narratives, just interpretation. Another historian may not have labelled that topic "DEATH." I would like to see the same text topic-modelled with the same parameters by two separate historians, and see how similar their interpretations are.

    3. we see Martha more than double her use of EMOTION words between 1803 and 1804. What exactly was going on in her life at this time? Quite a bit. Her husband was imprisoned for debt and her son was indicted by a grand jury for fraud, causing a cascade effect on Martha’s own life – all of which Ulrich describes as “the family tumults of 1804-1805.” (285)

      This is like a damn commercial for MALLET. It's important, of course, to remain cautious about directly attributing the rise in "emotional" words to certain events and using that as evidence that MALLET can solve all our text-mining problems... that being said, this (along with the "cold weather" experiment earlier) is convincing, and something I would like to try in a text for myself. Comparing digital data with non-digital observations seems like a good way to check scholarship.

    4. this pattern bolsters the argument made by Ulrich in A Midwife’s Tale, in which she points out that the first half of the diary was “written when her family’s productive power was at its height.” (285)

      Again: incredible how digital history can be used in tandem with, and in addition to, existing scholarship. This article really helped me imagine what exactly digital methods could accomplish within the established field. We can go beyond what we ever thought possible; everything can be explored in new ways.

    5. receivd calld left

      Does MALLET count words that share a root together? I remember reading in the Macroscope's discussion of Father Busa that there are text-mining programs that do this, but does MALLET do it by default?

    6. I don’t pretend to have a firm grasp on the inner statistical/computational plumbing of how MALLET produces these topics

      I like that he acknowledges this important limitation in his work. Accordingly, he uses his results more as "soft digital history" as described by Baker: to direct his thinking rather than as evidence in and of itself.

    7. computational linguistics

      Anyone know of any good resources to learn more about this topic? I'm fascinated by the fact that computer code is a language akin to human languages. Both communicate information, but with different strengths and for different reasons.

    8. less effective at recognizing that “the Author of all my Mercies” should be counted as well

      A good reminder for any of us considering using primarily text-mining and other related methodologies for our final projects (or other projects).

    9. “The problem is not that the diary is trivial but that it introduces more stories than can be easily recovered and absorbed.” (25)

      Incredible that digital history is allowing us to improve upon pieces as seminal as A Midwife's Tale. It's fascinating that Ulrich uses "recovered" and "absorbed," two terms that describe almost perfectly what a machine is able to do with the diary that a single person cannot.
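
The year-over-year comparison of EMOTION words discussed in the annotations above can be sketched in a few lines. This is a minimal illustration only. It is not MALLET and not the article's method: it counts a hand-picked word list per year instead of inferring topics, and the word list, the spelling map, and the sample diary entries are all invented for the example.

```python
from collections import Counter
import re

# Hypothetical word list standing in for an "EMOTION" topic; not from the article.
EMOTION_WORDS = {"grief", "fear", "trouble", "sorrow"}

# Hypothetical map from diary spellings to one modern form (e.g. "receivd").
SPELLING = {"receivd": "received", "calld": "called", "troubl": "trouble"}


def normalize(text):
    """Lowercase, split into words, and map known variant spellings to one form."""
    words = re.findall(r"[a-z]+", text.lower())
    return [SPELLING.get(w, w) for w in words]


def emotion_rate_by_year(entries):
    """Return {year: share of words found in EMOTION_WORDS} for (year, text) pairs."""
    totals = Counter()
    hits = Counter()
    for year, text in entries:
        words = normalize(text)
        totals[year] += len(words)
        hits[year] += sum(1 for w in words if w in EMOTION_WORDS)
    return {year: hits[year] / totals[year] for year in totals if totals[year]}


# Invented sample entries, for illustration only.
entries = [
    (1803, "Clear and cold. I was calld to a birth and receivd my fee."),
    (1804, "Great troubl and fear in the family. My grief is heavy."),
]
rates = emotion_rate_by_year(entries)
```

Using a share of words rather than a raw count keeps a year with longer entries from looking more "emotional" simply because more was written. The spelling map is the manual counterpart to the stemming question raised above: variants are only grouped if someone lists them.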

    1. There is no value added in going with the traditional model that was built on paper journals, with having people whose full time job was to deal with the journal, promote the journal and print the journal, and deal with librarians. All that can now be done essentially for free on the internet.”

      Unfortunately, academic journals, while they may be lagging severely behind other sectors in the digital revolution, are not the only products that should be more accessible due to digital technology. There's no reason an e-book should cost as much as a paperback, or a digital download of an album the same as a physical CD.

    2. In contrast to the exorbitant prices for access, the majority of academic journals are produced, reviewed, and edited on a volunteer basis by academics who take part in the tasks for tenure and promotion.

      Damn, academia. This is a serious problem: academics (used to) have no choice but to support for-profit journals if they want to advance in their careers, regardless of how good the service is. And, even as a student, I know the service isn't great: journals are frustrating to access and highly restrictive, and I've found some of the databases and especially the search systems to be unreliable and unsatisfactory.

    1. Git will open your text editor and prompt you to add a message

      This did not occur when I tried to merge a branch to my master of a clone of Bethany's repo. Did I do something wrong?

    2. DHBox returns a list of every command that you've typed.

      does every command line (such as the one on my own computer) do this?

    3. don't type the $, but rather, type the wget etc

      is there a way to copy & paste instead of typing all of this into the cmd, or does cmd not accept pasted commands?

    4. One solution is for all of us to use the same computer. The CUNY Graduate Centre has created a digital-humanities focused virtual computer for just such an occasion.

      Computer science people: help my humanities brain understand how this is possible. Am I sending commands to a computer machine at Carleton, which is then doing all its computations and sending me back the results? Or is it actually entirely virtual, with no physical computer running all the time? What is DHBox???
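
On the merge question in the annotations above: one common reason Git does not prompt for a message is that the merge was a fast-forward. If master has no commits of its own since the branch was created, Git moves master to the branch tip and creates no merge commit, so there is no message to write. The toy model below illustrates that check. The commit names and the `parents` dictionary are invented, and this is a sketch of the idea, not of how Git is implemented.

```python
# Toy commit graph: each commit maps to its list of parents. All names are invented.
parents = {
    "a": [],
    "b": ["a"],
    "c": ["b"],  # commit on the feature branch
    "d": ["b"],  # commit made on master after branching
}


def is_ancestor(ancestor, commit):
    """Return True if `ancestor` is reachable from `commit` by following parents."""
    stack = [commit]
    seen = set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False


def merge_kind(master_tip, branch_tip):
    """Classify a merge of branch_tip into master_tip in this toy model."""
    if is_ancestor(branch_tip, master_tip):
        return "already up to date"
    if is_ancestor(master_tip, branch_tip):
        return "fast-forward"  # no merge commit, so no message prompt
    return "merge commit"  # histories diverged; a message is needed
```

Here `merge_kind("b", "c")` is a fast-forward, while `merge_kind("d", "c")` needs a merge commit because master gained commit `d` after branching. In real Git, `git merge --no-ff <branch>` forces a merge commit even when a fast-forward is possible.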

    1. didn’t have a manuscript to ask reviewers to work with. A vestige, I suppose, of print workflows, my engagement of reviewers is always triggered by the arrival (even virtually) of the ‘complete manuscript.’

      I'm frustrated just reading this. It's interesting to me, as an undergraduate, to learn how much influence editors and established publishers have in the academic world. This kind of slow down, I imagine, must happen all the time.

    2. I examine particular moments of public dialog between the federal government and its citizens during the early twentieth century through commemorative stamp selection and production.

      Fascinating that she opened her study of public dialogues to a public dialogue (via WordPress comments). It goes to show that digital humanities methods are often revolutionary new ways of performing on a large scale and at a high speed the things that we've been doing for centuries (such as debating over stamps and history books).

    1. What kinds of public support for our institutions might we be able to generate if we were to argue that public projects that promote the love of reading (or the love of art, or the love of history) exist in consonance with the work that we do in the classroom, or in the writing we do for one another, and that we should therefore take participation in such projects seriously?

      This is an exciting idea. It also reminds me of the discussions of how women's labour has been systematically denied over the centuries, and only recently have scholars and activists made it clear that "women's work" has always been unpaid labour. It's interesting that women have also been coded as more emotional than men, therefore more loving, therefore more willing to do unpaid work. Emotion, it seems, is often used to justify unpaid labour.

    2. That challenge is typified, perhaps, in our adviserly reactions to the graduate school admissions essay draft in which our student recalls with heartfelt sincerity the love of reading they have carried with them since childhood.

      As someone who will be writing admissions essays in the fall: duly noted.

    3. I want to suggest that it might be good for us as scholars to reclaim and recover the emotions for our more socially committed purposes, but I have to acknowledge the difficulties involved in doing so.

      I think this is an important and difficult discussion that will have to be had soon. The validation of various expressions and experiences of emotion is a central part of the push for diversity and social justice. If academia continues to pretend it is a perfectly objective space, diverse voices will continue to be silenced by claims that they are "overreacting" or "irrational".

    4. But critical humility, as you might guess, is neither selected for nor encouraged in the profession, and it is certainly not cultivated in grad school. Quite the opposite, at least in my experience: everything in the environment of, for example, the seminar room makes flirting with being wrong unthinkable

      This may be why "productive fails" aren't encouraged in other classes. At least, not explicitly. It may also have to do with the fact that failed products are rarely handed in, and most classes (and academic journals) deal with the finished product, not the process. Thus a digital environment is a great place to start encouraging productive failure and critical humility.

    5. A proper valuation of public engagement in scholarly life, however, will require a systemic rethinking of the role that prestige plays in the academic reward system

      As someone currently considering a PhD and quaking at my future job prospects, this is a discussion I would like to be involved in. The rise of "alt-ac" as a category shows (as far as I understand it) that public engagement is seen as separate from "real," "prestigious" academics.

    6. Lisa Rhody has explored in a brilliant blog post on the applicability of improvisational comedy’s “rule of agreement” to academic life, adopting a mode of exchange that begins with yes rather than no

      I love "yes, and"! I did a lot of improv in high school, and this golden rule of improv is a memorable and appropriate way to describe how we should be approaching research. Whenever someone suggests something, you respond with "yes," and then you ADD something new to the scene. So an idea never gets completely shut down, it only gets continually improved over the course of the scene.

    7. owever much this internally-focused mode of critique has done to advance the field and its social commitments — and I will stipulate that it has done a lot — this form of engagement is too often illegible to the many readers around us, including students, parents, administrators, and policymakers.

      I never considered that self-criticism (within the field) was the root of this, but it makes sense. I've read some really whacky articles that made me question why I ever wanted to go to university. But I'm still here, continually upping the ante in my own papers, until I find myself writing about "conceptual bohemia as a female-coded body". There are other, clearer ways to express knowledge and data. Some scholars embrace them, yes, but sometimes the goal seems to be to get as complicated as possible. Perhaps digital history's new methodologies (such as visualizations and virtual reconstructions) may be one way to shift the focus of humanities scholars from critical complication to collaborative creation.

    8. studying literature or art or film might not be solely about the object itself, but instead about a way of engaging with the world: in the process one develops the ability to read and interpret what one sees and hears, the insight to understand the multiple layers of what is being communicated and why, and the capacity to put together for oneself an appropriate, thoughtful contribution

      This very eloquent passage sums up how I feel about the humanities. I love her emphasis on engagement and "appropriate, thoughtful contribution." I joke that I study English and History because I just want to read books all day. What I don't say is that reading books all day is my way of preparing myself to meet current and future challenges. Ideally, I am also preparing myself to pass on the experience I gain to future generations of readers. I study humanities for my own sake, yes, but also to discover what and how much I can give back to the world.

    9. I am primarily focused on the ways that we as professors and scholars communicate with a range of broader publics about our work.

      I find this discussion incredibly important. I considered, for a while, going into the Public History Masters at Carleton, because I want to make sure that as many people as possible have access to knowledge about the past. Then I realized that I didn't want to do Public History so much as I wanted to make History public. Just because historians are academics doesn't mean that their work can be divorced from public interest and criticism. I'm glad that digital historians are engaging with the newfound opportunities offered by the internet to publish findings more widely, and in more accessible formats (eg in semi-formal blog posts rather than dense academic articles - though both serve a purpose).

    1. making visible the processes by which history making takes place. 

      I just took the mandatory Historical Theory class, and I found it absolutely fascinating. I think it's important for not just writers of history but also readers of history to know that history is constructed by historians, and our current idea of "good history" is just another development in a long line of historical standards. This kind of work might help, for example, high school teachers explain to their students that history is actually written by human beings with their own ideas, goals, and shortcomings.

    2. the women’s liberation movement of the 1970s and 1980s. I follow in the footsteps of other women who sought to erode the distinction between public and private to reveal the politics underneath.

      I appreciate this acknowledgement of the philosophical roots of open-access academic work. As discussed briefly by the authors of The Historian's Macroscope, DH as a field needs to promote diversity and be aware of its current shortcomings on that front. By framing her online work in the context of the feminist movement, Moravec makes an important statement: DH is not incompatible with diversity - it is the result of it.

    1. Much of my paywalled work was written in public so drafts of it are available.

      This is a great system. I would be interested in this, if I am ever lucky enough to publish academic work. It helps bring scholarly work to the public.

    1. my sense is that we’re seeing the overrepresentation of Anglophones, recent English immigrants to Canada, who joined the CEF in the initial wave of optimism in 1915 and 1916, declining thereafter.

      What an interesting hypothesis. This serves as a good reminder that data is often best used when contextualized with other information - such as a knowledge of the social climate in Canada from 1915-1916.

    2. it’s not going to be perfect.

      Good point to remember with all data. Not only has the data been collected for a specific purpose, it has been collected within specific parameters that may not be ideal for a given project. That being said, I think these kinds of barriers push scholars to play with data in really creative ways, all while keeping them cognizant of the limitations of data.

    1. maintain our scholarly voice online

      I'm very interested in this. I think this is one thing that humanities programs still don't prepare their students for. Whether or not we're using big data or computational methods, the fact is that more and more of our work as scholars is going to move online.

    1. how many trips to Ottawa could be saved if we took this injunction to heart, and began sharing our research notes at a minimum

      I wonder how Carleton's history department feels about this? We tout the "capital advantage," after all.

    2. The value of our work is too wrapped up in the scarcity of sources themselves, rather than just the narratives that we weave with them.

      This is an interesting point. Archival research is a huge pain; it's part of the labour of the traditional historian. But perhaps some of the labour of the modern/digital historian ought to be reworking the system and making archival visits less necessary by depositing their research online. Yes, it's work. But it's necessary upkeep. If IT guys never take the time to update office computers past Windows 95, they're going to have to fix a lot more busted computers. If historians never update archival material to digital formats, they're going to have to visit a lot more archives.

    1. My notes for both projects are available in an open-access wiki.

      He definitely aligns his work with his philosophy of open-source notes. I'm interested in exploring his notes on Henrietta Wood.

      EDIT: Aaaah, it seems to be down. ):

    1. Unlike Open Notebook Scientists, our motive for providing our data will have less to do with a desire to make our experiments reproducible, and more to do with a belief that historical arguments are on a fundamental level irreproducible. Each one is the product of a particular person or group of people at a particular time and place

      This is so fascinating. It reminds me of Drucker's argument that all "data" are "capta." Every piece of knowledge has its context, and historians must be particularly aware of these contexts. By sharing notebooks, historians allow others to see how their view of the past developed. It opens the historian as well as his work up to scrutiny, but it also enriches the field generally. It seems like a worthy exercise in professional - and, for many, personal - vulnerability.

    2. “Linking” items together on a website is not just a means of facilitating browsing; it is also a machine-readable way of doing what historians do all the time when we “link” sources, ideas, concepts and arguments together. The link, as Gardner Campbell has eloquently explained, is a powerful way to “symbolize ideas about relationship” and thus to symbolize the act of higher-order cognition itself.

      Has anyone else played The Wiki Game? You race another person to travel between two random Wikipedia pages just by clicking links. This part of the post reminded me of that game, and made me understand just how incredible it is that a game like that exists. We have so much knowledge, which we link together, which creates more knowledge, and so on forever and ever. No wonder big data has emerged along with the web. Even without user-generated content, the act of linking causes a deluge of data.

    3. The truth is that we often don’t realize the value of what we have until someone else sees it. By inviting others to see our work in progress, we also open new avenues of interpretation, uncover new linkages between things we would otherwise have persisted in seeing as unconnected

      What an incredible point. I think of all the articles linked in this module, this one convinced me the most of the need for open-source notebooks and made me excited about learning how to "branch" and "pull" and "push" projects on GitHub.

    4. enable historians to easily share information about our research as it happens.

      I love this idea. I grew up living and breathing blogs and message boards. In my time at university, however, I've gotten comfortable enough with traditional academic practice that applying the interactive, open-source nature of the internet to scholarly work doesn't occur naturally to me. Even when I wrote a blog for my research project last summer, I didn't share the bulk of my research notes online.

  4. www.trevorowens.org www.trevorowens.org
    1. The Theory and Craft of Digital Preservation (forthcoming)

      He's put his publishing where his mouth is. This book will hopefully provide ideas on how to make the kind of online footnoting he discusses more feasible.

    1. as simple as clicking a link what do we think will turn up everyone else’s footnotes?

      This would also make the work of other historians (and history students, cough cough) much easier: footnotes lead to useful primary and secondary sources that can help guide a historian beginning a related project. If more sources were easily accessible online, more scholarship could be done.

    2. Peter Novik suggested that Abraham’s sloppiness was not a isolated case, but instead one of the only times a historians footnotes were so rigorously fact checked.

      I don't doubt it. Look, I try my best, but when I'm in crunch time for a paper, I always feel a momentary burning desire to give up on the footnotes. Then I talk to my classmates about it, and it turns out that every single history student has that moment. I'm sure some professional historians have it, too; at least, they have a "good enough" moment, and their scholarship remains incomplete but unquestioned. That's not even mentioning misinterpretations and mistakes in the actual analysis, which pose far bigger problems, as Abraham's case shows.

    3. For quite sometime historians have been concerned with questions of ideology, arguments about which historical-isms are the best for a given task. Tom, suggests that new media tools (like text mining) challenge historians to consider methodological questions anew.

      This is what makes digital history so exciting - and one of the biggest challenges to me. I've been taught that history is mostly combing through text, and sometimes, occasionally, analyzing a photo or an artifact. Today, however, history can encompass so much more. I want to be open to new methodologies and, more importantly, to get excited about them.

    1. we need experts who are willing to talk about our data in aggregate over the longue durée, to examine and compare the data around us, to weed out what is irrelevant and contrived, and to explain why and how they do so. History can serve as the arbiter here: it can put neo-liberalism, creation, and the environment on the same page; it can help undergraduates to negotiate their way through political and economic ideologies to a sensitivity of the culture of argumentation of many experts and the claims upon which their data rest.

      This seems overly optimistic; longue duree, by taking such a wide view of the past, might in fact airbrush the distinctions and exceptions out of our understanding of the past, imposing its own sort of dogmatism. Moreover, history (though not necessarily GOOD history) has often been used to justify dogmatic thinking and totalitarian regimes, as well as simplistic methods of thinking about other people generally (think of how many people still think of Indigenous peoples today as the "noble savages" documented in Canadian history).

    2. subaltern voices through the integration of micro-archives within the digitised record of the longue durée form a new and vitally important frontier of scholarship. That immense labour, and the critical thinking behind it, deserves to be recognised and rewarded through specially curated publications, grants, and prizes aimed at scholars who address the institutional work of the longue-durée micro-archive.

      The gap between subaltern histories and longue-duree is not an easy divide to bridge. I wonder how subaltern historians view digital data. The author of this article is very optimistic about the possibilities of big data, but I bet historians who have worked with some of the most frustrating biases and oppressive realities of data collection might have other views of big data.

    3. Historians are the ideal reviewers of digital tools like Ngrams or Paper Machines, the critics who can tell where the data came from, which questions they can answer and which they cannot.

      Arguably not, as demonstrated by the quotation from The Historian's Macroscope in which Google Ngram's project leaders described how far behind the historians they worked with were in theorizing digital methods of handling big data. Even if philosophically historians could be the best arbiters of big data, practically, they are not.

    4. They are putting the data about inequality and policy and ecosystems on the same page, and reducing big noise to one causally complex story.62

      Again: it cannot be said that all historians do this, or at least do it well enough to set history above other disciplines in the development of theories for dealing with big data.

    5. Biologists deal with biology; economists with economics. But historians are almost always historians of something; they find themselves asking where the data came from – and wondering how good they are, even (or especially) if they came from another historian.

      I don't agree with the implication that other disciplines are inherently less self-critical than history. While it's true that history is usually seen as constant revision whereas sciences (even social sciences) are seen as fields built upon earlier work, there is a danger in imagining that all historians are inherently self-critical and not caught up in trends within their discipline. In fact, the very concept of the paradigm shift suggests that science does, at times, completely revolutionize its ideas.

    6. noticing institutional bias in the data, thinking about where data come from, comparing data of different kinds, resisting the powerful pull of received mythology, and understanding that there are different kinds of causes.

      Great overview of the important skills of a historian - I will keep these in mind as I go forward.

    7. We have been navigating the future by the numbers, but we may not have been paying sufficient attention to when the numbers come from. It is vital that an information society whose data come from different points in time has arbiters of information trained to work with time.

      This is fascinating and goes beyond digital history. It challenges fundamental parts of our interaction with the modern world, especially the informational world.

    8. Still more importantly, discussion of adaptation among academics is hardly a metric of political action in the outside world.

      This is a REALLY IMPORTANT point. Data does not mean what you want it to mean. That's not good scholarship. There are many degrees of separation between academic discourse and political action, and these must be considered and accounted for before drawing conclusions. This may seem obvious, but clearly sometimes the sparkle of big data blinds people to obvious gaps in arguments.

    9. the moral implications of forms of history that evolve to answer real-world and practical problems.

      I will keep this in mind as I continue my readings. It's refreshing that digital history, unlike some more traditional methodologies, emphasizes the presence of history.

    10. Rich information can help to illuminate the deliberate silences in the archive, shining the light onto parts of the government that some would rather the public not see. These are the Dark Archives, archives that do not just wait around for the researcher to visit, but which rather have to be built by reading what has been declassified or removed.

      I've done a lot of reading on archival theory during the past year or so. The "Dark Archive," while it sounds really dramatic, is actually a very exciting and important concept.

    11. the watchword of the fundable project must be extensibility

      Any ideas for what this could mean to us as undergraduates?

    12. As historians begin to look at longer and longer time-scales, quantitative data collected by governments over the centuries begin to offer important metrics for showing how the experiences of community and opportunity can change from one generation to the next.

      These sources might not always be relevant to historical inquiry - or they might be inadequate to answer specific questions. We must remain conscious that, although lots of new data is out there, not all of it is relevant to all projects.

    13. Traditional research, limited by the sheer breadth of the non- digitised archive and the time necessary to sort through it, becomes easily shackled to histories of institutions and actors in power

      I have definitely experienced this. For example, when I wanted to see how women have contributed to the archiving of private papers, I was forced to limit myself to the papers of "great men," and then further to a case study of the papers of Sir John A. Macdonald, in order to produce any coherent results. I wonder how digital methods could have expanded the scope of my research or even turned it in a new direction entirely.

    14. They may help us to decide the hierarchy of causality – which events mark watershed moments in their history, and which are merely part of a larger pattern.

      Fascinating. I love discussions of continuity vs revolution. I never considered how big data could be used to find previously unnoticed incidences of continuity and divergence. I can imagine that it could be useful in studies of, say, the French Revolution, or women's entry into the workplace. How much did things actually change?

    1. we do not have the same theoretical framework within which to understand how to read a space, a place, an object, or the inside of a pregnant cow

      That I can barely even imagine how to do ANY of this speaks to how far we have to go in this field.

    2. there is an assumption about the character of the 'truth' the data gives us access to

      Again, this is something I find deeply unsatisfying about digital history and "big data." I'm glad to see that professional digital humanists have already begun to consider this conflict between the subjectivity of historical narrative and the assumed (but unreal) "objectivity" of data.

    3. In most cases, we were studying 'text', and text alone - with its at least ambiguous relationship to either the mind of the author (whatever that is), and certainly an ambiguous relationship to the world the author inhabited.

      My own interests are very biased towards "text;" I expect to go through similar difficulties as I try to move beyond text to explore place, time, sound, sensation, etc. in my historical work.

    4. Projects like the Virtual St Paul's Cross, which allows you to ‘hear’ John Donne’s sermons from the 1620s, from different vantage points around the square, changes how we imagine them, and moves from ‘text’ to something much more complex, and powerful.

      This sounds INCREDIBLE. This is one of the "reconstruction" DH projects that has captured my imagination. I'm mostly annotating this to explore it at a later date. I would love to do work like this.

    5. by simply thinking of the trials as ‘topics’; and I suspect you would find similar results.

      How exactly could this be done? My understanding so far is that topic modelling is mostly based on linguistic proximity and frequency of words. It would be interesting to see trials as "topics" rather than thematic groupings of words - or perhaps a "theme" would emerge from each trial?

    6. Ben Schmidt’s analysis of the dialogue in Mad Men, in which he compares the language deployed by the scriptwriters against the corpus of text published in that particular year drawn from Google books.

      What a brilliant use of digital history methods! I would love to do this for, say, Downton Abbey. It would, of course, have to take into account that people do not necessarily talk the same way writers write. Still, Schmidt seems to have drawn some reasonable conclusions from his data.

    1.  I hope that Deb Verhoeven’s truth to power / real talk was recorded and becomes available soon (now available).

      I LOVED Deb Verhoeven's talk. "I want 80% women, 20% blokes for the next 30 years." I also love her suggestion that digital historians who fit into a space of privilege mentor an aspiring digital historian who does not. Amplify new voices!

    1. The Digital Humanities—and by inclusion, Digital History—cannot be a playground for the privileged. Letting it become so will undo decades of important work done in the humanities to listen for and amplify the voices of those who are too often ignored.

      This is a driving factor of my interest in digital history. If we are on the forefront of a new field, then I want to make sure marginalized peoples are acknowledged and encouraged from the very beginning.

    2. Thankfully, in this digital age, our book is a living document. Rather than putting our hands up in frustration over errors and omissions, we can continuously publish updates, corrections, and new content as necessary.

      "Failing productively," I see! This is the best part of digital resources, in my view. I love that this shortcoming of the book has been acknowledged and, if not corrected, at least supplemented. The hyperlinks to external articles vetted by the authors of the Macroscope are also a great resource.

    1. the results of data processing were used in an inferential rather than explanatory way.

      I expect to be able to use digital history in a similar fashion.

    2. I made a spreadsheet. On that spreadsheet I recorded the title, date of publication, and publisher of every Isaac Cruikshank print I could get my hands on. I then recorded the places depicted in each print.

      This is a good example of "approachable" digital history - many people today understand the concept of a spreadsheet. Baker simply explains how he took this tool a step further using new methodologies.

    3. As you will see, ‘visible’ is probably a better word for this as there is nothing ‘hard’ about the Digital History on offer: it doesn’t tell any truths, it doesn’t solve any problems, it doesn’t sit outside of interpretation. Rather – much like any abstraction from primary sources – it does work that I found useful.

      I like the phrase "abstraction from primary sources". It reminds me that we all use some form of processing on the data we capture from primary sources - for example, organizing names of battle sites into a list, or making a table of trade voyages. In this case, the abstraction is on a larger scale and produced with new methods, but it can still be used for the same purpose as traditional abstractions.

    1. they helped orientate and shape my thinking rather than provide ‘results’ that I analysed, interpreted, and/or presented in the book.

      I have often come across sources and methods like this - aspects of a topic that don't make it into the final product, but help direct future research. I like that the blog format makes it possible to reveal, in well-organized blog posts, these aspects of research.

    1. How can we use historical knowledge in the present day, from informing policy decisions, to inspiring marginalized communities, or to simply tell entertaining stories?

      This is, as noted, an important question to ask in any practice of history. Its inclusion here, however, forces me to consider how I can use digital history specifically to answer this question. Can I use computational tools to, say, tell the story of a movement rather than of a single individual? Of a population or community rather than a household? I think digital history will force me to look at larger trends, which is a good change from the close reading I have gotten used to.

    1. There is a huge difference between “here is an interesting way of thinking about this” and “This evidence supports this claim.”

      This really helps me contextualize how "big data" can impact my own work. I don't need to change my fields of interest; only change how I look at them now that more data and computational power are available to me.

    2. History is not merely a reconstructive exercise, but also a practice of narrative writing and creation.[5]

      This is critical to my experience writing history. I'm glad to see the issue raised here.

    1. Python

      I'm in the VERY EARLY stages of learning Python (like, a few stages above "Hello, world!"). I'm having trouble imagining how the basic lines of code I am writing can eventually lead to complex programs, especially programs that help me do history. If anyone has any advice, resources, or reassurances, let me know.

    1. Historians need to begin to think computationally now so that our profession is ready to access this data in the next generation.

      I've thought before about future historians poring over my more unique Tumblr blog posts, but I'd never considered the limitations of digital storage as a historical database. I'm interested in the idea that we need to make changes to how we see digital data in the present in order to make history more accessible in the future.

    1. Yet we realize that they need to be critically studied, as they have come from divergent disciplines and domains.

      As an undergraduate, it's exciting to be introduced to these kinds of open-ended academic questions. I love the idea that just by doing digital history, I can help progress scholarship.

    2. Digital history, for one, sits closer to the public humanities than many of its counterparts.

      What does this mean? Is this a reference to the prevalence of online exhibits, etc., that overlap with public history? Is it a reference to the tendency of digital historians to be more public-facing than, say, digital literary scholars?

    3. A potential downside, however, was that computational history became associated with quantitative studies. This was not aided by some of the hyperbole that saw computational history as making more substantial “truth” claims, or the invocation of a “scientific method” of history.[12]

      As someone who is VERY MUCH AGAINST the scientific history claims, this is something I will be on the watch for in my own work.

    4. Fernand Braudel

      I was introduced to his work in my Historical Theory class, and I always found his concept of the longue durée very confusing. I can see how it fits in with digital history and big data, but as a philosophical approach to history, it goes over my head. If anyone is a Braudel expert, feel free to comment.

    5. Busa conceived of a series of cards, which would – he estimated – number thirteen million in total.[3] It would be his Index Thomisticus, a new way to understand the works of St. Thomas Aquinas.

      This is incredibly forward-thinking. I wonder how many projects thought impossible in the past could be implemented using modern technology, or technology still to come? How many abandoned ideas can be resurrected?

    1. ORBIS: The Stanford Geospatial Network Model of the Roman World

      This is SO COOL. As much as I like textual analysis, I would love to be able to work on a project like this someday. Go play with this one.

    2. But they didn’t seem to have a good sense of how to yield quantitative data to answer questions,

      Same, Google-reject historians. Same.

    3. Even digitized newspapers, which on the face of it would seem to be excellent resources, are not without serious issues at the level of the OCR

      This will be good to keep in mind when we all work on the "hastily-scanned" newspapers for our final project.

    4. One often unspoken tenet of digital history is that very simple methods can produce incredibly compelling results, and the Google Ngrams tool exemplifies this idea.[14]

      Also important for our final project! I have no idea yet how I want to work with the newspaper records; I hope we can all give each other ideas as we work through the course content.

    5. through comparing differences in documents (using Normalized Compression Distance, or the standard tools that compress files on your computer) one can get the database to suggest trials that are structurally similar to the one a user is currently viewing.

      I would like to know more about this - how does the computer define "structurally similar"? How reliable are the results? How can they be used? I assume that after finding "structurally similar" trials, a historian would have to do some analysis in order to determine whether the trials were similar in other significant regards. Does this undermine the value of the technology? Is it only useful as a search-and-find tool?
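
One partial answer to the "structurally similar" question can be sketched in a few lines. This is a minimal illustration of Normalized Compression Distance, assuming `zlib` as the compressor; it is not the code behind the database described in the passage. The three trial summaries are invented for the example, and the function names (`compressed_size`, `ncd`, `rank_by_similarity`) are hypothetical. The idea is that if two texts share many phrases, compressing them together costs barely more than compressing one alone.

```python
import zlib


def compressed_size(text: str) -> int:
    """Number of bytes the text occupies after zlib compression."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance: lower means more shared patterns.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C() is the compressed size.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def rank_by_similarity(target: str, candidates: dict) -> list:
    """Candidate names ordered from most to least similar to the target."""
    return sorted(candidates, key=lambda name: ncd(target, candidates[name]))


# Invented sample texts (assumptions for illustration, not real records).
theft_a = (
    "The prisoner was indicted for stealing one silver watch, value forty "
    "shillings, the goods of John Smith. Guilty. Transported for seven years."
)
theft_b = (
    "The prisoner was indicted for stealing one linen handkerchief, value two "
    "shillings, the goods of Mary Jones. Guilty. Transported for seven years."
)
assault = (
    "The witness deposed that a quarrel arose at the alehouse and the deceased "
    "received a blow to the head from which he died. Acquitted."
)

ranking = rank_by_similarity(theft_a, {"theft_b": theft_b, "assault": assault})
print(ranking)
print(round(ncd(theft_a, theft_b), 2), round(ncd(theft_a, assault), 2))
```

The score only measures repeated surface patterns (shared formulas, names, phrasing), not meaning, so a historian would still have to read the suggested trials to decide whether the similarity matters.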

  5. www.themacroscope.org
    1. (“the great unread”),[1]

      As a double major in English and History (with an eye on an English masters), this aspect of digital humanities is particularly fascinating to me. I look forward to using the techniques I learn here to practice some "distant reading" (http://www.nytimes.com/2011/06/26/books/review/the-mechanic-muse-what-is-distant-reading.html).

    1. through expansion: the ability to extract complex knowledge from the smallest crumbs of evidence that history has left behind.

      This is how I have become used to doing history. I'm so used to it, in fact, that the idea of looking at history from any other perspective is immediately confusing to me. That's why it's so fascinating to consider the difference between a "microscope" and a "macroscope" with regard to history: the value of the macroscope is evident, but I'm curious as to how exactly I can apply it to historical work. (I suppose this class is the place to find out.)

    1. We have to be cognizant of the sociology of digital production, and the ways that -for instance- the heavily white male demographic that encodes the tools and platforms make hidden value judgements about what is important.

      I can already tell - as someone who only just barely understands basic statistics and "big data" - that this aspect of digital history will be one of the most fascinating to me (unless, of course, something else catches my eye during the course).

    2. Empathy - they write with care and consideration for these lives in the past. That is to say, they recognize the ‘why’ of what happens without retrojecting current mores onto actors in the past

      I'm very glad to see this included, and so high up on the list. This has always been my goal in writing non-digital history; I hope I can explore it further in the digital realm.