1,068 Matching Annotations
  1. Aug 2017
    1. You can see here that Mr Appleton and Mr John Adams were connected through both being a member of one group, while Mr John Adams and Mr Samuel Adams shared memberships in two of our seven groups. Mr Ash, meanwhile, was not connected through organization membership to any of the first four men on our list. The rest of the table stretches out in both directions.

      This would be interesting to visualize using Palladio.

    2. And, of course, we can also do that for the links between the people, using our 254x254 “Person by Person” table. Here is what that looks like.

      Although this visualization is much more complex, I like it more. For me, it is great to see the individual outliers and how they relate to the majority.

    3. Notice again, I beg you, what we did there. We did not start with a “social networke” as you might ordinarily think of it, where individuals are connected to other individuals. We started with a list of memberships in various organizations. But now suddenly we do have a social networke of individuals, where a tie is defined by co-membership in an organization. This is a powerful trick.

      Again, what an interesting way to simply explain a complex topic! It's so interesting how you can create a visualization of relationships between variables this way.
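      The "trick" is easy to sketch in code. Here's a tiny, made-up example (the names and clubs below are illustrative, not the article's actual data): starting from a person-to-groups table, a person-by-person tie is just the number of groups two people share.

```python
# Hypothetical membership lists -- not the article's data.
memberships = {
    "Appleton": {"StAndrewsLodge"},
    "J.Adams": {"StAndrewsLodge", "LongRoomClub"},
    "S.Adams": {"LongRoomClub", "NorthCaucus"},
}

def co_membership(memberships):
    """Project a person-to-groups table into person-by-person ties,
    weighting each tie by the number of shared groups."""
    people = sorted(memberships)
    ties = {}
    for i, a in enumerate(people):
        for b in people[i + 1:]:
            shared = len(memberships[a] & memberships[b])
            if shared:
                ties[(a, b)] = shared
    return ties

print(co_membership(memberships))
```

      The same loop run over the groups instead of the people would give the "Group by Group" table, with shared members as the weights.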

    4. (Harvard, you may recall, is what passes for a university in the Colonies. No matter.)

      I have nothing very important to write here, but this was hilarious.

    5. Rest assured that we only collected metadata on these people, and no actual conversations were recorded or meetings transcribed. All I know is whether someone was a member of an organization or not.

      I really like how metadata is described here. Explaining the meaning in the context of the 18th century is a perfect ELI5 example! (ELI5: "explain like I'm five," a subreddit devoted to simple explanations of complex issues.)

    6. I will show how we can use this “metadata” to find key persons involved in terrorist groups operating within the Colonies at the present time. I shall also endeavour to show how these methods work in what might be called a relational manner.

      I love this. Showing how these methods might have been used at such crucial moments in history is both extremely entertaining and educational. Well chosen, professor.

    7. Urann, Proctor, and Barber

      $ sed 's/Urann, Proctor, and Barber/Moony, Wormtail, Padfoot and Prongs/g' thisarticle.txt

    8. Rather than relying on tables, we can make a picture of the relationship between the groups, using the number of shared members as an index of the strength of the link between the seditious groups.

      Does anyone know of a program that would be able to do this?

    9. (Harvard, you may recall, is what passes for a university in the Colonies. No matter.)

      This already my favourite article we've read in this class. I love it!

    10. basic way to represent information about links between people and some other kind of thing,

      Simply put - but this really is the essence of what we are doing. Besides, this style of writing is intriguing and certainly a change of pace.

    1. Stylized graphical effects can be just as distracting as chartjunk.

      This is true. People can shape visual representations to encourage readers to interpret them a certain way, which can introduce bias.

    2. sparklines

      If I remember correctly, there seemed to be something similar to this in Voyant, in the "Document Terms" tool.

    3. The right visualization can replace pages of text with a single graph and still convey the same amount of information.

      So very true. After all, there is a quote: "a picture is worth a thousand words." I personally find visual data to be more engaging. Including charts, graphs, maps, etc. may attract new readers, and make it easier for all readers to sustain their attention (by breaking up long explanations/blocks of text).

    4. This particular display is showing the most likely route from Rome to Constantinople under a certain set of conditions,

      I'm curious to know what kinds of physical documents they used to estimate or establish these patterns. This is something I'm going to read into.

    5. Google’s web browser, Chrome, searching for a word on a webpage highlights the scroll bar on the right-hand side such that it is easy to see the distribution of that word use across the page.

      Wow. I had to see it for myself, because somehow I never noticed this. This is really cool that they included a simple data-analysis feature into mainstream browsing.

    6. accidentally typed “1909” rather than “1990” for one of the books.

      Simple accidents can result in huge mistakes. It's important to remember how precise and careful digital historians have to be in their work, now more than ever.

    7. Visualizations can also lie, confuse, or otherwise misrepresent if used poorly.

      This point is very important. It makes me think about how misrepresented statistics are often used by different political parties or companies to make claims about facts that aren't necessarily true. With great power comes great responsibility, right?

    1. DEATH

      Death is interesting. Obituaries can show the relationships between people and between places. What can I do with this?

    2. MARRIED

      Another way to show relationships between people. Tracking the number of unions in different religious denominations?

    1. The ease of use for LDA with basic settings means humanists are too likely to take its results as 'magic', rather than interpreting it as the output of one clustering technique among many.

      This is why I see the importance in doing a bit of research on a tool you plan to use: one can develop a better appreciation for what goes into creating it, and what comes out of it.

    2. doesn't map perfectly onto the expectations we have for the topics.

      This is a good point; I experienced this firsthand when I ran the Shawville Equity through the topic modelling tool in exercise 2. I expected to see a theme about crime, but was surprised by topics that seemed to have a war theme, or a political theme.

    3. I'm not sure that we should too enthusiastic about interpreting results from machine learning which we can only barely steer.

      Fair point. It's an interesting technology, but perhaps we had better get a firm understanding of it before we jump right in.

    1. “Interdisciplinary Studies” and “Historical Studies” (☹),

      I truly believe that interdisciplinary studies is the way of the future. With many STEM (Science, Technology, Engineering, Math) buildings and clubs being built and created, and even some of them being renamed to STEAM (Science, Technology, Engineering, Arts, Math), I think that class discussions and learning in general will become more profound than only learning what you need to in order to graduate. Learning from people who have different perspectives and ideas is how people get inspired and create things that have never been thought of before. It also allows them the opportunity to collaborate with people that they would perhaps never have met before.

      I remember when I was studying for my Film Studies midterm in the Engineering building at uOttawa, and someone sat down and started asking me what class I was studying for and if she could join. When I said Film Studies, she asked why I was in the Engineering building, and before I could answer, she walked away. I found this really odd, and all of a sudden felt that because I was in Arts I was not welcome in this building, which is something that I hope will change soon.

    1. library(rJava)

      I was wondering if library(rJava) should instead be library('rJava'). Both work in RStudio: library() accepts either a bare package name or a quoted string.

    2. Here’s a stoplist you can use http://bridge.library.wisc.edu/jockersStopList.txt

      This link redirects and I couldn't find the original text file by googling, but the full stoplist is available on his website here: http://www.matthewjockers.net/macroanalysisbook/expanded-stopwords-list/ . I just copied and pasted it into my text editor and saved it as a .txt file.

    1. These words, which have very high frequency in most corpora, are extremely valuable for the historian because they allow us to view complex relationships within corpora that looking only at nouns and verbs does not.

      As in reading the words as a sentence (with all the parts necessary for understanding)?

    2. Machine reading of sources provides two advantages for the historian  1.  Machines can deal with far larger volume of source material than the human brain can, anything from hundred to hundreds of thousands  2.  Machines can find patterns and relationships that the human brain cannot.

      Here is a good summary of one of this class's learning goals: to understand this and be able to do it!

    1. isn’t because things are getting easier, but rather that I’m actually getting better at this stuff.  


    1. ASSESSING DIGITIZATION QUALITY: This interactive visualization plots a quantitative survey of our newspaper corpus. Users of this interface can plot the quantity of information by geography and time periods, using both to survey the amount of information available for any given time and place. This is available both at the macro-level (that is, Texas as a region) and the micro-level (by diving into the quantity and quality of individual newspaper titles), and can be tailored to any date range covered by the corpus.

      How can it be tailored to the macro level for users and still cover all date ranges, without dividing out possible prospects?

    2. “Mapping Texts” is a collaborative project between the University of North Texas and Stanford University whose goal has been to develop a series of experimental new models for combining the possibilities of text-mining and geospatial analysis in order to enable researchers to develop better quantitative and qualitative methods for finding and analyzing meaningful language patterns embedded within massive collections of historical newspapers

      This actually just gave me an idea for a final project! If anyone else is curious, Carleton has access to a digitized version of the London Times newspaper, dating back to the 18th century. Finding and mapping specific trends from that data set would be amazing!

    3. If the user wanted to burrow as far down as a single publication, they can do that as well

      Traditional essays are sort of the opposite: they start with specific information and research and move out to broader arguments. Interesting how the two approaches connect and complement each other.

    4. digital environment, assessing the quantity of information available also necessitates assessing the quality of the digitization process.

      Not sure exactly how this would look, but including these discussions in our papers seems important after all the previous readings we have done.

    5. We soon realized, however, that before we could begin to answer such a question w

      Lots of digital history seems to be like this ... start something, switch focus, go back to it and try again with a different tool or lens.

    6. The age of abundance, it turns out, can simply overwhelm researchers, as the sheer volume of available digitized historical newspapers is beginning to do

      I find this paradox interesting - that an abundance of data could actually ultimately result in less accurate or meaningful information.

    7. The broader purpose behind this effort has been to help scholars develop new tools for coping effectively with the growing challenge of doing research in the age of abundance, as the rapid pace of mass digitization of historical sources continues to pick up speed. Historical records of all kinds are becoming increasingly available in electronic forms, and there may be no set of records becoming available in larger quantities than digitized historical newspapers.

      This section is a nice summary of a lot of the issues that we've discussed over the past couple of weeks, particularly in modules 2 and 3.

    1. .graphml file

      Does anyone know where the option for this is? Mine only offers to save it as SVG, PDF & PNG.

    1. “borrowing” the methodology becomes even more dangerous.

      I feel like I'm doing this with a lot of my project - right now, I'm dabbling in methodologies I don't understand. In part, it's because this is my first project and I'm busy ~failing productively~. It's important to remind myself, though, that I have a lot of hard work to do. I don't need to become an expert, but I should get familiar with the methodologies I choose to utilize in my final project so that I can speak confidently about my findings.

    1. git pull -u origin master

      When I did this a second time (git pull origin master), it said Everything up-to-date.

    2. git pull -u origin master

      For me (Windows 10), git pull -u origin master gave the error `error: unknown switch 'u'`. git pull origin master worked. (The -u/--set-upstream switch belongs to git push, not git pull.)

    1. Paste the URL to the csv of the CND database: https://raw.githubusercontent.com/shawngraham/exercise/gh-pages/CND.csv .

      I think this is a broken link as well, I'm getting a consistent 404 error.

    2. Networks Demystified.

      Nothing here

    3. Basic Text Mining in R

      I think this link is broken

    1. emotion implied though metaphors or imagery patterns, or use satire and sarcasm

      Perhaps this wouldn't be a big problem for the analysis of newspapers, but for other things, it would most certainly matter.

    1. dirty OCR illuminates the priorities, infrastructure, and economics of the academy in the late 20th and early 21st centuries.

      I never thought about it this way; maybe one day in history another student will be laughing at how far behind our "technological advances" were

    2. Our primary perspective on the digitized text thus far has been that of the textual critic who is entirely “concerned with…the reconstruction of the author’s original text

      I was under the impression that this should be a primary concern? If we are digitizing history, should it not be accurate?

    3. The consequences of “errorful”

      A reminder that while technology is advancing it often requires a human eye to monitor it

    1. We'll use regex to massage the information into a format that we can then use to generate a social network of letter writers over time.

      Is the address given that of the destination, or that of the sender?
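      As a rough illustration of what "massaging with regex" can look like in practice, here is a minimal sketch; the header format and field names are invented for illustration, not taken from the exercise's actual data:

```python
import re

# An invented letter header -- the real data may be formatted differently.
line = "To Mr John Adams, Boston, 12 May 1776"

# Capture recipient, place, and date as named groups.
pattern = re.compile(r"To (?P<name>[^,]+), (?P<place>[^,]+), (?P<date>.+)")
m = pattern.match(line)
if m:
    # Reassemble into a simple comma-separated row for later network-building.
    print(",".join([m.group("name"), m.group("place"), m.group("date")]))
```

      Once every letter is reduced to a uniform row like this, building the writer-by-writer network over time becomes a matter of counting.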

    1. a vast field of haystacks within which they must locate the needles - and presumably, use them to knit together a valid historical interpretation.

      Love this explanation! Historians who face big data struggle more, but emerge with a unique interpretation.

    1. As usual Heather Froelich saved my ass here by pointing out I might want to reconsider methodology.

      Trial and Error!!!

    2. The uses of female don't exhibit these stark divides, although I found it fascinating that prior to the Civil War, neither  female citizen nor female suffrage appeared in the male corpus.

      This is super neat. It really sums up the type of patterns we as historians are to be looking for, to build our knowledge of history at those times even greater than before. The study of linguistics has many perks!

  2. Jul 2017
    1. Digital methods are not any more or less valid than traditional approaches, but they do provide a different entry point into the historical archive.

      This is very true -- both are valid approaches that can reach legitimate conclusions; the difference is the amount of time and effort spent. Would we rather spend less time digitizing in order to have a large archive, even though documents may be riddled with OCR errors? Or would it be better to have a smaller archive, but with complete and human-verified documents?

    2. Digitized newspapers are inherently messy sources.

      There has to be a human component to the digitization process. If this were to be left to computers, the result could drastically differ from the original. Historians are sort of the "quality control agents" in a way.

    3. Online databases often use OCR to enable users to search their collections, but few provide access to the "raw material" of its underlying machine-readable text needed for large-scale text mining. Private, for-profit content providers are particularly hesitant to provide individual researchers with that degree of access to their material.

      Another example denoting how history can sometimes be proprietary. Just like in Module 1, this shows how people aren't inclined to share information if it will adversely affect oneself and/or benefit others.

    4. distant reading often necessitates computers that can "read" massive quantities of text in a matter of seconds.

      I know that artificial intelligence has tremendous capabilities, but I feel as though data and its meaning could potentially be lost or misinterpreted, since computers don't have the same type or extent of judgement as a human being.

    5. For perspective, a researcher poring over the newspapers nonstop for eight hours a day, five days a week, would need four years to finish reading them.

      This is why digital tools have become increasingly useful. They may not be able to research for you, but they can drastically help with tackling tasks of this scale.

    6. societies dynamically produce space over time.

      I would argue that we don't make space, but instead make better use of the space that we do have.

    7. For most of the period the Telegraph and Texas Register was largely incapable of printing Anglo place-names in western Texas

      This is an excellent opportunity for digital history. A program that can collect data for the same geographical site even though it has been called different names in the past.

    8. If historians insist on perfect data, however, we risk ignoring huge swathes of the digital archive.

      How often do these sources get ignored? How many sources are passed over for this reason? How much information is lost or ignored?

    9. In the case of the Telegraph and Texas Register, 9.4% of its digitized words did not appear in a dictionary, while the Houston Daily Post had a substantially higher percentage (19.7%) of unrecognizable words

      Although I'm sure historians and other academics who use these sources can understand them even if a few words are slightly altered, how often does this happen in general? Has it affected the work of academics, or altered the meaning of the documents they used for their research? Not necessarily lost in translation, but affected by technological change.

    10. Digitized newspapers are inherently messy sources. They often resemble a jumbled bag of mistake-ridden words as much as neatly segmented columns of text. If historians insist on perfect data, however, we risk ignoring huge swathes of the digital archive. One goal of my project was to find a way to draw meaning from messy text.

      Relevant to what we are doing this week and in this course: collecting data and organizing it. I find it interesting that even old print newspapers could be so full of clutter and so disorganized, much like the "clickbait and junk" articles that pollute the internet today.

    11. The grid system

      I recently took a flight from Ottawa to Philadelphia and I had an opportunity to get a birds eye view of the grid layout of both cities. It can be easy to ignore how much work goes into the organization of facilities/services.

    12. Torn pages, blurred text, or divergent typefaces can all lower recognition rates during the OCR translation process. (See figure 1.) A smudged word might cause the computer to translate “Texas” as “Toxas,” for instance.

      I did not realize these were the reasons OCR failed to accurately recognize the correct spellings of words! I can see how this happened in this week's exercise.

    13. My research builds on earlier spatial histories through its use of technology to analyze sources on a much larger scale.

      In a way, the author is constructing a view of Houston's "medium duree" (per Braudel) by looking at changing patterns along the geographic and economic history of the region.

    14. After all, a reader looking for a new pair of gloves may have been more interested in a back-page advertisement from a Dallas merchant than a front-page editorial by a Dallas mayor. Flattening the text helped me understand the multifaceted ways a newspaper produced space.

      I guess this answers my above query on the definition of significance! The author seems to view significance news as what is important to the reader, which can vary greatly. Analyzing a more egalitarian production of space was thus his aim.

    15. the "Dallas" in a front-page headline was given the same weight as the "Dallas" in a retail advertisement

      Brings up the issue: is frequency the most important marker of "significance"? How does one measure "significance"?

    16. I shaped the project around the availability of digitized and machine-readable sources.

      I understand this is a necessity, but it also brings to mind the idea that perhaps there were better source materials for the author's project that weren't available! There might have been reasons why the Houston Daily Post mentioned a city a lot (a sponsor, background link to the editor or owner) and this bias wouldn't be clear without a comparison to another paper at the same time (which the author did not do because of availability of sources).

    17. Atlanta, Memphis, and Nashville were dwarfed by references to urban centers outside of the Sout

      I was a bit surprised by this; I would think these important hubs would be of interest to the readers and thus mentioned a fair bit! The author is suggesting that cities in Texas and large cities in the North dominated the pages while the American South was largely ignored.

    18. quantify how late nineteenth-century newspapers crafted a view of the world for their readers.

      Interesting to view the construct of space in a less literal way, not the physical construction of space but the "sense" of space created by a pattern of continuous affirmation of ideas and values.

    19. Instead of space serving as a neutral backdrop for the march of historical events, societies dynamically produce space over time.

      I really agree with this (wonderfully written) statement. You can see these divisions on a small scale (a bedroom, or a house layout) or on a grander scale (highways in Canada for example). We create and navigate spaces that often have arbitrary lines of distinction for reasons of comfort, necessity and organization.

    20. Emphasizing technology, however, risks overshadowing an even more important commonality: collaboration.

      Collaboration seems to me to become easier and more feasible with technology. People can work remotely in teams from anywhere with an internet connection. For example, HIST3814 has us responding to others and learning from each other remotely.

    21. mbuing different neighborhoods with different meanings

      I was at Pure Kitchen on Elgin Street and they have a piece of writing on the wall that lists different areas of Ottawa and some of the characteristics of the people who live in them. I find it very interesting that where you live in a city already says a lot about a person. My parents are "Glebites" and they rarely use a car, instead bike everywhere, only purchase organic food, and their dog is their life! I live in Orleans however... need I say more?

    22. Emphasizing technology, however, risks overshadowing an even more important commonality: collaboration.

      collaboration between researchers and technology?

    23. Finally, the overwhelming presence of Texas places reveals the dominance of regional space. Galveston, Dallas, Fort Worth, Waco, and San Antonio may have occupied a relatively lowly position in the nation’s urban hierarchy, but they sprawled across the Houston Daily Post’s imagined geography.

      I don't feel like this is very groundbreaking information. Wouldn't it be obvious that cities that are regional to the newspaper would be more likely to have been mentioned?

    24. The grid system

      The grid systems out here in the countryside of Manitoba are square miles, set with correction lines. Getting lost is fairly hard, but when you do lose your sense of direction out on the gravel it will toss your senses off; the world feels HUGE all of a sudden, until you hit a main highway, that is. At least here in Manitoba the dirt roads are numbered, not like in Saskatchewan.

    25. The human reader simply cannot identify spatial patterns at that level of mundaneness, granularity, and fragmentation. Fortunately, computers are quite good at this kind of reading.

      Something that I never really thought about in writing. Can you imagine if a computer could identify certain authors just based on historical data collected about their style of writing? This would be beneficial to collecting data if, for instance, you discovered a piece of writing without the author's name on it.

    26. The second visualization compares the Telegraph and Texas Register and Houston Daily Post’s imagined geographies from the same vantage point in space but at different points in time

      I don't like how they use different points in time. It's hard to compare two different newspapers at two different times; it makes the data harder to contrast because the information isn't consistent between them. I would have used the same newspaper at two different times, or two different newspapers at the same time. By using the same newspaper, the author would make his point stronger that they were targeting a certain geography.

    27. a specific sense of place.

      An example of this could be central park. You can be in the middle of a city, but yet feel close to nature.

    28. The grid system

      These grid systems are so interesting to study. So many planners approach this in different ways, and it all varies between cities. Some can be divided simply by squares, rectangles, circles, or even different sets of patterns all put together.

    29. I shaped the project around the availability of digitized and machine-readable sources. Accessibility can be a major impediment to digital analysis.

      I can understand that: you do not want to start a project where you will not be able to find data to support it. By doing this, you know from the start that you have access to all the data you need. I guess this can also be a negative, because if everyone did this, there would be less incentive to produce and digitize data themselves... right?

    30. Their daily or weekly print cycles also allow historians to track temporal changes in much finer detail than do other sources such as maps or novels.

      That is true! In terms of producing data, newspapers are reliable, consistent and easily accessible.

    31. Finally, the overwhelming presence of Texas places reveals the dominance of regional space. Galveston, Dallas, Fort Worth, Waco, and San Antonio may have occupied a relatively lowly position in the nation’s urban hierarchy, but they sprawled across the Houston Daily Post’s imagined geography.

      I can understand this: these cities could have been key for the State of Texas, though less important on the larger scale of the country.

    1. In the Data Laboratory, select “Import Spreadsheet.” Press the ellipsis “...” and locate the CSV you created. Make sure that the Separator is listed as “Comma” and the “As table” is listed as “Edges table.” Press “Next,” then “Finish.”

      It won't let me import my file as it says "The file can't have repeated column names". Did this happen to anyone else?
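      One way around the "repeated column names" error is to deduplicate the header row before importing. A rough sketch (the header below is a hypothetical example, not your actual file):

```python
import csv, io

# A hypothetical CSV with a duplicated "Weight" column, the kind of
# header Gephi rejects. In practice you'd read your real file with open().
raw = "Source,Target,Weight,Weight\n1,2,3,4\n"
rows = list(csv.reader(io.StringIO(raw)))

# Rename repeats: Weight, Weight -> Weight, Weight_1
header, seen = [], {}
for col in rows[0]:
    n = seen.get(col, 0)
    header.append(col if n == 0 else f"{col}_{n}")
    seen[col] = n + 1
print(header)
```

      Writing the rows back out with the deduplicated header should then let the "Import Spreadsheet" step proceed.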

    1. Maps are not necessarily always the most appropriate visualizations for the job, but when they are used well, they can be extremely informative.

      For the visualization above, even being told what a cartogram was helped a lot in explaining how to read it.

    2. The right visualization can replace pages of text with a single graph and still convey the same amount of information.

      If using simplified visualizations, wouldn't this cause some of the visualizations to lack certain data? In the event that it does require a full page of text to explain, maybe a solution, would be to keep both?

    3. Experts in the area have argued that the most powerful visualizations are static images with clear legends and a clear point, although that may be changing with increasingly powerful interactive displays which give users impressive amounts of control over the data.

      I think a lot of it depends on what you are presenting. If we were in a class using mathematics, then yes, I would agree, because the last thing I want is for it to be displayed as a Prezi presentation. Meanwhile, if you were learning about construction, and how they pour concrete into formwork, the best way of showing this is with a short animation.

    4. contain some text, and any visualization we create is imbued with the narrative and purpose we give it

      I think this is a great way to display data. Even if the text is short, it still gives a bit of context and would allow you more freedom of displaying the data and not having to be as strict with making the data clear as possible.

    1. The post doesn’t make an original argument and it doesn’t further our understanding of women’s history, colonial New England, or the history of medicine. It largely shows us things we already know about the past – like the fact that people in Maine didn’t plant beans in January.

      People don't plant beans in January? What?!? I find this is a great example of how people are fascinated by new methods of getting and showing the data we already know.

    2. To that end, I published an online component that charted the article’s digital approach and presented a series of interactive maps.

      There seems to be a line in the sand drawn here between traditional academic audiences and more digitally-savvy audiences. I wonder if the trouble with reconciling these two groups has anything to do with the challenges of introducing complex methodological techniques to scholars who might be intimidated or feel entrenched in their own methods

    3. It’s that there is a fundamental imbalance between the proliferation of digital history workshops, courses, grants, institutes, centers, and labs over the past decade, and the impact this has had in terms of generating scholarly claims and interpretations.

      I wonder if this bias has anything to do with the fact that digital labour often seems invisible from the outside. I imagine most historians can appreciate the work that goes into physical searches for data and transcribing sources, but in a culture that's conditioned to think of 'digital' as synonymous with 'instantaneous' (thinking of google search results for instance, or just search functions in general) maybe it's harder to recognize the amount of work that goes into developing and executing these methods, and therefore producing academic work that follows these techniques appears less prestigious

    4. The scholarship tent has gotten bigger, and that’s a good thing.

      This is one of the things I find most exciting about digital humanities work. The 'scholarship tent' is now starting to include many different disciplines, and thinking about how the tools that digital humanities scholars use could be applied to other fields is really exciting.

    5. Seven years later the digital turn has, in fact, revolutionized how we study history. Public history has unequivocally led the charge, using innovative approaches to archiving, exhibiting, and presenting the past in order to engage a wider public. Other historians have built powerful digital tools, explored alternative publication models, and generated online resources to use in the classroom.

      It's pretty neat to consider how quickly these new research methods have come into use

    6. Once you include the boring stuff, you get a much different view of the world from Houston in the 1890s. I ended up arguing that it was precisely this fragmentary, mundane, and overlooked content that explained the dominance of regional geography over national geography.

      I wonder how many things get overlooked because they seem boring. It's pretty interesting what he was able to do with the information most people overlook. I also find it important to be careful with large data sets like this, as things can easily be misinterpreted.

    1. A few minutes with this, and it becomes clear that this actually was a child-focused neighbourhood, that digital photography was at a minimum, and that white background cartoons and icons dominated to the detriment of more colourful images.

      I understand that he is showing that using different methods to visualize data can help us find patterns we might otherwise miss. What I don't understand is how this method is particularly useful, based on the conclusion.

    2. takes a long time

      One thing I know I need to work on while working with data is patience; I find it easy to jump ahead of myself and confuse myself.

    3. 243,520 images of all formats

      That's a lot of data to work with and display, it would be easy to become overwhelmed by it.

    4. Here’s a video of my findings:

      This is amazing! Absolutely blows my mind the things you can do using a couple simple lines of code.

    1. dirty OCR illuminates the priorities, infrastructure, and economics of the academy in the late 20th and early 21st centuries.

      This reminds me of the Coding History podcast from last week and Ian Milligan's remarks on Geocities. Today's google search result is tomorrow's history!

    2. The consequences of “errorful” OCR files, to borrow a term from computer science, influence our research in ways by now well expounded by humanities scholars, inhibiting, for instance, comprehensive search

      Thinking about the consequences of technology that produces "errorful" work is definitely interesting, but I think it's important to keep in mind that tech which produces seemingly errorless work should be scrutinized just as much. Maybe I'm being too cynical, but if I got a perfect piece of OCR I think I'd wonder what was left out in the process of making it so clear.

    3. Where did it come from, and how did it come to be?

      It's interesting to think of the digital object itself as an artifact, rather than just a copy

    4. treat the digitized object primarily as a surrogate for its analog original, we jettison the best features of both modes

      I feel this does not adequately acknowledge the difference between the two formats. Digital preservation is not the same as the original, no matter how perfect the copy is. As all the exercises this week prove, digital copies are able to show us more, more efficiently. This allows us to search with intent, though perhaps missing things that might be gleaned through searching the paper copy.

    5. After the Star was digitized and made available, however, it became far more prominent” in dissertations

      This is another example of something previously discussed; the caution we must use in letting big data influence research. It must be understood that big digital data is not all-encompassing. What is not digitized influences the observations made on what is digitized.

    6. Printed books will never be the equivalent of handwritten codices, especially since printed books are often deficient in spelling and appearance.

      Funny, as this seems to be the attitude toward any innovation in the preservation of information. Academics still prefer the printed word to the digital one. History repeating itself.

    7. new edition—in the full bibliographic sense of the word—which, while it “departs more and more from the form impressed upon it by its original author,”

      I believe this to be a significant thing to remember. Every iteration of a text is different. Even if the words are exactly the same, it is different. Even the format of a digital edition of a text will influence the interpretations of scholars of the text. If the version is presented in plain-text, a word document, a blog post, these formats influence the way people think about what they are reading.

    8. criticism may just as rightly be applied to any other point in the transmission of the text

      This is an interesting notion. As a student in the humanities, I work with specific translations of works that are assigned to me, often these translations are not the most readily accessible version of the text. However, according to this argument, the text in all its forms is able to be criticised. It is not necessary to look at the original or the best translation for criticism to be considered valid. Something to bring up with professors as they are assigning booklists.

    9. Ryans-MacBook-Pro

      This is the transcriber's computer name? - Follow up: No, I think this is the article's author Ryan Cordell.

    10. and cite—digitized sources as transparent surrogates rather than new editions.

      Caution in how I will cite digitized sources is noted. As items are digitized, are corrections made? New notes added?

    1. python program

      I took a programming class last year and we learned Python. I wish the class had been taught differently. I'm not an engineering student and had no background in programming, which I thought was fine since it was an intro class. However, it seemed that the prof was not capable of teaching in "beginner mode". I don't feel like I learned anything at all.

    2. cleaning data is 80% of the work in digital history.

      If this is the case, then why don't historians keep a better record of the data cleaning process?

    1. In what ways would such a system change the nature of local knowledge, once that knowledge becomes available to the wider world on the web?

      A great question. I love the idea of EVERYONE being able to access their own cultural heritage, regardless of location within Canada.

    2. We wanted to bring the potential of digital technology to bear on a region with relatively low Internet access but also a relatively high interest in local history.

      I love this idea! As I read this article I am currently in a VERY small town called Temiscaming, Quebec (look it up and you'll see what I mean... they don't even have a Tim Horton's so there's a good comparison) sitting at a cafe, desperately trying to get internet connection from the town park. I've been coming here for 26 years and until recently I never really looked into the history of the town. This summer when I looked into it, I was very surprised at what I found. "Temiscaming, founded in 1917 by the Riordon Pulp and Paper Company, is a classic example of a “closed” company town built around a single industry." Some names that contributed to its founding include Thomas Adams, an eminent Scottish town planner, and Montréal architects Ross and Macdonald who built the mill, the commercial district, and all the houses in the lower town. My curiosity getting the best of me, I asked my waiter if she knew the history of her town and she couldn't recall. My point is that a small town with horrible internet can have an incredible local history, yet remain unknown to its own occupants. Many variables could play a role in this such as the very limited internet connection, the rural community, or the “lack of ability to manipulate the Internet for their own purposes”.

    3. Digital history has the potential to address these concerns by linking members of a community together to collaborate on historical projects.

      This is actually something I am currently tasked to do as part of my job. As a heritage research assistant for a local municipality, I am gathering information about local restaurants that have contributed to the community, as part of a display for a summer event. Not being from the town, I decided to create posts for the municipality's website and social media accounts that would engage the interest of the public. The posts include old photos of restaurants, combined with the address and a question that engages the attention of members of the community. This gets people to contribute to the post by commenting and thus sharing their knowledge. From there I will be able to navigate the comments and add that shared knowledge to my displays. This is something I am so proud of. It allows the members of the community to collaborate on the heritage restaurant displays while making them feel like they have history to contribute alongside everyone else.

    4. ystems of rewards,

      The game for genealogists is to learn more about their family. I think the game for many local historians is to learn more about a place.

    5. Crowdsourcing should not be a first step

      What if bringing the crowd together is the first step? The historical resources of a locality that are available digitally are probably the tip of the iceberg compared to what exists on paper: photos, school records, deeds, wills, contracts, cemetery diagrams, diaries or family bibles to name some examples. Also oral history. Knowing where things happened. This is probably more true in an area that only recently received high speed internet. The crowd when brought together also brings their knowledge of paper or oral resources that likely few other people have and perhaps no one on the Internet yet.

    6. collecting them in an online database

      This could also serve to bring the creators of the content together. Crowd sourcing brings together a crowd. Just like most Wikipedia articles are improved when there is more than one contributor, crowdsourced local history could move ahead by bringing knowledgeable people together for synergy.

    7. We believe that this confusion was partly responsible for the evolution of the project from a tool where collaboration and community support was envisioned, a process of sharing authority, to one where we the historians seem to be using the crowd more as a reservoir, contrary to our intentions.

      Interesting point.

    8. Crowdsourcing should not be a first step. The resources are already out there;

      May as well attempt to build a starting point. Though by doing this, I think the type of materials would be very different from the ones crowdsourced.

    9. Finally, we had a number of potential contributors who were worried that what they had to contribute was not “professional” enough

      Fair enough. It would be intimidating to submit something if one has never done it before, or didn't feel smart or knowledgeable enough to do so. I think crowdsourcing projects like this can really show populations they have something to contribute, no matter the content.

    1. Place is constructed through multiple channels, from lived experiences to emotional attachments to acts of naming

      Place is a collection of all things personally experienced and known about through external sources. This is an interesting topic because surely there are different definitions and ideas and no proof one way or the other.

    2. brief coda offers a glimpse into the potential for digital analysis to answer this question through a comparison of the Houston Daily Post and an earlier Houston newspaper, the Telegraph and Texas Register.

      Neat to see data pulled from something as simple as the newspaper then digitized to show society/space being changed.

    3. At the heart of this orientation stood a commercial railroad network that had been expanding for the past half century

      This makes complete sense to me; as the world expanded with the railroad, so did space, places, and time. As places became relevant, the news in these places would too!!

    4. Newspapers print, and thereby privilege, certain places over others

      Even as new media arise, this issue is still strongly noticeable: the digital divide between urban and rural!!

    5. By printing some places more than others, papers such as the Houston Daily Post continually reshaped space for nineteenth-century Americans.

      Ah yes, the power of media. I love this topic. Similarly, in another class we talked about how the invention of the train reshaped people's perception of space and time. Spaces became shorter, yet larger, and time itself grew.

    1. This is the most embarrasing to admit. I did not back things up regularly. I am not ever making that mistake again.

      Maybe the use of technology by historians is recent enough that backing up isn't yet ingrained in the minds of historians, and the "ease" of using technology is still new and outweighing the risks?

    2. gone. Destroyed. Erased. No longer present.

      This is a scary thing, I'm sure, for digital historians or for anyone whose career relies greatly on the internet. Lots of money and time are probably spent on external hard drives and backing up all your files externally.

    3. artificial intelligence

      This is what I mean about technology always evolving and how historians will have to evolve with it; there's no doubt that even 10-15 years from now historians will be doing things very differently.

    4. I’ve only been at the ‘digital’ part since about 2005… and my experience of ‘digital’, at least initially, is in social networks and simulation

      I wonder what the most challenging part of being a digital historian is. I would think it's constantly having to transfer files and data to new platforms because of updating technologies. Maybe there's also the aspect of having to weed out data and notes that are no longer relevant to what you're currently studying? If a better platform is created that meets your needs, it would make sense to transfer your data there, even though it may be a pain at first - it's just part of the job!

    5. It takes a lot of trial and error, and sometimes, just dumb luck. I kept poor records of this period

      I think that it is important to keep a log of everything you do when doing a big research project, so that others can get a clear idea of your thought path and why certain choices were made. Your trials and errors could reduce other's errors!

    6. Schedule time into every week to keep on top of security.

      Security, and as we learned in XMLing My Way to Data Management, keeping clean and organised filing systems.

    7. It takes a lot of trial and error

      Hist3814 101. Keep calm and study on folks!

    1. At this point in the eighteenth century, a 254x254 matrix is what we call Bigge Data. I have an upcoming EDWARDx talk about it. You should come.

      This is my kind of historical argumentation. Can I write my final paper from the perspective of a disgruntled turn of the century Anglophone Quebecker?

    1. I’m suggesting that some texts use irony and dark humor for more extended periods than you suggest in that footnote—an assumption that can be tested by comparing human-annotated texts with the Syuzhet package.

      I definitely agree with Swafford here, but I like Jockers' determination to try something with sentiment analysis, even if the technology isn't perfect. Swafford has a point, though: Jockers is reaching pretty far with his something when he claims to be able to read literature for emotion out of algorithms designed to detect positive and negative wording.

    2. they were developed for analyzing modern documents like product reviews and tweets.

      !!!!!! Very important. We historians are so used to analyzing documents for intended audience - we need to analyze our tools for it as well! If our tools are an extension of scholarly work, then tools can be just as biased as any scholar or group of scholars, and only by getting more opinions in the mix can we improve the results.

    3. Since each word is scored in isolation, it can’t process modifiers. This means firstly that intensifiers have no effect, so that adding “very” or “extremely” won’t change the valence, and secondly (and more worryingly) that negations have no effect.

      Yeah... this is a problem. I'm reminded of Milligan's discussion of the magic black box. We trust algorithms when they are presented to us through a pleasing interface. Swafford has done well to go "under the hood," as it were; but we can't all rely on some other dedicated scholar to go under the hood of every algorithm we apply to our own work. Digital historians have a responsibility to be aware of the limitations of their own tools.
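
      Swafford's point about scoring each word in isolation is easy to demonstrate. Here is a minimal sketch of that style of scorer (the tiny lexicon is invented for illustration; it is not Syuzhet's actual dictionary or code):

```python
# Toy word-by-word sentiment scorer in the style Swafford critiques.
# The lexicon here is made up for illustration, not Syuzhet's.
LEXICON = {"good": 1, "happy": 1, "bad": -1, "sad": -1}

def score(sentence):
    """Sum per-word valences; each word is scored in isolation."""
    return sum(LEXICON.get(word, 0) for word in sentence.lower().split())

print(score("I am happy"))      # 1
print(score("I am not happy"))  # still 1 -- the negation has no effect
print(score("I am very sad"))   # still -1 -- the intensifier has no effect
```

      Because "not" and "very" score zero and nothing looks at neighbouring words, negations and intensifiers simply vanish, which is exactly the worry raised above.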

    4. I communicated privately with him about some of these issues last month, and I hope these problems will be addressed in the next version of the package.

      Neat to know how the whole DH community is connected both privately (through e-mail - okay, semi-privately, big brother is watching, etc etc) and publicly. Many forms of dialogue are available to digital scholars, and more accessible than ever before, as we've discussed in previous weeks. Thus, Swafford was able to engage with Jockers' project on multiple levels - as observer, as user, as private adviser, and as public critic. More roles means more perspectives means more fruitful dialogue about these tools and methods.

    1. the mean emotional valence

      What does "emotional valence" mean? He probably explains it in his earlier post, but a quick rundown of his methods would have made this blog post stronger & more accessible.

    2. Vonnegut draws the plot of Cinderella for us on his chalk board, and we can imagine a handful of similar plot shapes. He describes another plot and names it “man in hole,” and we can imagine a few similar stories. But our imaginations are limited.

      Fascinating - he seems to be using digital methods to test and build off of anecdotal scholarship, eg unreliable scholarship. It must be noted that close-reading (like what Vonnegut does to Cinderella) was the way to do literary studies when Vonnegut developed his theory. Vonnegut wasn't lacking data at the time; data on the scope of Jockers' corpus of digitized classic novels just did not exist. This suggests the great possibilities of digital methods. It also serves as a reminder not to dismiss past (or current) scholarship just because of its contemporary limitations. We must build on the past, not discard it.

    3. These core conflicts will be familiar to students of literature: such constructions were once taught to us under the headings of “man vs. man,” “man against nature,” “man vs. society,”

      Flashbacks to high school English class. These have always seemed reductive to me - and not just because they're so implicitly masculine! The fact is that most great plots contain elements of all sorts of conflicts. Even if there is an overarching conflict, calling it "man vs society" rarely conveys the essence of the conflict. It just gives it a convenient label.

    4. syuzhet (the organization of the narrative) as opposed to its fabula (raw elements of the story).

      I would be interested in reading Propp's work one day, because I fear this could create a false dichotomy between "details" and "big picture." One can't exist without the other, so historians and literary scholars shouldn't focus on only one.

    5. a video of Kurt Vonnegut describing plot in precisely these terms.

      As a writer and a huge Vonnegut fan, I'd heard of Vonnegut's plot structures many times before. They came to mind immediately when I read the intro to this post, actually! I had no idea there was a video of him explaining it, though, nor that Vonnegut imagined computers could analyze plot. Neat.

    6. the relationship between sentiment and plot shape in fiction

      my inner literature nerd is swooning. This has so much potential, and it's the kind of study that would be very difficult without some digital tools (to extract the raw data from entire novels and to visualize the resulting plot structures in a way that allows for easy comparison).

    1. -E

      On mac terminal that is; on dhbox it'd be -r
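
      Assuming the flag in question belongs to sed (the usual tool with this BSD/GNU split), the difference is just the spelling of the extended-regular-expression option: BSD sed on macOS uses -E, older GNU sed on Linux boxes like DHBox uses -r, and recent GNU sed accepts both. A quick check:

```shell
# Extended regular expressions in sed: -E on BSD/macOS, -r on GNU.
# Modern GNU sed accepts both spellings.
printf 'aaa\n' | sed -E 's/a+/X/'   # prints X (BSD and modern GNU)
printf 'aaa\n' | sed -r 's/a+/X/'   # prints X (GNU only)
```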

    2. \n[^~].+

      \n[^~].+ says command not found ^[^~]*$ says that substitution failed?

      I just tried these out and neither command is working.

    3. \r\n[^~].+

      I can't really figure out this part. I typed \r\n[^~].+ into Refine and nothing comes up. I've tried the other systems and this doesn't seem to work; even when I plug it into DHBox, it doesn't get rid of them.
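
      It may help to see the pattern in isolation: [^~] means "any character except a tilde", so the pattern grabs a newline followed by a line that does not start with ~ (i.e. a continuation line). A quick test in Python, using a toy stand-in for the exercise file (note that \r\n is only needed for Windows-style line endings; Unix files just use \n):

```python
import re

# Toy text standing in for the exercise file: letter headings start
# with "~", continuation lines don't.
text = "~Sam Houston letter\ncontinuation line\n~Next letter\n"
matches = re.findall(r"\n[^~].+", text)
print(matches)  # ['\ncontinuation line']
```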

    4. delete everything for the index of the list of letters.

      This isn't clear to me, and I think I may have interpreted it wrong. Say we have "Sam Houston to J. Pinckney Henderson, December 31, 1836 51" -- are we supposed to just remove 51, and do the same for each subsequent entry?

    5. If you were using a text editor on your own computer

      I'm having trouble finding the replace function in the terminal; does anyone know the key(s) to get it?

    6. ike line 178,

      Does anyone know the command in the nano text editor that lets us see the line numbers? I've been counting in the meantime, but if anyone knows off the top of their head, I'll copy that into my fail log to look back on later.

    7. .+,.+,.+,

      This doesn't return anything for me when I run it in grep on my index.txt file, but other searches of the file do return things, and I have lines with more than 3 commas. What's going on?
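
      One likely culprit, worth checking: plain grep uses basic regular expressions, in which + is a literal plus sign rather than "one or more". Switching to extended mode usually fixes it:

```shell
# In basic regex mode, + is a literal character, so this matches
# nothing unless a line actually contains plus signs:
printf 'a,b,c,d\n' | grep '.+,.+,.+,' || echo '(no match)'
# With extended regexes (-E), + means "one or more":
printf 'a,b,c,d\n' | grep -E '.+,.+,.+,'   # prints a,b,c,d
```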

    1. 'illusionary order'

      This link no longer seems to work.

    2. reflect

      and 'reflect' can also mean: tie what you're doing in the exercises to what you're thinking about here!

    1. I’ve been using FaceDetect to see what percentage of files have faces in them,

      That's such an interesting tool! Photos of faces would offer a more personal glimpse into the data, I'm sure. Even just for interest's sake, that's a neat idea.

    2. We can find out percentages of colour vs. grayscale, resolutions, file format type, pixels, colorspaces, hues, RGB means, etc

      This is interesting, I imagine finding out the resolutions would give some insight into the quality of technology most people were using at the time.
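
      Pulling resolutions out of a pile of image files is exactly the kind of per-file metadata extraction ImageMagick's identify command does. Just to make the idea concrete, here is a stdlib-only Python sketch (a hypothetical helper, not part of any of these projects) that reads width and height straight from a PNG's IHDR header:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_size(data: bytes):
    """Return (width, height) from the IHDR chunk of a PNG byte stream."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # IHDR follows the signature: 4-byte length, b"IHDR", then
    # big-endian width and height at bytes 16-23.
    return struct.unpack(">II", data[16:24])
```

      Mapped over a few hundred thousand files, a helper like this would yield the resolution distribution; in practice, something like identify -format '%w %h' handles far more formats and is the battle-tested route.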

    3. ImageMagick


    1. It is thus with the aim of improving close reading, rather than merely facilitating distant reading, that I have instituted TEI-lite, or basic XML encoding,

      It seems that a lot of work with texts within digital humanities focuses more on utilizing technology for distant readings. I think the way in which close readings can be improved through the use of different tools like Beals is doing can help to expand the ways in which digital historians interact with text and data.

    2. By using a standardised (if imperfectly aligned) set of encoding rules (TEI), my databases are instantly mine-able and manipulatable by any other academic, even I become unavailable.

      I can see this being very useful in large workplaces where the changes and documents created by one employee become almost impossible to find or understand after they leave. I recently experienced this when trying to make simple modifications to some quantitative data analysis. The previous person had saved the end files, but not their methods and calculations, so I had to guess and invent my own methods.

    3. this extra effort should not really be considered extra to a scrupulous historian

      As you have read, there are clearly benefits that accompany using a technical plan. "Plan" being the key word, it should always be a part of your digital work.

    1. XML is a formal model that is based on an ordered hierarchy, or, in technical informatic terms, a tree. It consists of a root (which contains everything else), the components under the root, which contain their own subcomponents, etc. These components and subcomponents are called nodes. In the case of our book example, the root node is the book, it might contain a title-page node, followed by a table-of-contents node

      I love this extended metaphor for XML! As a complete novice to XML I can start to grasp what it is by thinking of it as an "ordered hierarchy" organized with comparable parts to a book.

    2. If a book were merely a stream of words, with none of the layout and formatting that lets us recognize its constituent parts, the reader would have much more difficulty determining where logical sequences of thoughts (instantiated as chapters or paragraphs, described by titles, etc.) begin and end

      This is very true, yet I also think that today's young readers are adapting to less-structured textual displays. Social media and the rise of "amateur" authors and online books have shown us that structured text (with chapters and clear arguments) aren't always necessary and a "stream" of text can be just as effective.

    3. agment, though, may be merely well balanced, and there are stages in a digital humanities project where you may need to create well-balanced XML fragments that you will insert into an XML document, producing a well-formed (and perhaps also valid) XML document at the end. We’ll discuss the use of well-balanced XML fragments later in the course. Entities and numerical character references An XML document uses angle brackets to delimit markup. For this reason, XML cannot contain an angle bracket that is meant to represent a textual character, since XML software would be unable to distinguish this textual data from markup. XML reserves two characters that cannot be represented directly in text: the left angle bracket or less-than sign (“<”) and the ampersand (“&”). When these characters occur in text that is to be represented in XML, they must be replaced in the underlying marked-up document by entities. Entities begin with an ampersand and end with a semicolon; the part in between identifies the meaning of the entity. If you need to include these

      These bad boys really show how particular and aware one has to be when doing this type of work! If one does not take the time to make sure everything is correct, the initial end result would be soooo disappointing.

    4. ts are not nested because

      Would anyone want to give me a simpler definition of serialization?


    5. XML trees has made scholars reluctant to surrender those practical advantages

      I can understand why. It would be hard to give up something that makes work go better!

    6. Computers can operate quickly and efficiently on trees (ordered hierarchies), much more quickly and efficiently than they can on non-hierarchical text. This means that if we can model the documents we need to study as trees, we can manage and manipulate large amounts of data efficiently.

      Not only that, but there would be fewer formatting issues in general.

    7. xtual document as an ordered hierarchy, or tree, so that it can be explored with computational tools. Humanities scholars use XML to represent their documents because the tree model is convenient both as a logical

      I think this is an especially important point. As per my previous annotations, conformity = clarity in the discipline and between scholars!

    8. text inside second definition term (dfn) node end tag

      I like the way this is described. It helps to understand, in terms of an actual image, what XML really is.

    9. be used without reference to the Web or the Internet. For example, one could write XSLT to generate a table of information from an XML document and copy and insert it into a Microsoft Word document for printing on paper—all without ever being connected to the Internet.

      Interesting! So HTML is almost a higher level version of XML?

    10. documents in other forms. (In that latter case, you’ll need to convert them to XML, which typically involves a mixture of auto-tagging, where you run some global search-and-replace operations to insert markup, and manual tagging.) Or you may create new documents entirely from scratch, where you are creating not just the markup, but also the data content. Whatever the source of your data: The first stage of a digital humanities project is document analysis, where you determine the hierarchical structure of the documents that you wi

      Will have to come back to these points later on in the course, maybe discuss them at length with the Prof.

    11. The preceding is not well formed because it doesn’t have a single root element that contains everything else. To change it into well-formed XML, wrap the dairy and snack elements in a root element, such as shopping_list. The following also is not well formed: <paragraph>He responded emphatically in French: <emph><foreign language="french">oui</emph></foreign>!</paragraph> This example has a single root element (paragraph), but the emph and foreign elements inside the

      This is tricky. Until XML becomes habit, it will be important to be very careful in how you write anything.
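
      The good news is you don't have to catch these mistakes by eye: any XML parser rejects overlapping tags mechanically. A small sketch using Python's standard library, reusing the reading's own example:

```python
import xml.etree.ElementTree as ET

# The overlapping-tags example from the reading is NOT well formed:
bad = ('<paragraph>He responded emphatically in French: '
       '<emph><foreign language="french">oui</emph></foreign>!</paragraph>')
try:
    ET.fromstring(bad)
except ET.ParseError as err:
    print("rejected:", err)

# Properly nested, the same content parses fine:
good = ('<paragraph>He responded emphatically in French: '
        '<emph><foreign language="french">oui</foreign></emph>!</paragraph>')
root = ET.fromstring(good)
print(root.find("emph/foreign").text)  # oui
```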

    12. y be tagged as a place element, along the lines of: <paragraph>The American writer Jack London never lived in <place>London</place>.</paragraph> An author might wish to create an index of place names for the document, or cause a map to appear when the reader mouses over a place name while reading, or make it possible to search for the string “London” when it refers to the place, b

      THIS is why XML is so useful. Amazing. Non-digital historians are at a huge disadvantage if they are unable to manipulate text to this extent.
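
      To make the payoff concrete: once places are tagged, pulling every place name out of a document for an index is a couple of lines. A minimal sketch in Python, using the reading's own example sentence:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<paragraph>The American writer Jack London never lived in "
    "<place>London</place>.</paragraph>"
)
# iter() walks the whole tree, so place elements at any depth are found.
places = [p.text for p in doc.iter("place")]
print(places)  # ['London']
```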

    13. XML does not cope as well with multiple simultaneous hierarchies as it does with single hierarchies.

      Even though this might at first seem like a negative aspect of XML, couldn't another argument be that this might force conformity among XML users? As such, it might lead to greater understanding across the digital humanities discipline.

    14. If we simply pass the computer an indifferentiated stream of text, it will not be able to identify the beginning and end of the various structural subcomponents. Markup is the process of inserting information into our document (this is called tagging in the XML context) that will take the structure that humans recognize easily and make it accessible to a computer.

      This makes an interesting connecting point to consider alongside an earlier point the author made. The current structure of words (spaces, paragraphs, etc.) is fairly recent. Likewise, the digital version of this evolution can be seen in methods like XML.

    15. The beginning of every digital humanities project: Document analysis

      I'll need to come back here for my final project - definitely some interesting points that will certainly be relevant.

    16. The creation of electronic texts that can be used in primary humanities research that would be impossible (or so impractical as to be essentially impossible) to conduct without computational assistance. The use of existing computational tools to interrogate those texts and obtain original research results that would not be attainable without the tools. Where existing computational tools are not able to meet the research goals, the development of original computer systems and programs to meet the researcher’s needs.

      These three points are important in outlining some of the advantages this class will give us in helping improve our abilities as digital historians! Additionally, it helps to reiterate that being a digital historian is not fundamentally different from a non-digital historian, but that instead it offers a new set of tools we can work with.

    1. 1. Explore Data

      Will use this in the future and just go back and change short acronyms into their appropriate terms.

    1. As Director 0f University of Michigan Press I’m afraid to say that everything you say in this post, Sheila, is true. We’ve struggled over the last few years to bring innovative digital projects into the mainstream press workflow, and you’ve been caught in the middle.

      This was probably one of the more interesting parts of the article: a frank admission of negligence on behalf of the university press. It's great to see the press director step up and shoulder the load when someone voices their displeasure at being caught 'between the cracks' as digital projects move into the mainstream press workflow.

    1. In the long term, it may mean allying with like-minded historians, social scientists, statistical physicists, and complexity scientists to build a new framework of legitimacy that recognizes the forms of knowledge we produce which don’t always align with historiographic standards.

      This is fascinating! The obsession with "interdisciplinary approaches" to history may lead to a complete redefinition of historical legitimacy. I think that such expansion of the field could be good. After all, there was a time when "history" meant "stories written about great white men, usually at the head of governments". Social history expanded the scope of history to include stories of diverse peoples, cultures, and practices... could we expand the scope even further?

    1. the most legible result to historians will necessarily involve a narrative reconfiguration.

      I like that he acknowledges this. I'm most comfortable writing in a narrative form. If I had to write purely about methods to write "good" or "real" digital history, I wouldn't have an easy time. Though I want to challenge myself in this class to document my methods more thoroughly, I like that I can use these methods to inform and enhance my favourite methods of doing history.

    2. To convincingly make arguments from a historical data description, you must back it up using triangulation–approaching the problem from many angles. That triangulation may be computational, archival, archaeological,

      Love this! It really puts digital history into the context of the wider discipline. We're not just messing around with new toys. We're doing important work that builds on old work and can be built on in the future.

    3. The above boils down into two possible points of further research: deviations from expectation, or deviations from internal consistency.

      Aha! This is giving me an idea for my project. Funny how digital history is almost like relearning how to do history from the ground up. New methodologies make us question old methodologies, but sometimes we need to remember where we started.

    4. You have a big dataset and don’t know what to do with it


    5. Knowing that a community exists between history & philosophy of science is not particularly interesting; knowing why it exists, what it changes, or whether it is less tenuous than any other disciplinary borderland are more interesting and historiographically recognizable questions.

      This has definitely been a barrier to my interest in digital history: I prefer the "why" and "how" questions more than the raw facts. I like that this is being addressed (and that I could potentially add to the discussion).

    1. The Text Encoding Initiative (TEI) is a consortium which collectively develops and maintains a standard for the representation of texts in digital form

      This only works if they are universally recognised as having done this. Otherwise, they are just one more group putting out materials that are not well taken up.
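For readers unfamiliar with TEI, its markup looks roughly like the fragment below. This is a simplified, hypothetical sketch for illustration only, not a validated TEI document, though the namespace and element names (`teiHeader`, `persName`) do come from the real TEI standard:

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A Sample Letter</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Dear <persName>Mr Adams</persName>, ...</p>
    </body>
  </text>
</TEI>
```

The value of a consortium-maintained standard is exactly that two projects encoding letters this way can share tools, which is why uptake matters so much.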

    1. This incredible speed and the use of microfilm originals comes at a cost, however.

      Wow, now I'm imagining thousands of university students blindly typing in words they think will get them useful information and consistently missing or misusing sources.

    2. But this historical approach generally usually remains unspoken, without a critical methodological reflection.

      Hmm, I hadn't really given much thought to including in my essays a critical reflection on how I located and decided to use journal articles.

    3. generally the results are beautiful, render relatively well, and are – crucially – immediate.

      I wonder if this is a positive or a negative. Working with large amounts of data, I can see where quick results would be useful. However, given our society's obsession with immediate gratification, perhaps it would be good to have to work a little harder for important research findings.

    1. Teaching people digital skills is undoubtedly hard enough, but asking them to try something that they will most likely struggle with, and potentially fail to master (at least initially), can only make introducing these techniques to mainstream digital history a more difficult task.


    1. imagemagick

      This is a really cool and powerful utility that I have used before! It's fun seeing it show up and being used in Digital History.

    1. deserve access.

      I think this is an important notion. The study of history benefits greater society by helping everyone understand our nature and our progress as a species. History should not be exclusive; it should be inclusive, and access to data is a big aspect of accessibility.

    2. academic publisher should have such a significantly different economic picture from standard publishers

      Academic publishers pride themselves on their content being distinctly 'better' than that of non-academic publishers, even when those non-academic publishers publish academic works. Those works are considerably more affordable and thus more accessible, yet still have years of research behind them by a hardworking academic. Universities themselves see these as 'lesser' sources; I have even been told by a university professor not to read anything about history not published by an academic publisher. Publications from non-academic publishers are targeted toward a wider audience of non-academics, meaning the emphasis is different and the 'drama' is possibly played up. However, this does not make them useless: they are valuable as a way to find other sources and to form a cohesive picture of events. I think the reason academic publishers consider their work a premium good is pure elitism.

    1. we included a discussion forum on the Desk’s main page where volunteers could swap ideas, ask questions, or make requests of the project editors.

      Reminds me of our Slack page!!

    2. This paper describes how the Transcribe Bentham team sought to attract volunteer transcribers and build an online community.

      This got me thinking about how search engines save the search histories of their users. Would this also be considered big data? Not so voluntary.

    3. A project like Galaxy Zoo, for example, has successfully built up a community of more than 200,000 users who have classified over 100 million galaxies, thus supporting a great deal of academic research

      Incredible how so many small pieces of effort can accumulate into something amazing.

    4. These tasks, it has been argued, can be accomplished more quickly and more cheaply by outsourcing them to enthusiastic members of the public who volunteer their time and effort for free

      Enthusiasm is key here. The fact that these people do the work because they WANT to, rather than because it's their job, is important.

    5. The team’s lack of experience in using Adwords may account for this failure

      I think it is very important that the article includes this: an acknowledgement of user error in working with a tool.

    1. So here's the CND.xml, transformed into a csv: http://shawngraham.github.io/exercise/cnd.xml . If you 'view page source', you'll see the original XML again! Save-as the page as whatever-you-want.csv and you can do some data mining on it.

      Ignore this folks; a leftover from the 2016 version of this course when we worked with the Colonial Newspaper Database. In my defence, this workbook is as long as a regular academic book and I sometimes miss stuff. I do appreciate your annotations though that alert me to my weirdness! So keep up the good work.