56 Matching Annotations
  1. Nov 2018
    1. To date, humanists have taken one of two approaches in preparing and disseminating geo-historical information on the web. Large projects with significant funding have tended to follow the lead of the larger geo-science and commercial GIS communities in placing emphasis on the elaboration of extensive metadata describing their datasets, thereby creating a basis for their discovery and inclusion in digital repositories. In some countries (notably the U.K.) emphasis by national funding bodies has encouraged such behavior.

      Need this in the US! Thankfully the UK also speaks English and we can use these sources.

    2. Many of them are also breaking down traditional divisions between browse and search, thematic layers, web content, spatial processing and geographic datasets, not least through the mechanisms known generically as “mashups” (web applications that dynamically combine data and services from multiple, other web applications to provide a customized service or data product).

      Very interesting

    1. Figure 1. Triangles represent where the metadata identified the interview location in Virginia

      I am curious to see how the metadata led them to these locations

    2. Specifically, we start by finding references to specific political divisions (countries, states, and cities), georeferencing them through the Google API

      Interesting that Google seems to dominate natural language processing.

    1. The federal government gathers no official, reliable records of police use of force, and those records that do exist at the federal Justice Department are known to seriously undercount police use of violence.

      Why not?

    2. Data: Tolnay-Bailey-Beck Database of Southern Lynch Victims.

      Very cool interactive data visualization

    1. Visualizations are all based on this sequence: parameterization (assigning a metric), quantification (counting or measuring what has been parameterized), and translating this captured, constructed information into a graphic.

      How so?

    2. The columnar form of the spreadsheet, for instance, which, as noted above, goes back into Mesopotamian times,

      Wow. Crazy to see how much influence this still has in society.

  2. Oct 2018
    1. usively about women’s history, this approximate three percent represents the content of all scholarship that focuses on women’s history to the exclusion of other topics. In other words, topic modeling can separate out the parts of women’s history abstracts that simultaneously address other scholarly subjects

      Great point about topic modeling

    2. ed for about fifty-seven percent of women’s history-focused articles, and about sixty-three percent of overall articles. Still, this analysis supports Judith M. Bennett’s conclusion that women’s history and overal


    3. [96, Journal of Women’s History, Spring] In this light, even though the twentieth century remains the largest focus of history as a whole, it does not appear that women’s history has any more disproportionately focused on that time period; the increases in twentieth-century women’s history have not outpaced the increases in other time periods. Indeed, considering the limitations that face historians of periods predating the twentieth century, women’s and gender historians have become

      This is very unexpected. I would have expected much higher amounts during the 20th century.

    4. abstracts (Table 5), suggesting a transregionally shared lang


    5. [2011, Sharon Block and David Newman, 85] word “women” grew by only thirty-three percent over this period. As most women’s historians know, the discussion of “women” quickly took a back seat

      Great plot

    1. Perhaps I should not say “terrorists” so rashly. But you can see how tempting it is. Anyway, look—there he is again, this Mr Revere!


    2. Rather than relying on tables, we can make a picture of the relationship between the groups, using the number of shared members as an index of the strength of the link between the seditious groups. Here’s what that looks like.

      Very informative pic

    1. Consider an example such as the one displayed in Figure 2. George Villiers, Duke of Buckingham (1592-1628), knew King Charles I (1600-1649), and Charles I knew Prince Rupert of the Rhine (1619-1682), but Buckingham and Prince Rupert – whose lives only barely overlapped – never met. Because Prince Rupert and Charles I are connected, they will tend to be mentioned together in source documents. How often Prince Rupert is mentioned can therefore be predicted in part from how often Charles I is mentioned. Likewise if Charles I and Buckingham are connected, mentions of Buckingham predict mentions of Charles I.

      This example helped me understand some of the complexity in the model

    2. We therefore decided to focus on the 58,625 biographical entries that make up the ODNB. Running to sixty volumes in its print format, the ODNB is the labor of 10,000 scholars who have collectively contributed its 62 million words

      Tons of data

    1. “modeling points the way to a computing that is of as well as in the humanities: a continual process of coming to know by manipulating representations.”

      I agree with this quote

    2. Traditional humanities scholars often equate digital humanities with technological optimism. Rather the opposite is true: digital humanists offer the jaundiced realization that computational techniques like topic modeling — long held inaccessible and unapproachable and therefore unassailable — are not an upgrade from simplistic human-driven research, but merely more tools in the ever-growing shed.

      Interesting concept

    1. 200 billion emails are sent and some 5 billion Google search queries are performed – and they are nearly all text-based.


    2. White players were more likely to be called “intelligent” and blacks more likely to be called “natural.”

      This is shocking...

    1. As Web search has proven to be very important, it is not hard to imagine that opinion search will also be of great use. One can crawl the user-generated content on the Web and enable people to search for opinions on any subject matter. Two typical kinds of opinion search queries may be issued:

      Opinion search is another interesting concept.

    2. Sentiment classification can obviously be formulated as a supervised learning problem with two class labels (positive and negative). Training and testing data used in existing research are mostly product reviews, which is not surprising due to the above assumption.

      Interesting link to Machine Learning

    1. Moreover, the impact of the words I and my seems to grow even further


    2. e possessive, in turn, contributed to the evolution of language roughly at the times of the Prohibition. (Again, this is not to say that any direct links between function words and actual events in history should be drawn).

      I agree the purpose of the study is not to link function words to any events in history, but as a continuation I think this could be an interesting project. Why would there be a shift in language during Prohibition? (It is also possible this is a spurious correlation, like cheese consumption in Wisconsin and murder rates.)

    1. In the present day, it is virtually impossible for scholars to avoid text-mining software altogether, even if many of us only encounter it indirectly through platforms like Google or JSTOR

      Very interesting concept. I am curious how Google relates text mining to relevant search results. Would this be like autocorrect?

    2. But the “particular use” of stream to which Jockers is referring is neither a word type (the word stream considered in the abstract) nor a word token (a particular instance of the word in a text)—it is an entry in a probability table that was generated through an approximate optimization method. This unit does not correspond to a single “use” of a word in any usual sense, but rather derives from patterns among many different instances of the word in the corpus. Although some of these instances might refer to a body of flowing water, there is no guarantee that they all use the word in the same sense—there are, for instance, at least a few dozen references to a “stream of settlers” in nineteenth-century texts that discuss conflicts between Europeans and Native Americans, and if these are present in Jockers’s corpus they would likely be included in the topic he discusses

      Very interesting use case and one that shows the difficulty of word mining.

    1. The above findings imply that unemployment tends to have serious implications for the individual. In particular they show joblessness is associated with a marked rise in anxiety, depression and loss of confidence and of self-esteem.


    2. It appears that work has a different meaning for different people. It may be a source of prestige and social recognition, a basis for self-respect and sense of worth, an opportunity for social participation or merely a way of earning a living.

      This is an interesting point. Work is very different for different people and may cause fuzzy results

    1. significantly higher odds of experiencing a marked rise in anxiety, depression and loss of confidence and a reduction in self-esteem


    1. (The median spell from admission to death for those who died in the workhouse was only 1.6 months. Twenty-five per cent died within a fortnight.)

      Interesting to see how this affects the data.

    2. The death rate in the workhouse seemed excessive, with the high number of infant deaths attracting particular notice.


    1. similar finding was reported by Wardrop (1995), who showed that the “hot hand” in basketbal

      This is another fascinating topic. As a basketball player, I 100% believe the "hot hand" exists, but after looking at many studies and statistical tests, it is shown to not exist at all. Still not sure which side I am on.

    2. but when individual graduate schools are taken into account, there seems to be bias toward females.

      This actually makes sense. Master's programs present statistics in order to get students to apply, and a very common one is the male/female percentage. In certain fields like business and STEM there are far more male applicants than female. Therefore, in order to give the school a better ratio, they must be accepting more female students.

    1. I use better data, robustness checks, and estimation strategies than in the existing literature to arrive at these results

      With an adjusted R^2 below roughly .25, how much better can the data be?

    2. Further, Carpenter [2005] does not find a significant earnings penalty for same-sex behaving men using GSS data.

      Seems like a lot of mixed results. I am not sure exactly what to believe based on the sources.

    1. sen in recent years have not yielded easily to standard macro analysis. To understand the sources of the long-term decline in saving and investment rates, the factors influencing the rate of technological change, or the long-term shifts in the demographic structure of the population and the labor force, we need t

      very interesting

    2. records, manuscript schedules of the census, and medical records. The pilot studies have been aimed at determining whether the creation of the projected data sets is economically feasible and w

      Seems like a precursor to modern data analysis such as ML.

    1. except in cases in which a prisoner is released in one state and readmitted to prison in another

      Does the current data account for prisoners who did this? Could this be inflating the results?

    2. lack of data has complicated efforts to understand the aggregate effects of myriad federal, state, and local efforts to reduce reoffending.

      I agree, a lack of data does show that the results may not be conclusive of anything.

    3. .

      This chart clearly shows the progress thus far.

    1. This is Trantor three centuries from now.

      How far can we go into the future before the validity of data analysis breaks down?

    2. We have taken that into account as well. Well, never mind. You will meet me, I suppose, at the University tomorrow?

      Thought provoking

    1. Nothing has been more vital to the computer’s success as an administrative tool than the development of software to hide the complexities of data manipulation from application programmers and end users


    2. : “Google’s gender is a gender of profitable convenience. It’s a category for marketing that cares little whether you really are a certain gender, so long as you surf/purchase/act like that gender”

      This shows Google only cares about what makes money and not necessarily what is right. The culture of Google is very tolerant generally speaking, but the process behind the profit is what ultimately drives decisions.

  3. Sep 2018
    1. Desmond’s ultimate policy solution is a universal housing voucher program, which would certainly alleviate the eviction crisis,

      Very interesting

    2. He regularly charges $20,000 for nonprofit sponsored events that include some combinations of a talk, book signing, and roundtable or classroom discussion.

      This seems very high and unnecessary given the cause he is promoting. Great evidence.

    1. Therefore, database and narrative are natural ’enemies’.

      This was an interesting comparison for two abstract ways to portray data

    2. The open nature of the web as medium (web pages are computer files which can always be edited) means that web sites never have to be complete; and they rarely are.

      Interesting comment. I think this is a major contrast and the reason the web is so dynamic compared to traditional books.

    1. Now, if Amazon operated like the justice system, it would start by scoring shoppers as potential recidivists. Maybe more of them live in certain area codes or have college degrees.

      Interesting comparison

    2. Shocking 23% change

    3. Great example of the power of data

    1. I actually tried this and this is shocking...

    2. How does SOA bias span other topics? ex. Politics, Education, Careers

    1. the reality is that we will always have to operate in 3rd party domains

      This is shocking because data is never truly private.

    2. That they can download their data, access via APIs

      Interesting concept, should you have full ownership of your data? If yes, how can we enforce this?