109 Matching Annotations
  1. Nov 2016
    1. In the main text, I discussed making causal claims from non-experimental data using natural experiments and matching. In this appendix, I will introduce the potential outcomes model, and define more precisely the conditions that are required for causal inference from observational data. This chapter will draw on Morgan and Winship (2014) and Imbens and Rubin (2015).

      My preference would be for a discussion that includes Pearl's DAGs as well as Rubin's potential outcomes framework.

      Edit: My take is that Rubin's framework is rooted in a 20th century Fisherian orientation (which is why it's especially popular among statisticians), while Pearl's framework in part reflects new insights on probabilistic graphical models (which is why it's popular among computer scientists). The future, I suspect, will entail both approaches.
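
      For readers of these notes, a minimal sketch of the potential outcomes notation the appendix presumably builds on (standard definitions in my own notation, not text from the draft):

      ```latex
      % Individual-level causal effect of treatment D_i on outcome Y_i:
      \tau_i = Y_i(1) - Y_i(0)
      % Average treatment effect:
      \mathrm{ATE} = E[\,Y_i(1) - Y_i(0)\,]
      % Identification from observational data typically assumes
      % ignorability given observed covariates X_i:
      (Y_i(1), Y_i(0)) \perp D_i \mid X_i
      ```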

  2. Sep 2016
    1. mass collaboration projects also have democratizing potential

      This is a great point and I think one at least sociologists will be sympathetic towards. I'm thinking here of Howard Becker's "hierarchy of credibility" principle.

    2. but I am optimistic

      Perhaps include a few sentences on why you're optimistic? I'm optimistic too, but my reasons (and other readers') may differ from yours.


    3. enables mass collaboration

      I initially expected a discussion of mass collaboration among researchers themselves. The lone scholar (e.g., Einstein) is replaced with a team of researchers spanning continents and disciplines (e.g., the Large Hadron Collider). However, this kind of mass collaboration may be beyond the scope of your book.

    1. In open call projects, the researcher poses a problem, solicits solutions from other people, and then picks the best.

      An analog-age version of this is the "Delphi Method" developed by RAND in the 1950s. It has its problems and it's aimed at predicting the future, but I see some similarities, since open calls today often entail predicting some outcome. The problems with the "Delphi Method" are that it (a) relies on pre-selected experts, (b) has no clear criterion for what counts as "best", and (c) presumes that consensus-building yields truth.

    2. open call project

      Italicize?

    1. 5 general principles

      Depends on the style guide, but most style guides suggest spelling out numbers one through nine. E.g., "There are five general principles..."

    1. get an estimate of the causal effect

      To nitpick: you can still get an estimate of a causal effect without randomization (or adjustment), but it's likely to be a lousy estimate (unless particular assumptions are met).
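
      To spell the nitpick out, the naive comparison decomposes as follows (standard potential outcomes algebra; my sketch, not text from the draft). Without randomization or adjustment, the second term is what makes the estimate "lousy":

      ```latex
      \underbrace{E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]}_{\text{naive comparison}}
        = \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{effect on the treated}}
        + \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{\text{selection bias}}
      ```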

    2. recruitment, randomization, intervention, and outcomes

      How about controlling?

    1. could not lead to interesting research, but that’s not the case

      Consider rewording: "...could lead to uninteresting research, but that's not the case."

    2. nowcasting

      Italicize, perhaps?

    3. data

      Do you mean observational big data or observational data in general?

    1. Which of these approaches would work better? We don’t know, and in the process of finding out we might learn something important about families, neighborhoods, education, and social inequality. Further, these predictions might be used to guide future data collection.

      This is a really, really great idea!

    2. analogues

      Elsewhere you use the spelling "analogs."

    1. there are actually many situations where social researchers want to code, classify, or label images or texts.

      E.g., using Google Maps and human coders to code for "broken windows" in various neighborhoods.

    1. you should try to design a series of experiments that reinforce each other.

      It'd be incredibly helpful if you could briefly discuss an example of experiments reinforcing each other.

    1. Figure 4.17:

      This figure would be clearer to me if the pictures were below the text "Info" and "Info + social."

    1. incredibly important.

      You can specify here why mechanisms are incredibly important. My take is that often experiments have a "black box" approach and that we don't actually understand a causal effect until we understand the mechanisms.

      Conversely, understanding the mechanisms helps strengthen the case for a causal effect. The findings in psychology on supposed "psi" effects are weakened because there are no plausible mechanisms. Likewise, we knew smoking caused cancer back in the 1950s because we had a pretty good idea of the mechanism (e.g., tar) from qualitative data and simple observational studies.

      Edit: As well, mechanisms could be used to identify causal effects (e.g., Pearl's "front-door" criterion).
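
      For reference, Pearl's front-door adjustment, where M is a mechanism that fully mediates the effect of X on Y (the standard formula, not text from the draft):

      ```latex
      P(y \mid \mathrm{do}(x)) = \sum_{m} P(m \mid x) \sum_{x'} P(y \mid m, x')\, P(x')
      ```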

    1. In many situations, you just cannot measure and adjust for all the possible confounders.

      And you may condition on a pre-treatment collider variable that induces a back-door path (!).
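
      A toy simulation of that danger (my own sketch, not from the draft): x and y are independent by construction, but "adjusting" for their common effect c manufactures an association.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      n = 100_000
      x = rng.normal(size=n)          # "treatment": no causal link to y
      y = rng.normal(size=n)          # outcome: independent of x
      c = x + y + rng.normal(size=n)  # collider: a common effect of x and y

      # Marginally, x and y are uncorrelated...
      print(round(np.corrcoef(x, y)[0, 1], 3))  # ~0.0

      # ...but conditioning on the collider induces a spurious association.
      stratum = np.abs(c) < 0.5
      print(round(np.corrcoef(x[stratum], y[stratum])[0, 1], 3))  # clearly negative
      ```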

    1. There is deep skepticism of certain types of stated preferences data in economics (Hausman 2012).

      Also among some social psychologists (e.g., Banaji's work), although still probably not as skeptical as economists.

    1. Big data sources and surveys are complements not substitutes so as the amount of big data increases, I expect that the value of surveys will increases as well.

      I think you need to spell this out more clearly here. Will the value of surveys increase because of the decline of traditional landline surveys, so any plausibly reliable survey data will be more valuable? Or will the value increase because of the growth of big data, which can be combined with survey techniques?

    2. increases

      increase

    1. persons

      person

    2. ,

      Remove comma

    3. with 10-fold cross-validation

      Consider including a sentence discussing cross-validation for social scientists. I suspect many are not familiar with cross-validation.
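
      If an illustration would help, here is a minimal sketch of 10-fold cross-validation with scikit-learn (simulated data; my example, not the authors' code):

      ```python
      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      # Hypothetical data: 1,000 respondents, 20 features, binary outcome.
      rng = np.random.default_rng(0)
      X = rng.normal(size=(1000, 20))
      y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

      # Split into 10 folds; train on 9, score on the held-out fold,
      # rotating so each fold serves once as the test set.
      scores = cross_val_score(LogisticRegression(), X, y, cv=10)
      print(scores.mean())  # average out-of-sample accuracy
      ```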

    4. third-party

      third party

    1. There is just too much to be gained by linking survey data to other data sources, such as the digital trace data discussed in Chapter 2.

      Perhaps mention "data fusion"?

    1. forecasting.

      Perhaps explain how forecasting is different from prediction? (I view prediction as a more general category than forecasting.)

    2. seem

      seems

    3. simple

      simpler?

    4. searchers

      searches

    1. More generally, with some creativity and design work, it is possible to improve the user experience for survey participants.

      My informal experience is that people find open-ended survey questions (i.e., text responses) more enjoyable to answer than closed-ended survey questions. I have not seen any research on this, however.

    1. This domination is not because closed questions have been proven to provide better measurement, rather it is because they are much easier to use; the process of coding open-ended questions is complicated and expensive.

      I agree, although there's some psychological work suggesting that closed-ended questions are more predictive of human behavior (i.e., quickly answered closed-ended questions can act like quasi-implicit measures that track behavior).

    1. Of course, it would be better to do perfectly executed probability sampling, but that no longer appears to be a realistic option.

      Was a "perfectly executed probability sampling" ever a realistic option?

    1. 1.1 An ink blot

      I like the Blumenstock et al. example, but I think the introduction would convey the immense change under way more vividly with a parallel example from the analog age. E.g., compare Blau and Duncan's work on the American Occupational Structure, which required specifying hypotheses weeks in advance and entailed slow computation with punch cards.

    1. Internal states exist only inside people’s heads, and sometimes the best way to learn about internal states is to ask.

      Cf. Implicit Association Test

    1. appears

      appear

    2. Many researchers

      Who?

    3. area probability sampling

      Perhaps mention what prompted the widespread use of probability-based sampling (to parallel the next paragraph, which explains why RDD was used)?

    1. The growth of always-on, big data systems increases our ability to effectively use two existing methods: natural experiments and matching.

      There's a third approach, too. Causal discovery algorithms (i.e., computational improvements) and large amounts of diverse observational data (i.e., always-on big data systems) are enabling researchers to infer and evaluate complex DAGs directly from observational data.

      Edit: There aren't many examples in the social sciences using these algorithms, but I think they have a lot of potential if used judiciously.
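
      To give readers a flavor of what these algorithms do: constraint-based methods such as PC repeatedly test conditional independence and prune edges accordingly. A hand-rolled sketch of that primitive on simulated data (my own toy example):

      ```python
      import numpy as np

      def partial_corr(x, y, z):
          """Correlation of x and y after regressing out z: one conditional
          independence test behind constraint-based causal discovery."""
          zmat = np.column_stack([z, np.ones_like(z)])
          rx = x - zmat @ np.linalg.lstsq(zmat, x, rcond=None)[0]
          ry = y - zmat @ np.linalg.lstsq(zmat, y, rcond=None)[0]
          return np.corrcoef(rx, ry)[0, 1]

      # Simulated fork: z -> x and z -> y, with no direct x-y edge.
      rng = np.random.default_rng(0)
      z = rng.normal(size=50_000)
      x = z + rng.normal(size=50_000)
      y = z + rng.normal(size=50_000)

      print(round(np.corrcoef(x, y)[0, 1], 2))  # ~0.5: marginally dependent
      print(round(partial_corr(x, y, z), 2))    # ~0.0: independent given z,
                                                # so the x-y edge is pruned
      ```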

    2. together(Einav et al. 2015, Table 11).

      Add a space.

    3. ,

      Remove this comma, perhaps?

    4. within

      from?

    1. estimating causal effects with natural experiments and matching.

      See my previous point about causal discovery algorithms and large volumes of data.

    1. As Table 2.3 makes clear, natural experiments are everywhere if you just know how to look for them.

      Or a critic might say "what people think are natural experiments are everywhere." It might be worthwhile to mention the criticisms of natural experiments (e.g., Rosenzweig and Wolpin 2000).

      Also, I think you can be more forceful about what you seem to be claiming: that we have more opportunities to find plausibly natural experiments in the digital age.

    2. mechanism

      Replace with "the mechanism" or "mechanisms".

    1. First, in a step typically called pre-processing, the researchers converted the social media posts into a document-term matrix (see Grimmer and Stewart (2013) for more information). Second, the researchers hand-coded the sentiment of a small sample of posts. Third, the researchers trained a supervised learning model to classify the sentiment of posts. Fourth, the researchers used the supervised learning model to estimate the sentiment of all the posts.

      I think the figure would be clearer if you numbered the steps in the figure.
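
      A compact sketch of that four-step pipeline in scikit-learn (made-up posts and labels; this mirrors the description in the quote, not the authors' actual code):

      ```python
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.linear_model import LogisticRegression

      labeled_posts = ["love this", "this is awful", "great news",
                       "terrible day", "so happy"]
      hand_coded = [1, 0, 1, 0, 1]  # step 2: hand-coded sentiment for a small sample

      # Step 1: pre-processing -- convert posts into a document-term matrix.
      vectorizer = CountVectorizer()
      X_labeled = vectorizer.fit_transform(labeled_posts)

      # Step 3: train a supervised learning model on the hand-coded sample.
      model = LogisticRegression().fit(X_labeled, hand_coded)

      # Step 4: estimate the sentiment of all remaining posts.
      all_posts = ["what a great day", "awful service again"]
      print(model.predict(vectorizer.transform(all_posts)))
      ```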

    2. post

      posts

    3. in

      is

    1. not incomplete

      complete

    2. not non-representative

      representative

    3. 100perday—andworkuntilthattargetismet,thendriverswouldendupworkingfewerhoursondaysthattheyareearningmore.Forexample,ifyouwereatargetearner,youmightendupworking4hoursonagoodday(

      Check this text formatting. It's off on my computer.

    1. Generally, people have a pretty good sense of what is important.

      This statement does not seem obvious to me. People's values (i.e., sense of what is important) can differ greatly.

    1. by Jon Kleinberg in a talk

      Perhaps provide some context on Jon Kleinberg. E.g., "by the computer scientist Jon Kleinberg in a talk on X at Y."

    2. run

      running?

    3. proceed

      process

    4. practical significance rather than statistical significance

      Consider adding a sentence defining the difference between these two kinds of significance.
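
      One toy example that could anchor such a sentence (my own, not from the draft): at big-data sample sizes, a practically trivial difference is still wildly statistically significant.

      ```python
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)
      n = 1_000_000  # big-data scale

      # Two groups whose true means differ by a trivial 0.01 standard deviations.
      a = rng.normal(loc=0.00, scale=1.0, size=n)
      b = rng.normal(loc=0.01, scale=1.0, size=n)

      t, p = stats.ttest_ind(a, b)
      print(f"p-value: {p:.1e}")                       # tiny: "significant"
      print(f"difference: {b.mean() - a.mean():.4f}")  # ~0.01 sd: trivial
      ```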

    5. ,

      Remove this comma.

    6. There is no single consensus definition of “big data”, but many definitions seem to focus on the 3 Vs: volume, variety, and velocity (e.g., Japec et al. (2015)). Rather than focusing on the characteristics of the data, my definition focuses more on why the data was created.

      This definition is so common that I think you should introduce it earlier, when you first discuss your definition of "big data."

    7. difference

      differences

    1. abstract

      abstruse?

      (You mention abstractions in a positive light elsewhere in the book.)

    2. but I will call them data scientists

      Where would you place statisticians? Are they an audience for your book?

    1. either going to sacrifice quality by using ugly Readymades, or they are going to spend lots of time looking for the perfect urinal.

      Consider rephrasing. I understand what you mean here, but a perfect urinal is indeed an ugly readymade.

    1. This study combines what we have done with in the past with what we can do in the present.

      Consider including another example or two to further support your point. (Although it won't strictly be "back to the beginning" if you include other examples.)

    2. The future of social research will be a combination of social science and data science.

      I understand why it's a good idea to combine social with data science, but this statement makes it seem like it's a near-inevitability. I'm thinking you could add more material in this section on barriers to combining social with data science, and how we can overcome them.

    1. transition

      Consider replacing with "one". E.g., "...in the process of making a transition like the one from photography to cinematography."

    1. search engine queries

      Search engine personalization complicates the idea that these queries are non-reactive, however.

    2. researcher

      researchers

    1. always-on data systems enable researchers to study unexpected events and provide real-time information to policy makers.

      An admirable but flawed attempt at this was Chile's Project Cybersyn in the early 1970s.

    2. For example, social media data can be used to guide responses to natural disasters (Castillo 2016).

      Perhaps you could be more specific here in the example? It will clarify your point more for readers, I think.

    3. ex-post panel

      Perhaps parenthetically define this phrase?

    1. etc

      The period is missing from "etc." In general, I tend to prefer "and so on" or "and so forth" instead of "etc."

    1. often called digital traces

      It seems that you've already defined digital traces several times earlier, so I would consider removing the phrase "are often called digital traces, and".

    1. doing survey research

      I think I understand what you mean here, but I'm not sure all readers would understand how surveys entail an interaction with people.

    1. enough

      Might be more specific here. E.g., "enough to fully map the wealth distribution in Rwanda."

    2. transition

      You use the word "transition" a lot in the first few sentences. I'd consider replacing this word with "change" or "switch" (or something similar).

    3. the principles of social research in the past will inform the social research of the future.

      Do you mean that the principles of analog age social research will inform those for digital age social research?

    4. to run innovative surveys and to create mass collaboration

      Another innovation is that physical distance becomes less important. Arthur C. Clarke predicted back in the 1970s that these new forms of communication would render physical travel obsolete. He said that people in the future (i.e., today) would "communicate, not commute."

    5. These trends—increasing digital information and increasing computing—show no sign of slowing down.

      I generally agree with this view, although there has been some discussion regarding a slowdown in Moore's law. E.g., https://www.technologyreview.com/s/601102/intel-puts-the-brakes-on-moores-law/