- Nov 2016
-
www.bitbybitbook.com www.bitbybitbook.com
-
In the main text, I discussed making causal claims from non-experimental data using natural experiments and matching. In this appendix, I will introduce the potential outcomes model, and define more precisely the conditions that are required for causal inference from observational data. This chapter will draw on Morgan and Winship (2014) and Imbens and Rubin (2015).
My preference would be for a discussion that includes Pearl's DAGs as well as Rubin's potential outcomes framework.
Edit: My take is that Rubin's framework is rooted in a 20th century Fisherian orientation (which is why it's especially popular among statisticians), while Pearl's framework in part reflects new insights on probabilistic graphical models (which is why it's popular among computer scientists). The future, I suspect, will entail both approaches.
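For readers who have not seen the notation, a minimal sketch of the core potential outcomes quantities may help. The symbols here (W for treatment, X for covariates) follow common usage and are my assumption, not necessarily the appendix's own notation:

```latex
\tau_i = Y_i(1) - Y_i(0), \qquad
\text{ATE} = \mathbb{E}\left[ Y_i(1) - Y_i(0) \right], \qquad
\{ Y_i(1), Y_i(0) \} \perp\!\!\!\perp W_i \mid X_i
```

The first term is the unit-level effect (never observed, since only one potential outcome is realized per unit), the second is the average treatment effect, and the third is the ignorability condition usually invoked for observational data.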
-
- Sep 2016
-
www.bitbybitbook.com www.bitbybitbook.com
-
mass collaboration projects also have democratizing potential
This is a great point and I think one at least sociologists will be sympathetic towards. I'm thinking here of Howard Becker's "hierarchy of credibility" principle.
-
but I am optimistic
Perhaps include a few sentences on why you're optimistic? I'm optimistic but I (or other readers) could have different reasons.
-
enables mass collaboration
I initially expected a discussion of mass collaboration in terms of researchers with other researchers. The lone scholar (e.g., Einstein) is replaced with a team of researchers across continents and disciplines (e.g., Large Hadron Collider). However, this kind of mass collaboration may be beyond the scope of your book.
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
5
five
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
show
show,
-
respondent’s
respondents'?
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
In open call projects, the researcher poses a problem, solicits solutions from other people, and then picks the best.
An analogue age version of this is the "Delphi Method" developed by RAND in the 1950s. It has problems and it's about predicting the future, but I see some similarities since open calls today often entail predicting some outcomes. The problem with the "Delphi Method" is that it (a) relies on pre-selected experts, (b) has no clear criterion for what's "best", and (c) presumes consensus-building is truth.
-
open call project
Italicize?
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
5 general principles
Depends on the style guide, but it seems most style guides suggest writing out numbers 1 to 9. E.g., "There are five general principles..."
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
get an estimate of the causal effect
To nitpick: you can still get an estimate of a causal effect without randomization (or adjustment), but it's likely to be a lousy estimate (unless particular assumptions are met).
-
recruitment, randomization, intervention, and outcomes
How about controlling?
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
could not lead to interesting research, but that’s not the case
Consider rewording: "...could lead to uninteresting research, but that's not the case."
-
nowcasting
Italicize, perhaps?
-
data
Do you mean observational big data or observational data in general?
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
Which of these approaches would work better? We don’t know, and in the process of finding out we might learn something important about families, neighborhoods, education, and social inequality. Further, these predictions might be used to guide future data collection.
This is a really, really great idea!
-
analogues
Elsewhere you use the spelling "analogs."
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
there are actually many situations where social researchers want to code, classify, or label images or texts.
E.g., using Google maps and humans to code for "broken windows" in various neighborhoods.
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
comparison
comparisons?
-
Hawthorn
Hawthorne
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
you should try to design a series of experiments that reinforce each other.
It'd be incredibly helpful if you could briefly discuss an example of experiments reinforcing each other.
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
Figure 4.17:
This figure would be clearer to me if the pictures were below the text "Info" and "Info + social."
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
I can’t find any other examples of success,
If you have examples of failure, that may be informative as well.
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
cumulative advantage.
Or what Merton called the "Matthew effect."
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
incredibly important.
You can specify here why mechanisms are incredibly important. My take is that often experiments have a "black box" approach and that we don't actually understand a causal effect until we understand the mechanisms.
Conversely, understanding the mechanisms helps strengthen the case for a causal effect. The findings in psychology on supposed "psi" effects are weakened because there are no plausible mechanisms. Likewise, we knew smoking caused cancer back in the 1950s because we had a pretty good idea of the mechanism (e.g., tar) from qualitative data and simple observational studies.
Edit: As well, mechanisms could be used to identify causal effects (e.g., Pearl's "front-door" criterion).
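For reference, the standard front-door adjustment formula is sketched here. It assumes a mechanism M that intercepts every directed path from X to Y, no unblocked back-door path from X to M, and every back-door path from M to Y blocked by X. The labels X, M, and Y are mine, not the book's:

```latex
P(y \mid do(x)) = \sum_{m} P(m \mid x) \sum_{x'} P(y \mid m, x') \, P(x')
```

The point is that the effect is identified through the mechanism even when X and Y share an unmeasured confounder.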
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
ensures
Perhaps too strong?
-
question
questions
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
experiments
experiments'
-
Figure 4.1
Should this figure have some data points or text in the middle of the plot?
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
In many situations, you just cannot measure and adjust for all the possible confounders.
And you may condition on a pre-treatment collider variable that induces a back-door path (!).
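A small simulation may make the collider point concrete. This is only a sketch under assumed structure: two unobserved causes `u1` and `u2`, a pre-treatment variable `c` caused by both, a treatment `x` driven by `u1`, and an outcome `y` driven by `u2`, with no effect of `x` on `y` at all. All variable names and coefficients are hypothetical.

```python
import random

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(42)
rows = []
for _ in range(20000):
    u1 = rng.gauss(0, 1)               # unobserved cause of treatment
    u2 = rng.gauss(0, 1)               # unobserved cause of outcome
    c = u1 + u2 + rng.gauss(0, 0.5)    # pre-treatment collider
    x = u1 + rng.gauss(0, 0.5)         # treatment; has NO effect on y
    y = u2 + rng.gauss(0, 0.5)         # outcome
    rows.append((c, x, y))

# Without conditioning, x and y are (correctly) unrelated.
overall = corr([r[1] for r in rows], [r[2] for r in rows])

# Conditioning on the collider (here: keeping only high-c units)
# opens the path x <- u1 -> c <- u2 -> y and creates an association.
stratum = [r for r in rows if r[0] > 1.0]
within = corr([r[1] for r in stratum], [r[2] for r in stratum])
```

Here `overall` stays near zero, while `within` is clearly negative even though `x` never affects `y`. Regression adjustment for `c` would induce the same kind of spurious association.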
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
There is deep skepticism of certain types of stated preferences data in economics (Hausman 2012).
Also among some social psychologists (e.g., Banaji's work), although still probably not as skeptical as economists.
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
Big data sources and surveys are complements not substitutes so as the amount of big data increases, I expect that the value of surveys will increases as well.
I think you need to spell this out more clearly here. Will the value of surveys increase because of the decline of traditional landline surveys, so any plausibly reliable survey data will be more valuable? Or will the value increase because of the growth of big data, which can be combined with survey techniques?
-
increases
increase
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
persons
person
-
,
Remove comma
-
with 10-fold cross-validation
Consider including a sentence discussing cross-validation for social scientists. I suspect many are not familiar with cross-validation.
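Something like the following could anchor such a sentence. It is a minimal sketch of k-fold cross-validation in plain Python; the toy outcome data, the mean-only "model", and the helper names `k_fold_indices` and `cross_validate` are my own assumptions for illustration, not anything from the book.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(y, k=10, seed=0):
    """Held-out mean squared error of a mean-only predictor, one value per fold."""
    errors = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        prediction = sum(train) / len(train)   # "fit" on the other k-1 folds
        mse = sum((y[i] - prediction) ** 2 for i in held_out) / len(held_out)
        errors.append(mse)
    return errors

# Hypothetical outcome values for 50 respondents.
rng = random.Random(1)
y = [rng.gauss(10, 2) for _ in range(50)]

fold_errors = cross_validate(y, k=10)
cv_estimate = sum(fold_errors) / len(fold_errors)  # average out-of-sample error
```

The key idea for social scientists is that every observation is used for evaluation exactly once, and never by a model that saw it during fitting.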
-
third-party
third party
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
There is just too much to be gained by linking survey data to other data sources, such as the digital trace data discussed in Chapter 2.
Perhaps mention "data fusion"?
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
forecasting.
Perhaps explain how forecasting is different from prediction? (I view prediction as a more general category than forecasting.)
-
seem
seems
-
simple
simpler?
-
searchers
searches
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
More generally, with some creativity and design work, it is possible to improve the user experience for survey participants.
My informal experience is that people find open-ended survey questions (i.e., text responses) more enjoyable to answer than closed-ended survey questions. I have not seen any research on this, however.
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
This domination is not because closed questions have been proven to provide better measurement, rather it is because they are much easier to use; the process of coding open-ended questions is complicated and expensive.
I agree, although there's some psychological work suggesting that closed-ended questions are more predictive of human behavior (i.e., quickly answering closed-ended questions is akin to a quasi-implicit bias that affects behavior).
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
asking
asking questions
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
how we ask
how we ask questions
-
analogue
Elsewhere you use the spelling "analog" rather than "analogue."
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
Of course, it would be better to do perfectly executed probability sampling, but that no longer appears to be a realistic option.
Was a "perfectly executed probability sampling" ever a realistic option?
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
Figure 3.4:
The y-axis needs a label.
-
,
Remove comma
-
Remove space
-
weighing
weighting?
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
that you are less likely to learn about.
I find this phrasing somewhat confusing.
-
given
giving
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
the main text will be explained below
this chapter will be explained
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
Remove space
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
,
Remove comma?
-
scienitsts
scientists
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
1.1 An ink blot
I like the Blumenstock et al. example, but I think the introduction would better show the immense change going on if it included a parallel example from the analog age. E.g., compare Blau and Duncan's work on the American Occupational Structure, which required specifying hypotheses weeks in advance and entailed slow computation with punch cards.
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
Internal states exist only inside people’s heads, and sometimes the best way to learn about internal states is to ask.
Cf. Implicit Association Test
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
appears
appear
-
Many researchers
Who?
-
area probability sampling
Perhaps mention what prompted the widespread use of probability-based sampling (to parallel the next paragraph, which explains why RDD was used)?
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
Some people
Who?
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
The growth of always-on, big data systems increases our ability to effectively use two existing methods: natural experiments and matching.
There's a third approach, too. Causal discovery algorithms (i.e., computational improvements) and large amounts of diverse observational data (i.e., always-on big data systems) are enabling researchers to create and evaluate complex DAGs from observational data.
Edit: There aren't many examples in the social sciences using these algorithms, but I think they have a lot of potential if used judiciously.
-
together(Einav et al. 2015, Table 11).
Add a space.
-
,
Remove this comma, perhaps?
-
within
from?
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
estimating causal effects with natural experiments and matching.
See my previous point about causal discovery algorithms and large volumes of data.
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
As Table 2.3 makes clear, natural experiments are everywhere if you just know how to look for them.
Or a critic might say "what people think are natural experiments are everywhere." Might be worthwhile to mention the criticisms of natural experiments (e.g., Rosenzweig and Wolpin 2000).
Also, I think you can be more forceful in what I think you're claiming -- that we have more opportunities to find plausibly natural experiments in the digital age.
-
mechanism
Replace with "the mechanism" or "mechanisms".
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
First, in a step typically called pre-processing, the researchers converted the social media posts into a document-term matrix (see Grimmer and Stewart (2013) for more information). Second, the researchers hand-coded the sentiment of a small sample of posts. Third, the researchers trained a supervised learning model to classify the sentiment of posts. Fourth, the researchers used the supervised learning model to estimate the sentiment of all the posts.
I think the figure would be clearer if you numbered the steps in the figure.
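The four steps could also be illustrated with a toy pipeline. This is a sketch only: the passage does not say which supervised model the researchers used, so multinomial naive Bayes with add-one smoothing is my stand-in, and the posts, labels, and function names are all hypothetical.

```python
import math
from collections import Counter

posts = [
    "great day love the park",
    "terrible traffic hate this",
    "love this great food",
    "hate the terrible weather",
    "great park great food",
    "terrible day hate traffic",
]

# Step 1: pre-processing into a document-term matrix
# (rows = posts, columns = vocabulary terms, cells = counts).
vocab = sorted({word for post in posts for word in post.split()})

def to_row(post):
    counts = Counter(post.split())
    return [counts[term] for term in vocab]

dtm = [to_row(post) for post in posts]

# Step 2: hand-code a small sample (post index -> sentiment label).
hand_coded = {0: "pos", 1: "neg", 2: "pos", 3: "neg"}

# Step 3: train a supervised model on the hand-coded sample.
def train(dtm, labels):
    model = {}
    for cls in sorted(set(labels.values())):
        rows = [dtm[i] for i, lab in labels.items() if lab == cls]
        term_totals = [sum(col) + 1 for col in zip(*rows)]  # add-one smoothing
        total = sum(term_totals)
        model[cls] = {
            "log_prior": math.log(len(rows) / len(labels)),
            "log_lik": [math.log(t / total) for t in term_totals],
        }
    return model

def predict(model, row):
    scores = {
        cls: m["log_prior"] + sum(n * ll for n, ll in zip(row, m["log_lik"]))
        for cls, m in model.items()
    }
    return max(scores, key=scores.get)

model = train(dtm, hand_coded)

# Step 4: use the trained model to estimate the sentiment of all posts.
predicted = [predict(model, row) for row in dtm]
```

The last two posts were never hand-coded, so their labels come entirely from the model, which is the part of the workflow that scales to millions of posts.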
-
post
posts
-
in
is
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
not incomplete
complete
-
not non-representative
representative
-
100perday—andworkuntilthattargetismet,thendriverswouldendupworkingfewerhoursondaysthattheyareearningmore.Forexample,ifyouwereatargetearner,youmightendupworking4hoursonagoodday(
Check this text formatting. It's off on my computer.
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
Generally, people have a pretty good sense of what is important.
This statement does not seem obvious to me. People's values (i.e., sense of what is important) can differ greatly.
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
by Jon Kleinberg in a talk
Perhaps provide some context on Jon Kleinberg. E.g., "by the computer scientist Jon Kleinberg in a talk on X at Y."
-
run
running?
-
proceed
process
-
practical significance rather than statistical significance
Consider adding a sentence defining the difference between these two kinds of significance.
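A short numerical illustration might help here. This is a sketch assuming a two-group comparison with equal group sizes and a known common standard deviation; the function names and the numbers are hypothetical.

```python
import math

def two_sample_z(mean_diff, sd, n_per_group):
    """z statistic and two-sided p-value for a difference in means."""
    se = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p

def cohens_d(mean_diff, sd):
    """Standardized effect size: the difference in units of standard deviations."""
    return mean_diff / sd

# The same tiny difference (1% of a standard deviation) at two sample sizes.
z_big, p_big = two_sample_z(0.01, 1.0, 1_000_000)
z_small, p_small = two_sample_z(0.01, 1.0, 100)
d = cohens_d(0.01, 1.0)
```

With a million observations per group the difference is highly statistically significant, and with a hundred it is not, yet the effect size `d` is identical and tiny in both cases. Statistical significance depends on sample size, while practical significance depends on whether the magnitude matters.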
-
,
Remove this comma.
-
There is no single consensus definition of “big data”, but many definitions seem to focus on the 3 Vs: volume, variety, and velocity (e.g., Japec et al. (2015)). Rather than focusing on the characteristics of the data, my definition focuses more on why the data was created.
This definition is so common I'm thinking you should place this earlier when you discuss your definition of "big data."
-
difference
differences
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
abstract
abstruse?
(You mention abstractions in a positive light elsewhere in the book.)
-
but I will call them data scientists
Where would you place statisticians? Are they an audience for your book?
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
either going to sacrifice quality by using ugly Readymades, or they are going to spend lots of time looking for the perfect urinal.
Consider rephrasing. I understand what you mean here, but a perfect urinal is indeed an ugly readymade.
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
This study combines what we have done with in the past with what we can do in the present.
Consider including another example or two to further support your point. (Although it won't strictly be "back to the beginning" if you include other examples.)
-
The future of social research will be a combination of social science and data science.
I understand why it's a good idea to combine social with data science, but this statement makes it seem like it's a near-inevitability. I'm thinking you could add more material in this section on barriers to combining social with data science, and how we can overcome them.
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
transition
Consider replacing with "one". E.g., "...in the process of making a transition like the one from photography to cinematography."
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
also flip
Remove "also".
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
researchers
Consider adding: "as well as companies."
-
And, eBay was also.
Consider incorporating into the previous sentence.
-
Research
Researchers?
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
the theoretical constructs in many existing theories.
Consider rephrasing.
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
search engine queries
Search engine personalization presents some ambiguity, however, to the idea that these queries are non-reactive.
-
researcher
researchers
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
always-on data systems enable researchers to study unexpected events and provide real-time information to policy makers.
An admirable but flawed attempt at this was Chile's Project Cybersyn in the early 1970s.
-
For example, social media data can be used to guide responses to natural disasters (Castillo 2016).
Perhaps you could be more specific here in the example? It will clarify your point more for readers, I think.
-
ex-post panel
Perhaps parenthetically define this phrase?
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
etc
The period is missing from "etc." In general, I tend to prefer "and so on" or "and so forth" instead of "etc."
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
often called digital traces
It seems that you've already defined digital traces several times earlier, so I would consider removing the phrase "are often called digital traces, and".
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
doing survey research
I think I understand what you mean here, but I'm not sure all readers would understand how surveys entail an interaction with people.
-
-
www.bitbybitbook.com www.bitbybitbook.com
-
enough
Might be more specific here. E.g., "enough to fully map the wealth distribution in Rwanda."
-
transition
You use the word "transition" a lot in the first few sentences. I'd consider replacing this word with "change" or "switch" (or something similar).
-
the principles of social research in the past will inform the social research of the future.
Do you mean that the principles of analog age social research will inform those for digital age social research?
-
to run innovative surveys and to create mass collaboration
Another innovation is that physical distance becomes less important. Arthur C. Clarke predicted back in the 1970s that these new forms of communication would render physical travel obsolete. He said that people in the future (i.e., today) would "communicate, not commute."
-
These trends—increasing digital information and increasing computing—show no sign of slowing down.
I generally agree with this view, although there has been some discussion regarding a slowdown in Moore's law. E.g., https://www.technologyreview.com/s/601102/intel-puts-the-brakes-on-moores-law/
-