an
a
an
a
adopt
to adopt
want
wants
necessarily
?
The ball is now in your court: think how to use skills developed through the book to formulate good research questions, form collaborative teams, and shape the future of scholarship.
nice
several concern has
awkward
whereas the latter may be even hampered by such approaches
yes
knowledge gaps and interest gaps
interesting
.
The challenges you describe here are real and important, and I think the story is helpful. However, I think these problems could occur in one paper, not just a series of papers.
and articulations against established scholarship and ability to transform those ideas into computational code
true, but do you need it
both sociological imagination and technology imaginations are required to identify interesting research questions
yes
in
?
Therefore, we see hybrid approaches were humans and computational analysis are deeply engaged in the sense-making process as one emergent approach for doing computational social science work.
Agreed. But I think that some of the people doing "solo computational social science" probably think they are doing hybrid. I think the lines between these could be more clear.
There is a range of opportunities is
awkward
A valid question is that how to choose the concepts focused on.
unclear
those
unclear
'
?
studies analysis
missing a comma?
it often academics build on previous traditions and scholarship.
unclear
These highlights invite us to consider new ways of doing research, relaxing the traditional strong division between theory-first (deductive) and theory-last (inductive) thinking.
agreed. I like the contrast between theory-first and theory-last
(With any and all meanings of theory - disciplines within social sciences and computational social science may have different ideas about what constitutes theory.)
this seems too important to put in parentheses
loan
borrow
.
could you list them here?
Underlying are deeper differences between what is the role of theory in research.
unclear
Computational methods are still seeking to understand how it should be done to ensure clarity of concepts in its work
unclear
discuss about the relevant conceptual thinking and therefore, lacks a theoretic contribution
could you make this phrase shorter
theoretic
is this needed? Think of the Passi and Barocas example.
solving
asking?
Put more bluntly: if someone plans to shoot oneself in the foot, computational methods allow using bazooka instead a smaller gun.
nice
add references
typo
This was a long introduction to say that this book tries not to advocate for a particular way of doing research, but embrace openess on these questions.
I would say you are advocating for an approach: an open approach.
[width=.7]figures/research_process
broken figure
In computational social sciences, the operationalisation process involves transforming a theorethical perspective into an algorithm.
I'm not sure this is always true
What are the three different aspects of how computational social science challenges existing social science research practices?
unclear
Technological imagination refers to the ability to use computational tools to capture the interesting questions.
This is important. I like the way you make this parallel to sociological imagination.
.
Might be helpful for discussion about trading zone: https://link.springer.com/article/10.1007/s12108-015-9291-8
Based on my own experiences, these challenges can even become emotionally charged. Differences in core assumptions may be difficult to discuss even among academics. However, for computational social science to succeed, everyone must be able to cooperate across these types of collaborations. We will return to the opportunities and problems of multidisciplinary collaboration in Chapter 11.
I'm glad that you address this explicitly
we
I
(
remove
Table 1.1: Different approaches of computational social science
I like this table
However, the gist of the work is to ensure that the research and findings speak with existing social science literature.
the theory-driven approach feels a bit different than the others to me because it includes work from some of the other categories, whereas some of the other categories are mutually exclusive.
rigorous
flexible?
component
do you mean input or output?
Within this book, social science theory refers to the collective knowledge developed and conceptualised in social scientists over decades.
does "theory" also then include empirical results?
None of them is wrong. Instead, they highlight different pieces of the puzzle.
nice
decreases a negative social phenomenon or increases a positive social phenomenon.
I often hear this community described as data science for social good.
Based on the symposium series, most people have their main discipline related to computers of information technology.
unclear
Rather, it seems to adapt methods familiar to physicists and complexity scientists to social science research problems.
Based on this I think you are talking about more than just agent-based modeling.
paradigm of conducting social science research.
is this paradigm really agent-based models or do you mean to include more?
check for a good quote?
?
After the tournament, the winning strategy was tit-for-tat, a simple approach where one defected only in cases where the opposite side had already defected in the previous turn.
To me this is a really surprising result, but you don't really tell the story and emphasize the cool ending.
tool
model?
complex
simple or complex?
What — if any — differences are there between advanced quantitative research methods and computational research methods?
Agreed. Efron might address that point in this book: https://web.stanford.edu/~hastie/CASI/
the core idea is not the novel data but advanced and more rigorous methods that allow closer investigation than before.
interesting distinction
In this book, I will advocate that this is not the case.
nice
predicting variables from the data.
unclear
Thus, their work shows that political polarisation occurs in political blogs. There were clear divisions based on parties.
could these sentences be combined?
problems
This paper might define and clarify solution-oriented social science, which I think is related to what you are describing: https://www.nature.com/articles/s41562-016-0015
:
I think this is a good list
refs
typo
Therefore, it is hard to justify that computational social science would be a holistic discipline but rather a multidisciplinary mesh of scholars doing research using computation with social science questions in mind.
what do you think?
including me
I like that you explicitly include yourself here. In the previous paragraph I was wondering what you think about the "end of theory" debate.
makes all social scientists
unclear to me
Most intriguing but also most controversial thoughts have asked
I found this confusing
or
should this be "and"?
.
There are lots of examples in this paragraph. It may be hard for some readers to keep track of each one and how they are inter-related.
a larger number than in the original hallmark study.
how did the larger number of stories help them learn something new?
digital society
This is not parallel with the others. Do you mean problem-driven?
drive
driven?
.
Further, I am grateful to the following people for telling me about errors and typos in the hardback edition: Nimrod Priell, David Marker, Giannis Kanellopoulos, Hiroki Takikawa, Jun Tsunematsu, Takuto Sakamoto, Shinya Obayashi, Anna Ballarino, Arthur Spiriling, and the hypothesis user named arnaud.
contacted to the researchers
contacted the researchers
approached
approaches
At
As
Tastes, Ties, or Time
Tastes, Ties, and Time
Figure 4.3: Schematic of the experimental design from Schultz et al. (2007). The field experiment involved visiting about 300 households in San Marcos, California five times over an eight-week period. On each visit, the researchers manually took a reading from the house’s power meter. On two of the visits, they placed doorhangers on each house providing some information about the household’s energy usage. The research question was how the content of these messages would impact energy use.
In the figure, "3 week" should be "3 weeks"
in order do
in order to do
Figure 3.7: Demographics of respondents in W. Wang et al. (2015). Because respondents were recruited from XBox, they were more likely to be young and more likely to be male, relative to voters in the 2012 election. Adapted from W. Wang et al. (2015), figure 1.
In the x-tick marks in the panel titled "State", Obama should be capitalized and Romney should be capitalized.
statistical statistical
statistical
Wikipedia.
If you are having trouble finding the data, here's a place to look: https://wikipediaviews.org/multiplemonths.php
???
Broken citation. Correct citation is https://doi.org/10.1177/0894439315573926
¿sec:algorthmically-confounded?
broken link
[@king_how_2016
Broken citation. The correct citation is https://doi.org/10.1017/S0003055417000144
Figure 2.2:
There is a typo in the figure. The dates for "During Gezi" should be May 28, 2013 - August 1, 2013.
custom-made
custommade
American Association of Cancer Researchers
American Association for Cancer Research
impartial
partial
dominate
dominant
Academics of Science
Academies of Sciences
recoding
recording
Athey (2017), Cederman and Weidmann (2017), Hofman, Sharma, and Watts (2017), (???), and Yarkoni and Westfall (2017)
The references to these articles did not appear in the hardback print edition. They are all from the same issue of Science: http://science.sciencemag.org/content/355/6324
contests
calls
effort building
effort in building
about about
about
a project to classify a million galaxy classification
a project to classify a million galaxies
Write an email summarizing what you think is happening and recommend a course of action.
Move this sentence into the paragraph
it is
it
Design advice
Advice
difference-of-differences
difference-in-differences
difference-of-means
difference-in-means
difference-of-means
difference-in-means
difference-of-differences
difference-in-differences
research
researcher
excludibility
excludability
these these
these
have have
have
pre-treatment
pre-treatment information
Olmstead
Olmsted
other other
other
are who
who are
use
used
the response propensity
the same response propensity
XBox
Xbox
XBox
Xbox
XBox
Xbox
XBox
Xbox
XBox
Xbox
particular
particularly
treatment assignment
encouragement
to treatment
of the encouragement
donates
denotes
inaccessible,
This "inaccessible" should be removed. It is mentioned earlier in the sentence.
Yi(1)
Y_i(1) -> Y_i(0)
1
1 -> 0
have have
have have -> have
need
need -> needed
lab
lab -> field
employment
employment -> unemployment
are who
are who -> who are
embedded
embedded -> enriched
[Y(1,Wi(1))−Y(0,Wi(0))]
Y -> Y_i
think
think -> thinking
[, ] In a lovely paper, Lewis and Rao (2015) vividly illustrate a fundamental statistical limitation of even massive experiments. The paper—which originally had the provocative title “On the Near-impossibility of Measuring the Returns to Advertising”—shows how difficult it is to measure the return on investment of online ads, even with digital experiments involving millions of customers. More generally, the paper clearly shows that it is hard to estimate small treatment effect amidst noisy outcome data. Or stated diffently, the paper shows that estimated treatment effects will have large confidence intervals when the impact-to-standard-deviation (δ¯yσ
Here's an improved version of this activity:
https://gist.github.com/msalganik/064678b4eb7625e3ecb25e8a65eff38b
Michel et al. (2011) constructed a corpus emerging from Google’s effort to digitize books. Using the first version of the corpus, which was published in 2009 and contained over 5 million digitized books, the authors analyzed word usage frequency to investigate linguistic changes and cultural trends. Soon the Google Books Corpus became a popular data source for researchers, and a 2nd version of the database was released in 2012. However, Pechenick, Danforth, and Dodds (2015) warned that researchers need to fully characterize the sampling process of the corpus before using it for drawing broad conclusions. The main issue is that the corpus is library-like, containing one of each book. As a result, an individual, prolific author is able to noticeably insert new phrases into the Google Books lexicon. Moreover, scientific texts constitute an increasingly substantive portion of the corpus throughout the 1900s. In addition, by comparing two versions of the English Fiction datasets, Pechenick et al. found evidence that insufficient filtering was used in producing the first version. All of the data needed for activity is available here: http://storage.googleapis.com/books/ngrams/books/datasetsv2.html In Michel et al.’s original paper (2011), they used the 1st version of the English data set, plotted the frequency of usage of the years “1880”, “1912” and “1973”, and concluded that “we are forgetting our past faster with each passing year” (Fig. 3A, Michel et al.). Replicate the same plot using 1) 1st version of the corpus, English dataset (same as Fig. 3A, Michel et al.) Now replicate the same plot with the 1st version, English fiction dataset. Now replicate the same plot with the 2nd version of the corpus, English dataset. Finally, replicate the same plot with the 2nd version, English fiction dataset. Describe the differences and similarities between these four plots. Do you agree with Michel et al.’s original interpretation of the observed trend?
(Hint: c) and d) should be the same as Figure 16 in Pechenick et al.) Now that you have replicated this one finding using different Google Books corpora, choose another linguistic change or cultural phenomena presented in Michel et al.’s original paper. Do you agree with their interpretation in light of the limitations presented in Pechenick et al.? To make your argument stronger, try replicate the same graph using different versions of data set as above.
Here's an improved version of this activity: https://gist.github.com/msalganik/21a585ff38bee58db320ed3329d801b1
see Appendix Table 10
see appendix table 10 and footnote 139.
evaluting
evaluating
evaluting
evaluating
δ¯yσ
this notation does not match Eq 7 in Lewis and Rao. I think there is a typo.
impact-to-standard-deviation (δ¯yσ
the impact-to-standard-deviation ratio is also called Cohen's D: https://en.wikipedia.org/wiki/Effect_size#Cohen.27s_d
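A rough sketch that may help readers see why a small impact-to-standard-deviation ratio leads to wide confidence intervals. It assumes a two-arm experiment with equal arm sizes, a common outcome standard deviation, and a normal approximation; the function names are illustrative and do not come from Lewis and Rao (2015) or the book.

```python
import math


def cohens_d(mean_treat, mean_control, sd_pooled):
    """Standardized effect: difference in means divided by the pooled standard deviation."""
    return (mean_treat - mean_control) / sd_pooled


def ci_half_width(sd, n_per_arm, z=1.96):
    """Approximate 95% CI half-width for a difference in means.

    Assumes equal arm sizes and a common standard deviation (an assumption of this sketch).
    """
    return z * sd * math.sqrt(2.0 / n_per_arm)


def n_per_arm_needed(d, z=1.96):
    """Smallest n per arm at which the CI half-width is no larger than the effect itself.

    Derived from z * sd * sqrt(2 / n) <= d * sd, which gives n >= 2 * (z / d) ** 2.
    """
    return math.ceil(2.0 * (z / d) ** 2)


if __name__ == "__main__":
    # Hypothetical ratios, chosen only to show how quickly the required sample grows.
    for d in (0.5, 0.1, 0.01):
        print(f"d = {d}: about {n_per_arm_needed(d)} participants per arm")
```

Because the required sample grows with 1/d², shrinking the ratio by a factor of ten raises the required sample size by roughly a factor of one hundred, which is one way to read the paper's point about small effects and noisy outcomes.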
diffently
differently
2nd version of the corpus, English dataset
On the google books website this is labeled: "Version 20120701"
1st version of the corpus, English dataset
On the google books website this is labeled: "Version 20090715"
“1880”, “1912” and “1973”
to match paper, this should be 1883, 1910, 1950
Social research is a process of asking and answering questions about human behavior.
this opening sentence is pretty boring
web-based experiment
testing
DRAFT Please do not distribute
This should be removed before the book is published.