311 Matching Annotations
1. Feb 2021
2. systematicreviewsjournal.biomedcentral.com systematicreviewsjournal.biomedcentral.com
1. To deal with this, we organised all of the factors into six overarching categories, comprising three barriers and three facilitators: 1. Difficulties in accessing evidence (six studies) 2. Challenges in understanding the evidence (three studies) 3. Insufficient resources (six studies) 4. Knowledge sharing and ease of access (six studies) 5. Professional advisors and networks (three studies) 6. A broader definition of what counts as credible evidence and better standardisation of reporting (three studies).

barriers and facilitators organised - seems to miss psychological factors?

#### URL

3. giving-evidence.com giving-evidence.com
1. They run conjoint analysis: in which customers are offered goods with various combinations of characteristics and price – maybe a pink car with a stereo for £1,000, a pink car without a stereo for £800, a blue car for £1,100 and a blue car without a stereo for £950 – to identify how much customers value each characteristic.

But these are usually (always) hypothetical choices, I believe.

2. Let me tell you a story. Once upon a time, researcher Dean Karlan was investigating microloans to poor people in South Africa, and what encourages people to take them. He sent people flyers with various designs and offering loans at various rates and sizes. It turns out that giving consumers only one choice of loan size, rather than four, increased their take-up of loans as much as if the lender had reduced the interest rate by about 20 percent. And if the flyer features a picture of a woman, people will pay more for their loan – demand was as high as if the lender had reduced the interest rate by about a third. Nobody would say in a survey or interview that they would pay more if a flyer has a lady on it. But they do. Similarly, Nobel Laureate Daniel Kahneman reports that, empirically, people are more likely to believe a statement if it is written in red than in green. But nobody would say that in a survey, not least because we don’t know it about ourselves.

on self-reported motivations

#### URL

4. towardsdatascience.com towardsdatascience.com
1. do(cluster_summary = summary(.))

`do()` is old dplyr syntax, since replaced by something more consistent but more verbose

2. gives us the best segmentation possible.

that's a bit strong

3. Just like K-means and hierarchical algorithms go hand-in-hand with Euclidean distance, the Partitioning Around Medoids (PAM) algorithm goes along with the Gower distance.

why can't I do hierarchical with Gower distance?

4. The silhouette width is one of the very popular choices when it comes to selecting the optimal number of clusters. It measures the similarity of each point to its cluster, and compares that to the similarity of the point with the closest neighboring cluster. This metric ranges between -1 to 1, where a higher value implies better similarity of the points to their clusters.

This is under-explained.

Silhouette width of each obs: Scaled measure of dissimilarity from (nearest) neighbor cluster relative to dissimilarity from own cluster.

5. library(cluster); gower_df <- daisy(german_credit_clean, metric = "gower", type = list(logratio = 2))

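The code needs an extra line before the `daisy()` call, to avoid an error (presumably because `daisy()` does not accept character columns):

  mutate_if(is.character, as.factor)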

6. We find that the variable amount needs a log transformation due to the positive skew in its distribution.

just by visual inspection?

the others DON'T all seem normally distributed to me

7. The details about the mathematics of Gower distance are quite complicated and left out for another article.

I want to know
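
For reference, a minimal sketch of the usual textbook definition of Gower distance (this is not taken from the article): every variable contributes a dissimilarity between 0 and 1, and the distance is the average of those contributions. Numeric variables contribute the absolute difference divided by the variable's range in the data; categorical variables contribute 0 if the two values match and 1 otherwise. The function name and the toy records are hypothetical, and the sketch ignores missing values, variable weights and the log-ratio option used in the `daisy()` call.

```python
def gower_distance(a, b, ranges):
    """Average per-variable dissimilarity between records a and b.

    ranges[k] is (max - min) of numeric variable k in the data,
    or None when variable k is categorical.
    """
    parts = []
    for x, y, rng in zip(a, b, ranges):
        if rng is None:
            # Categorical: simple mismatch indicator.
            parts.append(0.0 if x == y else 1.0)
        else:
            # Numeric: absolute difference scaled by the variable's range.
            parts.append(abs(x - y) / rng)
    return sum(parts) / len(parts)


# Hypothetical records: (age, amount, housing)
rows = [
    (20, 1000.0, "rent"),
    (40, 3000.0, "own"),
    (60, 2000.0, "own"),
]
ranges = [40, 2000.0, None]  # age range, amount range, categorical

d01 = gower_distance(rows[0], rows[1], ranges)
d12 = gower_distance(rows[1], rows[2], ranges)
print(d01, d12)
```

Because the result is just a matrix of pairwise dissimilarities, any method that accepts a dissimilarity matrix can in principle consume it.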

8. Clustering datasets having both numerical and categorical variables

discusses the vignette I used before more completely

#### URL

5. www.datanovia.com www.datanovia.com
1. For each observation $$i$$, calculate the average dissimilarity $$a_i$$ between $$i$$ and all other points of the cluster to which $$i$$ belongs. For all other clusters $$C$$, to which $$i$$ does not belong, calculate the average dissimilarity $$d(i, C)$$ of $$i$$ to all observations of $$C$$. The smallest of these $$d(i, C)$$ is defined as $$b_i = \min_C d(i, C)$$. The value of $$b_i$$ can be seen as the dissimilarity between $$i$$ and its “neighbor” cluster, i.e., the nearest one to which it does not belong. Finally the silhouette width of the observation $$i$$ is defined by the formula: $$S_i = (b_i - a_i)/\max(a_i, b_i)$$.

Silhouette width of each obs: Scaled measure of dissimilarity from (nearest) neighbor cluster relative to dissimilarity from own cluster.
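
The definition quoted above can be sketched directly in code. This is a minimal illustration, not the implementation used by any R package: the function name, the toy points and the convention of returning 0 for a singleton cluster are assumptions, and it expects at least two clusters.

```python
def silhouette_widths(dist, labels):
    """Silhouette width S_i for each observation.

    dist is a symmetric matrix (list of lists) of pairwise dissimilarities;
    labels[i] is the cluster of observation i.
    """
    n = len(labels)
    clusters = sorted(set(labels))
    widths = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)  # assumed convention for singleton clusters
            continue
        # a_i: average dissimilarity to the other members of i's own cluster.
        a_i = sum(dist[i][j] for j in own) / len(own)
        # b_i: smallest average dissimilarity to any other cluster.
        b_i = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters
            if c != labels[i]
        )
        widths.append((b_i - a_i) / max(a_i, b_i))
    return widths


# Four points on a line, forming two obvious groups.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]

good = silhouette_widths(dist, [0, 0, 1, 1])  # sensible clustering
bad = silhouette_widths(dist, [0, 1, 0, 1])   # deliberately poor clustering
print(good, bad)
```

With the sensible labels every width is close to 1; with the poor labels the widths turn negative, which is the sense in which a higher average silhouette indicates a better clustering.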

2. Average silhouette method

this is not really an explanation!

3. The total WSS measures the compactness of the clustering and we want it to be as small as possible.

as small as possible (within sample) for a given number of clusters

4. To avoid distortions caused by excessive outliers, it’s possible to use PAM algorithm, which is less sensitive to outliers.

another solution to outliers?

5. Next, the wss (within sum of square) is drawn according to the number of clusters. The location of a bend (knee) in the plot is generally considered as an indicator of the appropriate number of clusters.

Needs more explanation here. What is the value of this "within sum of square", and why does a 'bend' indicate the appropriate number of clusters?

6. K-means algorithm can be summarized as follows: (1) Specify the number of clusters (K) to be created (by the analyst). (2) Select randomly k objects from the dataset as the initial cluster centers or means. (3) Assign each observation to its closest centroid, based on the Euclidean distance between the object and the centroid. (4) For each of the k clusters, update the cluster centroid by calculating the new mean values of all the data points in the cluster. The centroid of the kth cluster is a vector of length p containing the means of all variables for the observations in the kth cluster; p is the number of variables. (5) Iteratively minimize the total within sum of square. That is, iterate steps 3 and 4 until the cluster assignments stop changing or the maximum number of iterations is reached. By default, the R software uses 10 as the default value for the maximum number of iterations.

the implicit claim is that this 'mean-finding' procedure will minimise the sum of squared distances
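
One way to probe that implicit claim is to record the total within-cluster sum of squares (WSS) after every assignment step and check that it never increases. The sketch below follows the steps quoted above, except that the starting centers are passed in rather than drawn at random, so the run is reproducible. The function name and the toy data are hypothetical. It shows that each iteration weakly lowers the WSS, which gives a local minimum, not necessarily the global one.

```python
def kmeans_wss_trace(points, start_centers, max_iter=10):
    """Run k-means from the given starting centers.

    Returns the final centers and the total WSS recorded after each
    assignment step.
    """
    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centers = list(start_centers)
    trace = []
    assignment = None
    for _ in range(max_iter):
        # Step 3: assign each point to its closest center.
        new_assignment = [
            min(range(len(centers)), key=lambda k: sqdist(p, centers[k]))
            for p in points
        ]
        trace.append(
            sum(sqdist(p, centers[k]) for p, k in zip(points, new_assignment))
        )
        if new_assignment == assignment:
            break  # assignments stopped changing
        assignment = new_assignment
        # Step 4: move each center to the mean of its cluster.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its old center
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, trace


points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
# Deliberately poor start: both centers sit in the same group.
centers, trace = kmeans_wss_trace(points, [(0, 0), (1, 0)])
print(trace)
```

The intuition: for fixed centers, reassigning each point to its closest center cannot raise the sum of squares, and for fixed assignments the mean is the point that minimises squared distances within a cluster, so neither step can make the WSS worse.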

7. to use correlation distance, the data are input as z-scores.

normalization to weigh each dimension the same

#### URL

6. en.wikipedia.org en.wikipedia.org
1. A successful evaluation of discriminant validity shows that a test of a concept is not highly correlated with other tests designed to measure theoretically different concepts.

But what if the traits you are trying to measure are actually correlated in the real world?

#### URL

7. en.wikipedia.org en.wikipedia.org
1. The remaining term, $$1/(1 - R_j^2)$$, is the VIF. It reflects all other factors that influence the uncertainty in the coefficient estimates. The VIF equals 1 when the vector $$X_j$$ is orthogonal to each column of the design matrix for the regression of $$X_j$$ on the other covariates. By contrast, the VIF is greater than 1 when the vector $$X_j$$ is not orthogonal to all columns of the design matrix for the regression of $$X_j$$ on the other covariates. Finally, note that the VIF is invariant to the scaling of the variables

VIF interpretation

2. It turns out that the square of this standard error, the estimated variance of the estimate of $$\beta_j$$, can be equivalently expressed as:[3][4] $$\widehat{\operatorname{var}}(\hat{\beta}_j) = \frac{s^2}{(n-1)\,\widehat{\operatorname{var}}(X_j)} \cdot \frac{1}{1 - R_j^2},$$ where $$R_j^2$$ is the multiple $$R^2$$ for the regression of $$X_j$$ on the other covariates (a regression that does not involve the response variable $$Y$$). This identity separates the influences of several distinct factors on the variance of the coefficient estimate: $$s^2$$: greater scatter in the data around the regression surface leads to proportionately more variance in the coefficient estimates; $$n$$: greater sample size results in proportionately less variance in the coefficient estimates; $$\widehat{\operatorname{var}}(X_j)$$: greater variability in a particular covariate leads to proportionately less variance in the corresponding coefficient estimate. The remaining term, $$1/(1 - R_j^2)$$, is the VIF. It reflects all other factors that influence the uncertainty in the coefficient estimates

a useful decomposition of the variance of the estimated coefficient
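
A small numeric sketch of the VIF term, under the simplifying assumption that there is only one other covariate. In that case $$R_j^2$$ is just the squared sample correlation between the two covariates. The function name and data are hypothetical.

```python
def vif_two_covariates(x1, x2):
    """VIF for x1 when x2 is the only other covariate.

    With a single other covariate, R_1^2 from regressing x1 on x2 equals
    the squared sample correlation between them.
    """
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r_squared = s12 ** 2 / (s11 * s22)
    return 1.0 / (1.0 - r_squared)


x1 = [-1.0, 1.0, -1.0, 1.0]
orthogonal = [-1.0, -1.0, 1.0, 1.0]   # uncorrelated with x1
correlated = [-1.0, 1.0, -1.0, 2.0]   # strongly correlated with x1

vif_orth = vif_two_covariates(x1, orthogonal)
vif_corr = vif_two_covariates(x1, correlated)
vif_rescaled = vif_two_covariates([100 * a for a in x1], correlated)
print(vif_orth, vif_corr, vif_rescaled)
```

The orthogonal case gives a VIF of exactly 1, the correlated case inflates the variance by a factor of 13.5, and rescaling `x1` leaves the VIF unchanged, matching the invariance noted in the quote.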

#### URL

8. danielmiessler.com danielmiessler.com
1. Summary: Algorithms to Live By

these annotations look like a great resource

#### URL

9. maxkasy.github.io maxkasy.github.io
1. When treatment assignment takes place in waves, it is natural to adapt Thompson sampling by assigning a non-random number $$p_t^d N_t$$ of observations in wave $$t$$ to treatment $$d$$, in order to reduce randomness. The remainder of observations are assigned randomly so that expected shares remain equal to $$p_t^d$$.

not sure what this means

#### URL

10. en.wikipedia.org en.wikipedia.org
1. $$Q = \frac{12n}{k(k+1)} \sum_{j=1}^{k} \left(\bar{r}_{\cdot j} - \frac{k+1}{2}\right)^{2}$$. Note

Q is something that will increase the more that certain wines tend to be ranked systematically lower or higher than average

2. is the rank of $$x_{ij}$$

Just rank the 'scores' of the wines within each rater

3. Find the values $$\bar{r}_{\cdot j} = \frac{1}{n} \sum_{i=1}^{n} r_{ij}$$

average rank of wine j across all raters
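
Putting the three pieces together (rank within rater, average rank per wine, then Q), a minimal sketch might look like this. It assumes no tied scores within a rater; the function name and the scores are made up for illustration.

```python
def friedman_q(scores):
    """Friedman's Q, where scores[i][j] is the score rater i gave item j.

    Assumes no ties within a rater.
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        # Rank the items within this rater: 1 = lowest score.
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    mean_ranks = [s / n for s in rank_sums]
    q = 12 * n / (k * (k + 1)) * sum((r - (k + 1) / 2) ** 2 for r in mean_ranks)
    return mean_ranks, q


# Three raters who order the three wines identically (on different scales).
agree_ranks, agree_q = friedman_q([[1, 2, 3], [10, 20, 30], [5, 6, 7]])
# Three raters whose orderings cancel out.
mixed_ranks, mixed_q = friedman_q([[1, 2, 3], [2, 3, 1], [3, 1, 2]])
print(agree_ranks, agree_q, mixed_ranks, mixed_q)
```

When the raters agree, the average ranks sit far from the overall mean rank $$(k+1)/2$$ and Q is large; when their orderings cancel out, every wine has the same average rank and Q is zero.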

#### URL

11. Jan 2021
12. egap.org egap.org
1. For some reason I'm having trouble commenting on particular parts of this page with hypothesis

#### URL

13. daaronr.github.io daaronr.github.io
1. Definitions

@Jasonschukraft wrote:

Not sure where to put this comment, but how are you thinking about uncertainty about effectiveness? There's a small pool of donors who deny that GiveWell has identified the most effective global poverty/health charities because (e.g.) GiveWell is too focused on "randomista" interventions and doesn't give enough weight to "systematic" interventions.

2. Individual donors, governments and firms demonstrate substantial generosity (e.g., UK charity represents 0.5-1% of GDP, US charity around 2% of GDP).

Things to emphasize, from Jason Schukraft conversation.

Do the ‘masses of donors’ matter, or only the multimillionaire response? The average person … do small donations add up? Also, knowing more about how average people respond to analytical information (in an other-regarding/social context) will inform how to influence good LT decision-making. How to get USDA to care about animals/WAW … government to care about LT

#### URL

14. daaronr.github.io daaronr.github.io
1. how people react to the presentation of charity-effectiveness information.

@JasonSchukraft wrote:

Maybe. I suppose it depends on our goals. Do we want people to give to top charities for the right reason (i.e., because those charities are effective) or do we just want people to give to top charities, simpliciter? If the latter, then maybe it doesn't matter how people react to effectiveness information; we should just go with whatever marketing strategy maximizes donations.

#### URL

15. Dec 2020
16. daaronr.github.io daaronr.github.io
1. Beem101: Project, discussion of research

I was asked about the 'structure' of the project. This depends on the option, on your topic choice, and on how you wish to pursue it. Nonetheless, a rough structure might look like the following:

Across the topics (more or less... it depends on the project option and topic)

1. Introduce the topic, model, question, overview of what you are going to do, and why this is relevant and interesting (some combination of this)
2. The Economic theory/theories and model(s) presented
• with reference to academic authors (papers, textbooks)
• using formal (maths) modeling, giving at least one simple but formal presentation, and explaining it clearly and in your own voice (remember to explain what all variables mean),
• considering the assumptions and simplifications of the model, the 'Economic tool/fields considered' (e.g., optimisation, equilibrium)
• Sensitivity of the 'predictions' to the assumptions
• The justification for these assumptions
• Relationship between this model and your (applied) topic or focus... are the assumptions relevant, what are the 'predictions' etc.
3. The application or real world example:
• Explain it in practical terms and what the 'issues and questions are' (possibly engaging the previous literature a bit, but not too much)
• describe and express it formally
• relate it to the model/theory and apply the model theory to your real world example
• Try to 'model it' and derive 'results' or predictions, continually justifying the application of the model to the example
4. Presenting and assessing the insights from the model for the application and vice versa
• considering the relevance and sensitivity
• what alternative models might be applied, how might it be adjusted
• Discuss 'what modeling and theory achieved or did not achieve here'

Note that "2" could come before or after "3" ... you can present the application first, or the model first... (or there might even be a way to go between the two, presenting one part of each)

#### URL

17. Oct 2020
18. globalprioritiesinstitute.org globalprioritiesinstitute.org
1. pure’ altruism or ‘warm glow’ altruism (Andreoni 1990; Ashraf and Bandiera 2017)

This classification is often misunderstood and misused. The Andreoni 'Warm Glow' paper was meant to consider a fairly simple general question about giving overall, not to unpick psychological motivations.

2. The Global Priorities Institute’s vision and mission

#### URL

19. en.wikipedia.org en.wikipedia.org
1. Formula: The Y-intercept of the SML is equal to the risk-free interest rate. The slope of the SML is equal to the market risk premium and reflects the risk return tradeoff at a given time: $$\mathrm{SML}: E(R_i) = R_f + \beta_i [E(R_M) - R_f]$$ where: $$E(R_i)$$ is an expected return on security; $$E(R_M)$$ is an expected return on market portfolio M; $$\beta$$ is a nondiversifiable or systematic risk; $$R_M$$ is a market rate of return; $$R_f$$ is a risk-free rate

The key equation ... specifying risk vs return

2. The Y-intercept of the SML is equal to the risk-free interest rate. The slope of the SML is equal to the market risk premium and reflects the risk return tradeoff at a given time: $$\mathrm{SML}: E(R_i) = R_f + \beta_i [E(R_M) - R_f]$$ where: $$E(R_i)$$ is an expected return on security; $$E(R_M)$$ is an expected return on market portfolio M; $$\beta$$ is a nondiversifiable or systematic risk; $$R_M$$ is a market rate of return; $$R_f$$ is a risk-free rate

This is one statement of the key relationship.

The point is that the market will have a single tradeoff between unavoidable (nondiversifiable) risk and return.

Assets' returns must reflect this, according to the theory. Their prices will be bid up (or down), until this is the case ... the 'arbitrage' process.

Why? Because (assuming borrowing/lending at a risk-free rate) any investor can achieve a particular return for a given risk level simply by buying the 'diversified market basket' and leveraging this (for more risk) or investing the remainder in the risk-free asset (for less risk). (And she can do no better than this.)
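
A small numeric sketch of that argument, with made-up rates: a portfolio holding weight $$w$$ in the market basket and $$1-w$$ in the risk-free asset has a beta of $$w$$, and its expected return lands exactly on the SML. The function names and the 2% and 8% figures are hypothetical.

```python
def sml_expected_return(beta, risk_free, market_return):
    """Expected return implied by the security market line."""
    return risk_free + beta * (market_return - risk_free)


def market_mix_return(weight_in_market, risk_free, market_return):
    """Expected return from putting weight w in the market basket and
    1 - w in the risk-free asset (w > 1 means borrowing at the risk-free rate).
    """
    return weight_in_market * market_return + (1 - weight_in_market) * risk_free


risk_free, market_return = 0.02, 0.08  # hypothetical rates
weights = [0.0, 0.5, 1.0, 1.5]
mix = [market_mix_return(w, risk_free, market_return) for w in weights]
sml = [sml_expected_return(w, risk_free, market_return) for w in weights]
print(mix, sml)
```

Since anyone can reach any point on this line just by mixing the market basket with borrowing or lending, an asset offering less than the SML return for its beta would (on the theory) be sold until its price falls far enough to restore the relationship.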

3. This abnormal extra return above the market's return at a given level of risk is what is called the alpha.

this is why you hear the stock-touts bragging about their 'alpha'

#### URL

20. en.wikipedia.org en.wikipedia.org
1. Capital asset pricing model

2. quantity beta (β)

You hear about this 'beta' all the time as the measure of 'the correlation of the risk of an asset with the representative market basket'...

but confusingly, $$\beta$$ is used to represent the slope of the expected return of an asset as this risk increases.

3. systematic risk (beta) t

The concept of "systematic risk" is crucial in order to understand the CAPM. This relates to the risk of an 'optimally diversified portfolio'

#### URL

21. en.wikipedia.org en.wikipedia.org
1. If the fraction $$q$$ of a one-unit (e.g. one-million-dollar) portfolio is placed in asset X and the fraction $$1-q$$ is placed in Y, the stochastic portfolio return is $$qx+(1-q)y$$. If $$x$$ and $$y$$ are uncorrelated, the variance of portfolio return is $$\text{var}(qx+(1-q)y)=q^{2}\sigma_{x}^{2}+(1-q)^{2}\sigma_{y}^{2}$$. The variance-minimizing value of $$q$$ is $$q=\sigma_{y}^{2}/[\sigma_{x}^{2}+\sigma_{y}^{2}]$$, which is strictly between $$0$$ and $$1$$. Using this value of $$q$$ in the expression for the variance of portfolio return gives the latter as $$\sigma_{x}^{2}\sigma_{y}^{2}/[\sigma_{x}^{2}+\sigma_{y}^{2}]$$, which is less than what it would be at either of the undiversified values $$q=1$$ and $$q=0$$ (which respectively give portfolio return variance of $$\sigma_{x}^{2}$$ and $$\sigma_{y}^{2}$$). Note that the favorable effect of diversification on portfolio variance would be enhanced if $$x$$ and $$y$$ were negatively correlated but diminished (though not eliminated) if they were positively correlated.

Key building block formulae.

• Start with 'what happens to the variance when we combine two assets (uncorrelated with same expected return)'

• What are the variance minimizing shares and what is the resulting variance of the portfolio.
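
A quick numeric check of those two formulae, using made-up variances of 4 and 1 for the two uncorrelated assets (the function names are hypothetical):

```python
def portfolio_variance(q, var_x, var_y):
    """Variance of q*x + (1 - q)*y for uncorrelated x and y."""
    return q ** 2 * var_x + (1 - q) ** 2 * var_y


def min_variance_share(var_x, var_y):
    """Share q in asset X that minimises the portfolio variance."""
    return var_y / (var_x + var_y)


var_x, var_y = 4.0, 1.0  # hypothetical variances
q_star = min_variance_share(var_x, var_y)
best = portfolio_variance(q_star, var_x, var_y)

# Brute-force check over a grid of shares between 0 and 1.
grid_best = min(portfolio_variance(i / 100, var_x, var_y) for i in range(101))
print(q_star, best, grid_best)
```

The minimising share puts more weight on the less volatile asset (20% in X, 80% in Y), and the resulting variance of 0.8 is below the variance of either asset held alone.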

2. Similarly, a 1985 book reported that most value from diversification comes from the first 15 or 20 different stocks in a portfolio.[6]

the conventional wisdom is that there are sharply diminishing returns to this diversification

#### URL

22. bookdown.org bookdown.org
1. d(p)=(209000-130p)

a simple demand function ('price-response function')

2. CLV Formula

#### URL

23. daaronr.github.io daaronr.github.io
1. “Sue’s mother” $$R_a$$ “Sue’s lecturer in the UK” $$\rightarrow$$ false (so it’s not ‘transitive’)

I think this is where Andrea meant to ask her question:

I wanted to ask how is this a false statement? I want to clarify. Is it that she is a mother and this does not relate to her being a lecturer in the UK? From my understanding the theory of transitivity means that there is consistency, hence from the first statement to the last it would make sense…
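
To make the property concrete: a relation $$R$$ is transitive if, whenever $$a R b$$ and $$b R c$$ both hold, $$a R c$$ holds too. A single chain where the first two hold but the third fails is enough to show that a relation is not transitive. The sketch below checks this for a relation stored as a set of pairs; the names `a`, `b`, `c` are placeholders and do not stand for the specific relation $$R_a$$ in the notes.

```python
def is_transitive(relation):
    """True if (a, b) and (b, c) in the relation always imply (a, c)."""
    return all(
        (a, d) in relation
        for (a, b) in relation
        for (c, d) in relation
        if b == c
    )


# a relates to b, and b relates to c, but a does not relate to c.
chain_only = {("a", "b"), ("b", "c")}
# The same chain with the missing link added.
closed = chain_only | {("a", "c")}

print(is_transitive(chain_only), is_transitive(closed))
```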

2. intend

I have a video. Need to add it!

#### URL

24. daaronr.github.io daaronr.github.io
1. (Highly optional): Properties of binary relations - O-R problem 1a.

I went over this in the 16 October Q&A. Available to Exeter students HERE: https://web.microsoftstream.com/video/c2e218a8-0632-4d86-8ad2-d0ab7b70ebfb

#### URL

25. daaronr.github.io daaronr.github.io
1. Students

A household chooses how to invest ... to lay aside money for future consumption... which asset to buy To store this value and hopefully get “high payoffs” with little risk

#### URL

26. Local file Local file
1. We say that $$u$$ is a utility function for $$\succsim$$.

Does "u is a utility function for $$\succsim$$" mean that the utility function 'represents' $$\succsim$$?

#### Annotators

27. daaronr.github.io daaronr.github.io
1. Differentiating this wrt $$I$$ yields Engel aggregation:

TODO: make video of this

#### URL

28. Sep 2020
29. github.com github.com
1. direct

what is meant by 'direct?'

#### URL

30. rtcharity.org rtcharity.org
1. Past Projects

These are not all 'past'; the survey continues

#### URL

31. Aug 2020
32. forum.effectivealtruism.org forum.effectivealtruism.org
1. That's because cause prioritization research is extremely difficult, not because no one has thought to do this.

Yeah, I thought the same

2. 4. It is difficult to find cause neutral funding. I think funders like to choose their cause and stick with it so there is a lack of cause neutral funding.

A good point!

3. Growth and the case against randomista development,

I would say this one raised a lot of questions but didn't provide definitive answers

4. me that when reading the GPI research agenda, the economics parts read like it was written by philosophers.

I would agree with this

5. (Also, I have never worked in academia so there may be theories of change in the academic space that others could identify.)

There are some explicit 'Impact targets' in the REF, and pots of ESRC funding for 'impact activities'.

In general I don't think we believe that our 'publications' will themselves drive change. It's more like publications $$\rightarrow$$ status $$\rightarrow$$ influence policymakers

6. But for a new organisation to solely focus on doing the research that they believed would be most useful for improving the world it is unclear what the theory of change would be.

I'm not quite sure how this is differentiated from 'for a big funder'

7. I think that people are hesitant to do something new if they think it is being done, and funders want to know why the new thing is different so the abundance of organisations that used to do cause prioritisation research or do research that is a subcategory of cause prioritisation research limits other organisations from starting up.

Very good point. I think this happens in a lot of spheres.

8. Theoretical cause selection beyond speculation. Evidence of how to reason well despite uncertainty and more comparisons of different causes.

I also think this may have run into some fundamental obstacles.

9. more consideration of second order effect

super hard to measure

10. Let me give just one example, if you look at best practice in risk assessment methodologies[5] it looks very different from the naive expected value calculations used in EA

I agree somewhat, but I'm not sure if the 'risk-assessment methodologies' are easily communicated, nor if they apply to the EA concerns here.

11. theorists

here you are equating 'theorists' with long-termists

12. e. From my point of view, I could save a human life for ~£3000. I don’t want to let kids die needlessly if I can stop it. I personally think that the future is really important but before I drop the ball on all the things I know will have an impact it would be nice to have:

Reasonable statement of 'risk-aversion over the impact that I have'

13. (There could be experimental hits based giving.)

what does this mean?

14. Now let’s get a bit more complicated and do some more research and find other interventions and consider long run effects and so on”. There could be research looking for strong empirical evidence into: the second order or long run effects of existing interventions; how to drive economic growth, policy change, structural changes, and so forth.

These are just extremely difficult to do/learn about. Economists, political scientists, and policy analysts have been debating these for centuries. I'm not sure there are huge easy wins here.

15. Looking around it feels like there is a split down the middle of the EA community:[4] On the one hand you have the empiricals: those who believe that doing good is difficult, common sense leads you astray and to create change we need hard data, ideally at least a few RCTs. On the other side are the theorists: those who believe you just need to think really hard and to choose a cause we need expected value calculations and it matters not if calculations are highly uncertain if the numbers tend to infinity. Personally I find myself somewhat drawn to the uncharted middle ground.

I agree that much of the most valuable work doesn't fall into either camp

16. Post community building I moved back into policy and most recently have found myself in the policy space, building support for future generations in the UK Parliament. Not research. Not waiting. But creating change.

This sounds a little self-aggrandizing. I don't think it was meant in such a way, though

17. The case of the missing cause prioritisation research

Putting in some Hypothes.is comments. Curious if others like this tool.

18. We theoretically expect and empirically observe impact to be “heavy tailed” with some causes being orders of magnitude more impactful

What are these 'theoretical' reasons we should expect this? Remind me.

#### URL

33. daaronr.github.io daaronr.github.io
1. Students: please propose some of these as a Hypothes.is comment HERE.

#### URL

34. daaronr.github.io daaronr.github.io
1. well

What do you mean by "well"?

$$x^2=4$$

2. How individuals interact with one another, and the consequences of this (Game theory and mechanism design/agency problems)

What does this mean? Does it mean $$x^2=4$$

#### URL

35. egap.org egap.org
1. sometimes put together as a measure like 'd' of 'effect relative to noise' ... effect size/SD
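
As one concrete version of 'effect size/SD', here is a minimal sketch of a standardised mean difference in the style of Cohen's d, dividing the difference in group means by the pooled standard deviation. The function name and the outcome values are made up for illustration.

```python
from statistics import mean, variance


def standardised_difference(treated, control):
    """Difference in means divided by the pooled standard deviation."""
    n_t, n_c = len(treated), len(control)
    pooled_var = (
        (n_t - 1) * variance(treated) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treated) - mean(control)) / pooled_var ** 0.5


treated = [6, 8, 10]  # hypothetical outcomes
control = [4, 6, 8]
d = standardised_difference(treated, control)
print(d)
```

Here the raw effect is 2 units and the pooled SD is also 2, so the effect is one standard deviation of 'noise'.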

#### URL

36. Jul 2020
37. daaronr.github.io daaronr.github.io
1. This relies heavily on:

also raw html code

#### URL

38. Jun 2020
39. rethinkpriorities.freshteam.com rethinkpriorities.freshteam.com
1. We’re backed by Open Philanthropy, Effective Altruism Funds, and viewers like you.

The funders

#### URL

40. bookdown.org bookdown.org
1. In typical meta-analyses, we do not have the individual data for each participant available, but only the aggregated effects, which is why we have to perform meta-regressions with predictors on a study level

But in principle we could do more if we had the raw data? This would then be a standard regression with an interaction and a study level 'random effect', I guess.

#### URL

41. bookdown.org bookdown.org
1. Same is the case once we detect statistical heterogeneity in our fixed-effect-model meta-analysis, as indicated by

I think empirically I-sq will always exceed 0. It's a matter of degree.

#### URL

42. handbook-5-1.cochrane.org handbook-5-1.cochrane.org
1. A useful statistic for quantifying inconsistency is $$I^2 = \left(\frac{Q - df}{Q}\right) \times 100\%$$, where Q is the chi-squared statistic and df is its degrees of freedom (Higgins 2002, Higgins 2003). This describes the percentage of the variability in effect estimates that is due to heterogeneity rather than sampling error (chance).

I-sq measure of heterogeneity
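
A minimal sketch of the calculation, using the standard definition $$I^2 = 100\% \times (Q - df)/Q$$ and the usual convention of setting negative values to zero. The function name and the Q values are hypothetical.

```python
def i_squared(q, df):
    """I^2 as a percentage; negative values are set to zero by convention."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


moderate = i_squared(20.0, 10)  # Q twice its degrees of freedom
none = i_squared(8.0, 10)       # Q below its degrees of freedom
print(moderate, none)
```

Under the null of no heterogeneity Q is expected to be about equal to its degrees of freedom, so $$I^2$$ measures how far Q exceeds that benchmark, as a share of Q.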

#### URL

43. May 2020
44. www.openbookpublishers.com www.openbookpublishers.com
1. MODELS IN MICROECONOMIC THEORY

Commenting as a placeholder. Hope to use this in teaching soon.

#### URL

45. daaronr.github.io daaronr.github.io
1. wasting

test comment -- I wouldn't say 'wasting'

#### URL

46. bookdown.org bookdown.org
1. We can use the ecdf function to implement the ECDF in R, and then check the probability of our pooled effect being smaller than 0.30. The code looks like this.

should put this first and the plot afterwards

2. We see that the posterior distributions follow a unimodal, and roughly normal distribution, peaking around the values for $$\mu$$ and $$\tau$$ we saw in the output.

Consider: why are the peaks not exactly these values? Mean versus mode, I guess.

3. By using the ranef function, we can also extract the estimated deviation of each study’s “true” effect size from the pooled effect: ranef(m.brm) ## $Author ## , , Intercept ## ## Estimate Est.Error Q2.5 Q97.5 ## Call et al. 0.07181028

these are measures of deviations. But they don't exactly equal the difference between the input effect size and the estimated pooled effect size. I assume that somewhere this estimates a true effect for each study which 'averages towards the mean' following some criteria.

4. 0.09

Is this like a measure of the standard deviation of the estimated intercept?

5. Please be aware that Bayesian methods are much more computationally intensive compared to the standard meta-analytic techniques we covered before; it may therefore take a few minutes until the sampling is completed.

I found it was the compiling of the C++ that took a bit of time

6. m.brm <- brm(TE|se(seTE) ~ 1 + (1|Author), data = ThirdWave, prior = priors, iter = 4000)

Here R asks me to install tools and opens this link: https://www.cnet.com/how-to/install-command-line-developer-tools-in-os-x/

But I don't know which tools I need to install

7. In this example, I will use my ThirdWave dataset, which contains data of a real-world meta-analysis investigating the effects of “Third-Wave” psychotherapies in college students. The data is identical to the madata dataset we used in Chapter 4.

Again, Bayesian analysis only seems to need the right summary stats, not the raw data

#### Annotators

#### URL

47. r4ds.had.co.nz r4ds.had.co.nz
1. using a sophisticated algorithm

Is OLS such a sophisticated algorithm?

#### Annotators

#### URL

48. adv-r.hadley.nz adv-r.hadley.nz
1. call2() is often convenient to program with,

why?

2. lobstr::ast(f1(f2(a, b), f3(1, f4(2))))

I'm having trouble seeing the point of this.

3. f <- expr(f(x = 1, y = 2)) # Add a new argument f$z <- 3 f #> f(x = 1, y = 2, z = 3)

You can 'add an argument' to an expression

4. function specifically designed to capture user input in a function argument: enexpr()

I think I need a more concrete example here

5. expr() lets you capture code that you’ve typed

but what do you do with it?

#### URL

1. Note that when you attach another package with library(), the parent environment of the global environment changes:

Installed packages are 'between' the global and base environments. But when you create a new environment with the env command it is 'after' (a child of) the global environment?

2. Unlike lists, setting an element to NULL does not remove it, because sometimes you want a name that refers to NULL. Instead, use env_unbind():

setting a list element to NULL removes it

3. But you can’t use [[ with numeric indices, and you can’t use [:

no 'element number'

4. Only one environment doesn’t have a parent: the empty environment.

poor guy

5. The current environment, or current_env() is the environment in which code is currently executing. When you’re experimenting interactively, that’s usually the global environment, or global_env(). The global environment is sometimes called your “workspace”, as it’s where all interactive (i.e. outside of a function) computation takes place.

this is super important

env print to see parent and 'bindings' of environment

7. e1$d <- e1

referring to or setting a list element with "$" ... it can also contain itself. mind blower

#### URL

Is this book dynamically updated?

#### URL

51. eml.berkeley.edu eml.berkeley.edu
1. Strong evidence for the perils of underpowered practice

#### URL

52. www.replicationmarkets.com www.replicationmarkets.com
1. Replication is testing the same claims using data that was not used in the original study. That required some changes from us. Starting in Round 6, Replication Markets will no longer distinguish between “data replication” and “direct replication.”

But what if it is impossible to find data 'not used in the original study' that is still a direct test of the claims?

#### URL

53. bookdown.org bookdown.org
1. It has been argued that a good approach is to use weakly informative priors (Williams, Rast, and Bürkner 2018). Weakly informative priors can be contrasted with non-informative priors.

!

2. integrate prior knowledge and assumptions when calculating meta-analyses.

including uncertainty over methodological validity?

#### URL

54. bookdown.org bookdown.org
1. It can either be stored as the raw data (including the Mean, N, and SD of every study arm) Or it only contains the calculated effect sizes and the standard error (SE).

note that this process does not 'dig in' to the raw data, it just needs the summary statistics
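To illustrate the note above, here is a rough sketch of how an effect size and its standard error can be computed from arm-level summary statistics alone. It assumes a standardized mean difference (Cohen's d with a pooled SD) and a common large-sample approximation for its standard error; the function name `smd_from_summary` and the input numbers are made up, and the guide's own packages may use a different estimator or small-sample correction.

```r
# Standardized mean difference from summary statistics of two study arms.
# Assumption: pooled-SD Cohen's d with a large-sample standard error.
smd_from_summary <- function(m1, sd1, n1, m2, sd2, n2) {
  sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  d <- (m1 - m2) / sd_pooled
  se <- sqrt((n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2)))
  c(d = d, se = se)
}

# Hypothetical study: no participant-level data needed.
res <- smd_from_summary(m1 = 10, sd1 = 2, n1 = 50, m2 = 9, sd2 = 2, n2 = 50)
res
```

Only six numbers per study enter the calculation, which is why a spreadsheet of means, SDs and Ns is enough input for this step.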

#### URL

55. bookdown.org bookdown.org
1. meta and metafor package which do most of the heavy lifting, there are still some aspects of meta-analyses in the biomedical field and psychology which we consider important, but are not easy to do in R currently, particularly if you do not have a programming or statistics background. To fill this gap, we developed the dmetar package, which serves as the companion R package for this guide. The dmetar package has its own documentation, which can be found here. Functions of the dmetar package provide additional functionality for the meta and metafor packages (and a few other, more advanced packages), w

dmetar package

#### URL

56. Apr 2020
57. cran.r-project.org cran.r-project.org
1. set_variable_labels(s1 = "Sex", s2 = "Yes or No?")

2. Adding variable labels using pipe

#### URL

58. bookdown.org bookdown.org
1. preview_chapter()

when I try this I get

Error in files2[[format]] :
attempt to select less than one element in get1index


However, I'm also not able to use the knit function, only the 'build' function

#### URL

59. Mar 2020
1. But if you end up with a very long series of chained if statements, you should consider rewriting. One useful technique is the switch() function. It allows you to evaluate selected code based on position or name.
#> function(x, y, op) {
#>   switch(op,
#>     plus = x + y,
#>     minus = x - y,
#>     times = x * y,
#>     divide = x / y,
#>     stop("Unknown op!")
#>   )
#> }

switch is great!
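A runnable version of the quoted function, as a small sketch. The function name `calc` and the example calls are my own additions; the body is the one from the excerpt.

```r
# Dispatch on a name; the unnamed final argument is the fallback.
calc <- function(x, y, op) {
  switch(op,
    plus = x + y,
    minus = x - y,
    times = x * y,
    divide = x / y,
    stop("Unknown op!")
  )
}

by_name <- calc(6, 3, "divide")                        # 2

# With a number, switch() selects by position instead of by name.
by_position <- switch(2, "first", "second", "third")   # "second"

# An unmatched name falls through to the stop() call.
msg <- tryCatch(calc(1, 2, "power"), error = function(e) conditionMessage(e))
```

Compared with a chain of `if`/`else if`, the mapping from option to action sits in one place, and the fallback makes unknown options fail loudly.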

#### URL

61. bookdown.org bookdown.org
1. The second type of tutorial provides much richer feedback and assessment, but also requires considerably more effort to author. If you are primarily interested in this sort of tutorial, there are many features in learnr to support it, including exercise hints and solutions, automated exercise checkers, and multiple choice quizzes with custom feedback.

full-blown course/learning materials

2. There are two main types of tutorial documents: Tutorials that are mostly narrative and/or video content, and also include some runnable code chunks. These documents are very similar to package vignettes in that their principal goal is communicating concepts. The interactive tutorial features are then used to allow further experimentation by the reader. Tutorials that provide a structured learning experience with multiple exercises, quiz questions, and tailored feedback. The first type of tutorial is much easier to author while still being very useful. These documents will typically add exercise = TRUE to selected code chunks, and also set exercise.eval = TRUE so the chunk output is visible by default. The reader can simply look at the R code and move on, or play with it to reinforce their understanding.

the easier kind of tutorial... just content with some code chunks (some pre-populated with code) the user can play with

#### URL

62. bookdown.org bookdown.org
1. button “Run Document” in RStudio, or call the function rmarkdown::run() on this Rmd file

Hitting the button worked for me; the script did not

#### URL

63. www.sciencedirect.com www.sciencedirect.com
1. image conscience donors

they meant 'image-conscious'

#### URL

64. www.nytimes.com www.nytimes.com
1. First, many health experts, including the surgeon general of the United States, told the public simultaneously that masks weren’t necessary for protecting the general public and that health care workers needed the dwindling supply. This contradiction confuses an ordinary listener. How do these masks magically protect the wearers only and only if they work in a particular field?

exactly what I was thinking

#### URL

65. www.the-brights.net www.the-brights.net
1. These results are in line with predictions, such that in those cases in which a consequentialist judgment does not clearly violate fairness-based principles about respecting others and not treating them as mere means, people do not infer that the agent is necessarily an untrustworthy social partner

but isn't it still a consequentialist judgement?!

2. We reasoned that if deontological agents are preferred over consequentialist agents because they are perceived as more committed to social cooperation, such preferences should be lessened if consequentialist agents reported their judgments as being very difficult to make, indicating some level of commitment to cooperation (Critcher, Inbar, & Pizarro, 2013). From the process dissociation perspective (Conway & Gawronski, 2013), a person who reports that it is easy to make a characteristically consequentialist judgment can be interpreted as being high in consequentialism

I'm not sure I understand or like this approach. Couldn't it just be seen as merely a stronger consequentialism if they had no doubts? And is it even a meaningful distinction ... can I like the 'presence of cold' versus the 'absence of heat'.

3. In contrast to the previous studies, for the switch dilemma, consequentialist agents were rated to be no less moral (Z = 0.73, p = .47, d = 0.10) or trustworthy (Z = 1.87, p = .06, d = 0.26) than deontological agents.

To me, this seems to weigh against their main claim. In the one case in which a majority favored the consequentialist choice, the consequentialists are not disfavored! They are really playing this down. Am I missing something?

4. Despite the general endorsement many people have that “ends do not justify means,” people do typically judge that sacrificing the one man by diverting the train is less morally wrong than sacrificing the man by using his body to stop the train (Foot, 1967; Greene et al., 2001).

How is this 'despite'? It doesn't seem to be in contradiction.

5. The switch case differs from the footbridge case in two critical ways

But it is still in the domain of HARMING people (more versus fewer).

6. The only difference is that Adam does not push the large man, but instead pushes a button that opens a trapdoor that causes the large man to fall onto the tracks.

Meh. This difference hardly seems worth bothering with.

7. The amount of money participants transferred to the agent (from $0.00 to $0.30) was used as an indicator of trustworthiness, as was how much money they believed they would receive back from the agent (0% to 100%)

Note that this is a very small stake. (And was it even perhaps hypothetical?)

8. However, the data did not support a mere similarity effect: Our results were robust to controlling for participants’ own moral judgments, such that participants who made a deontological judgment (the majority) strongly preferred a deontological agent, whereas participants who made a consequentialist judgment (the minority) showed no preference between the two

But this is a lack of a result in the context of a critical underlying assumption. Yes, the results were 'robust', but could we really be statistically confident that this was not driving the outcome? How tight are the error bounds?

9. However, the central claims behind this account—that people who express deontological moral intuitions are perceived as more trustworthy and favored as cooperation partners—has not been empirically investigated.

Here is where the authors claim their territory.

10. the typical deontological reason for why specific actions are wrong is that they violate duties to respect persons and honor social obligations—features that are crucial when selecting a social partner. An individual who claims that stealing is always morally wrong and believes themselves morally obligated to act in accordance with this duty seems much less likely to steal from me than an individual who believes that the stealing is sometimes morally acceptable depending on the consequences. Actors who express characteristically deontological judgments may therefore be preferred to those expressing consequentialist judgments because these judgments may be more reliable indicators of stable cooperative behavior.

Key point.. deontological ethics signals stable cooperative behavior

11. First, deontologists’ prohibition of certain acts or behaviors may serve as a relevant cue for inferring trustworthiness, because the extent to which someone claims to follow rule or action-based judgments may be associated with the reliability of their moral behavior. One piece of preliminary evidence for this comes from a study showing that agents willing to punish third parties who violate fairness principles are trusted more, and actually are more trustworthy (Jordan, Hoffman, Bloom, & Rand, 2016).

But couldn't this punishment be seen as utilitarian... as it promotes the general social good?

12. One approach to explaining why moral intuitions often align with deontology comes from mutualistic partner choice models of the evolution of morality. These models posit a cooperation market such that agents who can be relied upon to act in a mutually beneficial way are more likely to be chosen as cooperation partners, thus increasing their own fitness

this is the key theoretical argument

13. intriguingly

14. and recent theoretical work has demonstrated that “cooperating without looking”—that is, without considering the costs and benefits of cooperation—is a subgame perfect equilibrium (Hoffman, Yoeli, & Nowak, 2015). Therefore, expressing characteristically deontological judgments could constitute a behavior that enhances individual fitness in a cooperation market because these judgments are seen as reliable indicators of a specific valued behavior—cooperation

Is this relevant to the idea that '(advocating) Effective giving is a bad signal'?

Does utilitarian decision-making in 'good space' contradict this?

I'm not convinced. An 'excuse not to do something' is not the same as a 'choice to be effective'.

15. Across 5 studies, we show that people who make characteristically deontological judgments are preferred as social partners, perceived as more moral and trustworthy, and are trusted more in economic games.

But this does NOT hold in the switching case/switching study

#### URL

66. citeseerx.ist.psu.edu citeseerx.ist.psu.edu
1. Table 3 also suggests that conditional norm enforcement is more pronounced among the population with intermediate and high levels of education. This finding is consistent with the observation that conditional cooperation is particularly robust in lab experiments with student subject pools (see Gächter, 2007). The data further show that females tend to be more inclined to sanction, in particular deviations from the strong norms. In contrast, employed respondents are less engaged in sanctioning. All other socioeconomic characteristics do not show a clear

demographic breakdown of survey responses ... evidence

2. In a national survey conducted in Austria, respondents were confronted with eight different ‘incorrect behaviors’, including tax evasion, drunk driving, fare dodging or skiving off work. Respondents were then asked how they would react if an acquaintance followed such behavior. The response categories cover positive reactions – like approval (Rege and Telle, 2004) – as well as negative reactions like cooling down the contact or expressing disapproval

below... targeted to be nationally representative.

#### URL

67. Feb 2020
68. daaronr.github.io daaronr.github.io
1. A dissertation or final-year project allows you to explore your aptitude for, and interest in doing economic research

This should be a separate bullet point. This is big. If you are going to do postgraduate study it WILL involve research.

Aside from the academic track, much professional work involves research.

#### URL

69. www1.essex.ac.uk www1.essex.ac.uk
1. James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert. (2013) An introduction to statistical learning: with applications in R, New York: Springer. vol. Springer texts in statistics

This would seem to overlap the ML module ?

#### URL

70. www1.essex.ac.uk www1.essex.ac.uk
1. - construct factorial experiments in blocks;

Did they get into power calculation and design efficiency? This seems more general statistics and less experimetrics. OK, it doesn't say 'design'

#### URL

71. www1.essex.ac.uk www1.essex.ac.uk
1. Overleaf /LaTex

Not sure students need to know too much latex anymore… markdown/r-md is a lot simpler and using it with css and html bits is very flexible. (although it still helps to know how to code maths in Latex)

#### URL

72. declaredesign.org declaredesign.org
1. If you can avoid assigning subjects to treatments by cluster, you should.

Sometimes clustered assignment is preferable if mixing treatments in a cluster --> contaminated treatments (e.g., because participants communicate)

2. fit_simple <- lm(Y_simple ~ Z_simple, data=hec)

'regress' the outcome on the treatment. Yields the ATE even with heterogeneity if treatment is equiprobable.
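A quick simulation sketch of this point, using made-up potential outcomes rather than the `hec` data from the excerpt: with equal assignment probability for every unit, the coefficient from regressing the outcome on the treatment indicator is exactly the difference in means, and that difference is unbiased for the ATE even when unit-level effects differ. All names and parameter values below are assumptions for illustration.

```r
set.seed(123)
N <- 1000

# Hypothetical potential outcomes with heterogeneous unit-level effects.
tau <- rnorm(N, mean = 2, sd = 3)
Y0 <- rnorm(N)
Y1 <- Y0 + tau
true_ate <- mean(Y1 - Y0)

# Complete random assignment: exactly half treated, same probability for all.
Z <- sample(rep(c(0, 1), each = N / 2))
Y <- ifelse(Z == 1, Y1, Y0)     # only one potential outcome is observed

ate_lm <- coef(lm(Y ~ Z))[["Z"]]
ate_dim <- mean(Y[Z == 1]) - mean(Y[Z == 0])

c(true = true_ate, lm = ate_lm, diff_in_means = ate_dim)
```

The regression and the difference in means agree to numerical precision; both differ from `true_ate` only by sampling noise from the particular assignment drawn.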

3. This complication is typically addressed in one of two ways: “controlling for blocks” in a regression context, or inverse probability weights (IPW), in which units are weighted by the inverse of the probability that the unit is in the condition that it is in.

I don't think these are equivalent. I believe only the latter recovers the ATE under heterogeneity... but this is just my memory.
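The recollection in the note above can be checked with a small simulation. This is a sketch under made-up assumptions: two equal-sized blocks, assignment probabilities of 0.5 and 0.1, and block-level effects of 0 and 10. Here "controlling for blocks" means the plain additive specification `Y ~ Z + block`; a specification that interacts treatment with (centered) block indicators would behave differently.

```r
set.seed(1)
n <- 5000                                   # units per block (hypothetical)
block <- rep(c(1, 2), each = n)
p <- ifelse(block == 1, 0.5, 0.1)           # assignment probability by block
tau <- ifelse(block == 1, 0, 10)            # treatment effect by block

Y0 <- rnorm(2 * n)
Y1 <- Y0 + tau
true_ate <- mean(Y1 - Y0)                   # 5 by construction

# Blocked assignment with fixed numbers treated: 2500 of 5000 and 500 of 5000.
Z <- c(sample(rep(c(1, 0), c(2500, 2500))),
       sample(rep(c(1, 0), c(500, 4500))))
Y <- ifelse(Z == 1, Y1, Y0)

# (a) Additive block controls: weights blocks by n * p * (1 - p).
est_blocks <- coef(lm(Y ~ Z + factor(block)))[["Z"]]

# (b) Inverse probability weights: 1/p if treated, 1/(1 - p) if control.
w <- ifelse(Z == 1, 1 / p, 1 / (1 - p))
est_ipw <- coef(lm(Y ~ Z, weights = w))[["Z"]]

c(true = true_ate, blocks = est_blocks, ipw = est_ipw)
```

In this setup the additive regression lands near (0.25 * 0 + 0.09 * 10) / 0.34, roughly 2.65, because it weights each block's effect by the variance of treatment within the block, while the IPW estimate is close to the ATE of 5. When assignment probabilities are equal across blocks, or effects are homogeneous, the two approaches target the same quantity.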

4. The gains from a blocked design can often be realized through covariate adjustment alone.

I believe Athey and Heckman come out strongly in favor of blocking instead of covariate adjustment.

5. Of course, such heterogeneity could be explored if complete random assignment had been used, but blocking on a covariate defends a researcher (somewhat) against claims of data dredging.

A preregistration plan can accomplish this without any cost.

6. In this simulation complete random assignment led to a -0.59% decrease in sampling variability. This decrease was obtained with a small design tweak that costs the researcher essentially nothing.

This is not visible in the html. You specified too few digits.

Also, the results would be more striking if you had a smaller data set.

7. with(hec, mean(Y1 - Y0))

ATE with heterogeneity?

8. # Reveal observed potential outcomes

He means 'the outcome observed given random assignment'

9. when deploying a survey experiment on a platform like Qualtrics, simple random assignment is the only possibility due to the inflexibility of the built-in random assignment tools.

That's not entirely true

10. Since you need to know N beforehand in order to use simple_ra(), it may seem like a useless function.

this is a confusing sentence

11. depending on the random assignment, a different number of subjects might be assigned to each group.

In large samples this won't usually matter much... but still worth avoiding, to make power as high as possible.
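A base R sketch of the difference between the two assignment schemes discussed here. The excerpt uses `simple_ra()` and `complete_ra()`; the calls below are stand-ins I chose to avoid a package dependency, with a hypothetical sample size of 100.

```r
set.seed(2020)
N <- 100

# Simple random assignment: an independent coin flip per subject,
# so the number treated varies from draw to draw.
Z_simple <- rbinom(N, size = 1, prob = 0.5)

# Complete random assignment: shuffle a vector with exactly N/2 ones,
# so the group sizes are fixed in advance.
Z_complete <- sample(rep(c(0, 1), each = N / 2))

table(Z_simple)
table(Z_complete)
```

Fixing the group sizes removes one source of sampling variability, which is where the (usually small) power gain comes from.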

12. Y0 <- rnorm(n = N,mean = (2*as.numeric(Hair) + -4*as.numeric(Eye) + -6*as.numeric(Sex)), sd = 5)

linear heterogeneity of baseline and of TE

13. hec <- within(hec,{

why does he use 'within' rather than mutate?

#### URL

73. community.spotify.com community.spotify.com
1. Solution! Re: Export To Excel (slipstream42, 2017-07-31 04:22 PM): another csv export link, it is quite nice: https://rawgit.com/watsonbox/exportify/master/exportify.html (code on github)

works great

#### URL

74. www.vox.com www.vox.com
1. As you can see, having one fewer child still comes out looking like a solid way to reduce carbon emissions — but it’s absolutely nowhere near as effective as it first seemed. It no longer dwarfs the other options. On this model, instead of having one fewer kid, you can skip a couple of transatlantic flights and you’ll save the same amount of carbon. That seems like a way more manageable sacrifice if you’re a young person who longs to be a parent.

Even if I believed the highly optimistic predictions of very strong climate policy in the USA (which I don't), having one fewer child still reduces emissions each year more than twice as much as living car free or avoiding a trans-atlantic flight every year.

And they state it as "instead of having one fewer kid, you can skip a couple of transatlantic flights and you’ll save the same amount of carbon." ... but this requires each parent to forgo 2 transatlantic flights they would have taken every year for the rest of their life, if I understand correctly.

#### URL

1. commas <- function(...) stringr::str_c(..., collapse = ", ")

no braces needed for function on a single line
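A tiny sketch of the same one-line style. It uses base `paste0()` instead of `stringr::str_c()` purely to avoid a dependency; that swap is mine, not the book's.

```r
# A function whose body is a single expression needs no braces.
commas <- function(...) paste0(..., collapse = ", ")

joined <- commas(letters[1:5])
joined
```

Braces only become necessary once the body spans more than one expression.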

#### URL

1. it’s the same as the input!

because we want to modify columns in place

2. Compute the mean of every column in mtcars.
output <- vector("double", ncol(mtcars))  # 1. output
for (i in seq_along(mtcars)) {            # 2. sequence
  output[[i]] <- mean(mtcars[[i]])      # 3. body
}
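Two short variations on the loop above, as a sketch. The first shows the "modify columns in place" case from the earlier note, where the output is the input itself so nothing needs to be pre-allocated; the second shows the same column means without an explicit loop. The helper `rescale01` and the toy data frame `df` are illustrative assumptions.

```r
# Rescale a numeric vector to the range [0, 1].
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

df <- data.frame(a = c(1, 5, 10), b = c(-2, 0, 2))

# Output is the same object as the input: overwrite each column in turn.
for (i in seq_along(df)) {
  df[[i]] <- rescale01(df[[i]])
}

# Column means of mtcars without writing the loop by hand.
col_means <- vapply(mtcars, mean, numeric(1))
```

`vapply()` checks that every result is a length-one double, which plays the same role as creating `output` with `vector("double", ncol(mtcars))`.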


#### URL

77. www.fmassari.com www.fmassari.com
1. The rational expectation and the learning-from-price literatures argue that equilibrium prices are accurate because they reveal and aggregate the information of all market participants. The Market Selection Hypothesis, MSH, proposes instead that prices become accurate because they eventually reflect only the beliefs of the most accurate agent. The Wisdom of the Crowd argument, WOC, however suggests that market prices are accurate because individual, idiosyncratic errors are averaged out by the price formation mechanism

Three models (arguments for) drivers of market efficiency

#### URL

1. external fundraising page

What is meant by 'external'?

2. Fundraising Dashboard / Participant Center Visited When a person visits their fundraising dashboard or participant center

who is the 'person' visiting the dashboard here?

3. Fundraising Page Created / Registration Complete Upon completion of the last step of the registration flow that creates a fundraising page

by whom? which ones can be detected?

#### URL

79. crumplab.github.io crumplab.github.io
1. Contributing to the textbook Use Hypothes.is, an amazing tool for annotating the web. Go to Hypothes.is, and “get-started”

To nudge people slightly towards this, you can add to the index.Rmd:

    includes:


And in that header_include.html file, include


<script async defer src="https://hypothes.is/embed.js"></script>


I do this here in my Writing Economics book

#### URL

80. dplyr.tidyverse.org dplyr.tidyverse.org
1. head(as.data.frame(nasa))
#>        lat   long month year cloudhigh cloudlow cloudmid ozone pressure
#> 1 36.20000 -113.8     1 1995      26.0      7.5     34.5   304      835
#> 2 33.70435 -113.8     1 1995      20.0     11.5     32.5   304      940
#> 3 31.20870 -113.8     1 1995      16.0     16.5     26.0   298      960
#> 4 28.71304 -113.8     1 1995      13.0     20.5     14.5   276      990
#> 5 26.21739 -113.8     1 1995       7.5     26.0     10.5   274     1000
#> 6 23.72174 -113.8     1 1995       8.0     30.0      9.5   264     1000
#>   surftemp temperature
#> 1    272.7       272.1
#> 2    279.5       282.2
#> 3    284.7       285.2
#> 4    289.3       290.7
#> 5    292.2       292.7
#> 6    294.1       293.6

unrolling a tbl_cube into 2 dimensions (data.frame)

#### URL

81. www.r-bloggers.com www.r-bloggers.com
1. Now this can be simplified using the new {{}} syntax:
summarise_groups <- function(dataframe, grouping_var, column_name){
  dataframe %>%
    group_by({{grouping_var}}) %>%
    summarise({{column_name}} := mean({{column_name}}, na.rm = TRUE))
}
Much easier and cleaner! You still have to use the := operator instead of = for the column name however. Also, from my understanding, if you want to modify the column names, for instance in this case return "mean_height" instead of height you have to keep using the enquo()–!! syntax.

curly curly syntax
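A runnable sketch of the quoted function, with one change. It assumes a reasonably recent dplyr and rlang, where (as far as I know) a glue-style string on the left of `:=` can build a modified column name directly, so the fallback to `enquo()` and `!!` mentioned in the excerpt should no longer be needed. The `mtcars` call is just an illustration.

```r
library(dplyr)

# {{ }} forwards a bare column name from the caller into dplyr verbs.
# Assumption: "mean_{{ column_name }}" on the left of := is supported
# by the installed rlang/dplyr versions.
summarise_groups <- function(dataframe, grouping_var, column_name) {
  dataframe %>%
    group_by({{ grouping_var }}) %>%
    summarise("mean_{{ column_name }}" := mean({{ column_name }}, na.rm = TRUE))
}

out <- summarise_groups(mtcars, cyl, mpg)
out
```

The result has one row per value of `cyl` and a column named `mean_mpg`; passing a different column changes both the computation and the output name.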

#### URL

82. www.sciencedirect.com www.sciencedirect.com
1. (1) “How likely do you think it is that this hypothesis will be replicated (on a scale from 0% to 100%)?” (2) “How large do you think the standardized effect size (in terms of Cohen’s d) from the replication will be, relative to that in the original paper (on a scale from −50% to 200%)?”, and (3) “How well do you know this topic? (Not at all; Slightly; Moderately; Very well; Extremely well.)”

pre-market survey Many labs 2

2. 0.506 (0.532)

can we find any measures of dispersion here?

3. For the 12 studies with an original p < 0.005, 10 (83%) replicated. For the 12 studies with an original p > 0.005, only 1 (8%) replicated. Further work is needed to test if prediction markets outperform predictions based only on the initial p-value, to test if the market also aggregates other information important for reproducibility.

p values may capture all the information?

#### URL

83. daaronr.github.io daaronr.github.io
1. At least one of my un-named research co-authors will heartily agree with this statement.↩︎

Hi Dave!

#### URL

84. Jan 2020
85. wilsonmar.github.io wilsonmar.github.io
1. ps f

this doesn't run on my system. However ps -f seems to list processes started in the terminal and ps -ef lists all (?) processes

2. List previous command history: history

useful

#### URL

86. www.freecodecamp.org www.freecodecamp.org
1. It’s worth noting that first line of the script starts with #!. It is a special directive which Unix treats differently.

The '#!' line at the top of a bash script is NOT just a comment... it is important

#### URL

87. happygitwithr.com happygitwithr.com
1. Happy Git and GitHub for the useR

Oska: can you see this note?

#### URL

88. daaronr.github.io daaronr.github.io
1. 8 Writing, argumentation, presentation, and (Economic) logic: Being clear and making sense

List of words and phrases to avoid -- what are your biggest pet peeves in student writing?

#### URL

89. pubs.aeaweb.org pubs.aeaweb.org
1. Prediction in Policy

Relevance to my own project: can we predict who has the most to gain from admission to an HE institution. (But I'm limited in what I can report)

2. Suppose the algorithm chooses a tree that splits on education but not on age. Conditional on this tree, the estimated coefficients are consistent. But that does not imply that treatment effects do not also vary by age, as education may well covary with age; on other draws of the data, in fact, the same procedure could have chosen a tree that split on age instead

a caveat

3. These heterogeneous treatment effects can be used to assign treatments; Misra and Dubé (2016) illustrate this on the problem of price targeting, applying Bayesian regularized methods to a large-scale experiment where prices were randomly assigned

todo -- look into the implication for treatment assignment with heterogeneity

4. Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, and Newey (2016) take care of high-dimensional controls in treatment effect estimation by solving two simultaneous prediction problems, one in the outcome and one in the treatment equation.

this seems similar to my idea of regularizing on only a subset of the variables

5. In particular, a set of papers has already introduced regularization into the first stage in a high-dimensional setting, including the LASSO (Belloni, Chen, Chernozhukov, and Hansen 2012) and ridge regression (Carrasco 2012; Hansen and Kozbur 2014)

worth referencing

6. These same techniques applied here result in split-sample instrumental variables (Angrist and Krueger 1995) and “jackknife” instrumental variables

some classical solutions to IV bias are akin to ML solutions

7. Understood this way, the finite-sample biases in instrumental variables are a consequence of overfitting.

traditional 'finite sample bias of IV' is really overfitting

8. Even when we are interested in a parameter β̂, the tool we use to recover that parameter may contain (often implicitly) a prediction component. Take the case of linear instrumental variables understood as a two-stage procedure: first regress x = γ′z + δ on the instrument z, then regress y = β′x + ε on the fitted values x̂. The first stage is typically handled as an estimation step. But this is effectively a prediction task: only the predictions x̂ enter the second stage; the coefficients in the first stage are merely a means to these fitted values.

first stage of IV -- handled as an estimation problem, but really it's a prediction problem!
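The two-stage logic can be made concrete with a simulated example. This is a sketch with an invented data-generating process (one instrument `z`, an unobserved confounder `u`, true coefficient 2), done by hand with `lm()` to make visible that only the first-stage fitted values are passed on. The standard errors from a hand-rolled second stage are not valid, so only point estimates are compared.

```r
set.seed(1)
n <- 10000

# Hypothetical data: u confounds x and y; z shifts x but not y directly.
z <- rnorm(n)
u <- rnorm(n)
x <- z + u + rnorm(n)
y <- 2 * x + u + rnorm(n)

# Naive OLS is biased upward because x is correlated with u.
ols <- coef(lm(y ~ x))[["x"]]

# Stage 1 is a prediction problem: all we keep is x_hat.
first_stage <- lm(x ~ z)
x_hat <- fitted(first_stage)

# Stage 2 uses the predictions in place of x.
tsls <- coef(lm(y ~ x_hat))[["x_hat"]]

c(ols = ols, tsls = tsls)
```

The first-stage coefficients never appear in the second stage, which is the sense in which any good predictor of `x` from `z` could in principle take the place of the first-stage regression.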

9. Prediction in the Service of Estimation

This is especially relevant to economists across the board, even the ML skeptics

10. New Data

The first application: constructing variables and meaning from high-dimensional data, especially outcome variables

• satellite images (of energy use, lights etc) --> economic activity
• cell phone data, Google street view to measure wealth
• extract similarity of firms from 10k reports
• even traditional data .. matching individuals in historical censuses
11. Zhao and Yu (2006) who establish asymptotic model-selection consistency for the LASSO. Besides assuming that the true model is “sparse”—only a few variables are relevant—they also require the “irrepresentable condition” between observables: loosely put, none of the irrelevant covariates can be even moderately related to the set of relevant ones.

Basically unrealistic for microeconomic applications imho

12. First, it encourages the choice of less complex, but wrong models. Even if the best model uses interactions of number of bathrooms with number of rooms, regularization may lead to a choice of a simpler (but worse) model that uses only number of fireplaces. Second, it can bring with it a cousin of omitted variable bias, where we are typically concerned with correlations between observed variables and unobserved ones. Here, when regularization excludes some variables, even a correlation between observed variables and other observed (but excluded) ones can create bias in the estimated coefficients.

Is this equally a problem for procedures that do not assume sparsity, such as the Ridge model?

13. If the variables are correlated with each other (say the number of rooms of a house and its square-footage), then such variables are substitutes in predicting house prices. Similar predictions can be produced using very different variables. Which variables are actually chosen depends on the specific finite sample.

Lasso-chosen variables are unstable because of what we usually call 'multicollinearity.' This presents a problem for making inferences from estimated coefficients.

14. Through its regularizer, LASSO produces a sparse prediction function, so that many coefficients are zero and are “not used”—in this example, we find that more than half the variables are unused in each run

This is true but they fail to mention that LASSO also shrinks the coefficients on variables that it keeps towards zero (relative to OLS). I think this is commonly misunderstood (from people I've spoken with).
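The shrinkage point in the note above is easiest to see in the textbook special case of an orthonormal design (X'X = I). There, minimizing (1/2)||y - Xb||^2 + lambda * ||b||_1 has a closed-form solution: each OLS coefficient is soft-thresholded by lambda. The sketch below assumes that special case; the coefficient values and the function name `soft_threshold` are made up.

```r
# LASSO solution under an orthonormal design: soft-threshold the OLS estimates.
soft_threshold <- function(b, lambda) sign(b) * pmax(abs(b) - lambda, 0)

b_ols <- c(3, -0.5, 1.2)                     # hypothetical OLS coefficients
b_lasso <- soft_threshold(b_ols, lambda = 1)

rbind(ols = b_ols, lasso = b_lasso)
```

The small coefficient is set to zero (selection), but the two surviving coefficients are also pulled toward zero by the full amount lambda (shrinkage), going from 3 to 2 and from 1.2 to about 0.2. With correlated regressors there is no such closed form, but the kept coefficients are still shrunk relative to OLS.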

15. One obvious problem that arises in making such inferences is the lack of standard errors on the coefficients. Even when machine-learning predictors produce familiar output like linear functions, forming these standard errors can be more complicated than seems at first glance as they would have to account for the model selection itself. In fact, Leeb and Pötscher (2006, 2008) develop conditions under which it is impossible to obtain (uniformly) consistent estimates of the distribution of model parameters after data-driven selection.

This is a very serious limitation for Economics academic work.

16. First, econometrics can guide design choices, such as the number of folds or the function class.

How would Econometrics guide us in this?

17. These choices about how to represent the features will interact with the regularizer and function class: A linear model can reproduce the log base area per room from log base area and log room number easily, while a regression tree would require many splits to do so.

The choice of 'how to represent the features' is consequential ... it's not just 'throw it all in' (kitchen sink approach)

18. Table 2: Some Machine Learning Algorithms

This is a very helpful table!

19. Picking the prediction function then involves two steps: The first step is, conditional on a level of complexity, to pick the best in-sample loss-minimizing function. The second step is to estimate the optimal level of complexity using empirical tuning (as we saw in cross-validating the depth of the tree).

ML explained while standing on one leg.

20. Regularization combines with the observability of prediction quality to allow us to fit flexible functional forms and still find generalizable structure.

But we can't really make statistical inferences about the structure, can we?

21. This procedure works because prediction quality is observable: both predictions ŷ and outcomes y are observed. Contrast this with parameter estimation, where typically we must rely on assumptions about the data-generating process to ensure consistency.

I'm not clear what the implication they are making here is. Does it in some sense 'not work' with respect to parameter estimation?

22. In empirical tuning, we create an out-of-sample experiment inside the original sample.

remember that tuning is done within the training sample
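A compact sketch of empirical tuning as described in the last few excerpts, under assumptions of my own: simulated data, polynomial degree as the complexity parameter (instead of tree depth), and five folds. The hold-out set is split off first and never touched during tuning; the folds live entirely inside the training sample.

```r
set.seed(42)
n <- 200
x <- runif(n, -2, 2)
y <- sin(2 * x) + rnorm(n, sd = 0.3)         # hypothetical nonlinear signal
dat <- data.frame(x = x, y = y)

# Split once: tuning only ever sees `train`.
train_id <- sample(n, 150)
train <- dat[train_id, ]
holdout <- dat[-train_id, ]

k <- 5
fold <- sample(rep(seq_len(k), length.out = nrow(train)))
degrees <- 1:8

# For each complexity level: fit on k-1 folds, score on the left-out fold.
cv_mse <- sapply(degrees, function(d) {
  mean(sapply(seq_len(k), function(f) {
    fit <- lm(y ~ poly(x, d), data = train[fold != f, ])
    pred <- predict(fit, newdata = train[fold == f, ])
    mean((train$y[fold == f] - pred)^2)
  }))
})

# Pick the complexity with the lowest cross-validated error,
# refit on the full training sample, then evaluate once on the hold-out.
best_d <- degrees[which.min(cv_mse)]
final_fit <- lm(y ~ poly(x, best_d), data = train)
holdout_mse <- mean((holdout$y - predict(final_fit, newdata = holdout))^2)

c(best_degree = best_d, holdout_mse = holdout_mse)
```

Step one of the quoted procedure is the `lm()` call for a given degree; step two is the comparison of `cv_mse` across degrees. Because the hold-out set played no part in choosing `best_d`, `holdout_mse` is an honest estimate of out-of-sample performance.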

23. Performance of Different Algorithms in Predicting House Values

Any reason they didn't try a Ridge or an Elastic net model here? My instinct is that these will beat LASSO for most Economic applications.

24. We consider 10,000 randomly selected owner-occupied units from the 2011 metropolitan sample of the American Housing Survey. In addition to the values of each unit, we also include 150 variables that contain information about the unit and its location, such as the number of rooms, the base area, and the census region within the United States. To compare different prediction techniques, we evaluate how well each approach predicts (log) unit value on a separate hold-out set of 41,808 units from the same sample. All details on the sample and our empirical exercise can be found in an online appendix available with this paper at http://e-jep.org

Seems a useful example for trying/testing/benchmarking. But the link didn't work for me. Can anyone find it? Is it interactive? (This is why I think papers should be html and not pdfs...)

25. Making sense of complex data such as images and text often involves a prediction pre-processing step.

In using 'new kinds of data' in Economics we often need to do a 'classification step' first

26. The fundamental insight behind these breakthroughs is as much statistical as computational. Machine intelligence became possible once researchers stopped approaching intelligence tasks procedurally and began tackling them empirically.

I hadn't thought about how this unites the 'statistics to learn stuff' part of ML and the 'build a tool to do a task' part. Well-phrased.

27. Why not also use it to learn something about the “underlying model”: specifically, why not use it to make inferences about the underlying data-generating process?

(they give reasons why not)

28. Economic theory and content expertise play a crucial role in guiding where the algorithm looks for structure first. This is the sense in which “simply throw it all in” is an unreasonable way to understand or run these machine learning algorithms.

At least we (Economists) hope this is the case ... motivated reasoning?

29. available finite-sample guidance on its implementation—such as heuristics for the number of folds (usually five to ten) or the “one standard-error rule” for tuning the LASSO (Hastie, Tibshirani, and Friedman 2009)—has a more ad-hoc flavor.

It sounds like there are big unknowns... a lot is still 'rules of thumb'