237 Matching Annotations
  1. Jun 2020
    1. In typical meta-analyses, we do not have the individual data for each participant available, but only the aggregated effects, which is why we have to perform meta-regressions with predictors on a study level

      But in principle we could do more if we had the raw data? This would then be a standard regression with an interaction and a study level 'random effect', I guess.

    1. Same is the case once we detect statistical heterogeneity in our fixed-effect-model meta-analysis, as indicated by I² > 0

      I think empirically I-sq will always exceed 0. It's a matter of degree.

    1. A useful statistic for quantifying inconsistency is I² = ((Q − df) / Q) × 100%, where Q is the chi-squared statistic and df is its degrees of freedom (Higgins 2002, Higgins 2003). This describes the percentage of the variability in effect estimates that is due to heterogeneity rather than sampling error (chance).

      I-sq measure of heterogeneity
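
      A minimal sketch of the I-sq calculation described in the quoted passage, in base R. The Q and df values are made up for illustration, and `i_squared()` is a hypothetical helper, not a function from meta or metafor. Negative values are set to 0, as is conventional.

      ```r
      # Hypothetical helper: I^2 = 100% * (Q - df) / Q, with negative values set to 0
      i_squared <- function(Q, df) {
        max(0, 100 * (Q - df) / Q)
      }

      high_het <- i_squared(Q = 40, df = 10)  # Q well above df: 75
      no_het   <- i_squared(Q = 8, df = 10)   # Q below df: truncated to 0
      ```

      Because of the truncation, I-sq is exactly 0 whenever Q falls at or below its degrees of freedom.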

  2. May 2020
    1. MODELS IN MICROECONOMIC THEORY

      Commenting as a placeholder. Hope to use this in teaching soon.

    1. We can use the ecdf function to implement the ECDF in R, and then check the probability of our pooled effect being smaller than 0.30. The code looks like this.

      should put this first and the plot afterwards
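
      A small sketch of what the quoted ecdf step does, using base R only. The `draws` vector is a made-up stand-in for posterior draws of the pooled effect; in the guide these would come from the fitted model.

      ```r
      # Made-up stand-in for posterior draws of the pooled effect
      draws <- c(0.10, 0.20, 0.25, 0.35, 0.40, 0.50, 0.55, 0.60)

      # ecdf() returns a function; evaluating it at 0.30 gives the share of draws <= 0.30
      pooled_ecdf <- ecdf(draws)
      p_below <- pooled_ecdf(0.30)
      p_below  # 3 of the 8 draws are at or below 0.30
      ```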

    2. We see that the posterior distributions follow a unimodal, and roughly normal distribution, peaking around the values for μ and τ we saw in the output.

      Consider: why are the peaks not exactly these values? Mean versus mode, I guess.

    3. By using the ranef function, we can also extract the estimated deviation of each study’s “true” effect size from the pooled effect: ranef(m.brm) ## $Author ## , , Intercept ## ## Estimate Est.Error Q2.5 Q97.5 ## Call et al. 0.07181028

      these are measures of deviations. But they don't exactly equal the difference between the input effect size and the estimated pooled effect size. I assume that somewhere this estimates a true effect for each study which 'averages towards the mean' following some criteria.

    4. 0.09

      Is this like a measure of the standard deviation of the estimated intercept?

    5. Please be aware that Bayesian methods are much more computationally intensive compared to the standard meta-analytic techniques we covered before; it may therefore take a few minutes until the sampling is completed.

      I found it was the compiling of the C++ that took a bit of time

    6. m.brm <- brm(TE|se(seTE) ~ 1 + (1|Author), data = ThirdWave, prior = priors, iter = 4000)

      Here R asks me to install tools and opens this link: https://www.cnet.com/how-to/install-command-line-developer-tools-in-os-x/

      But I don't know which tools I need to install

    7. In this example, I will use my ThirdWave dataset, which contains data of a real-world meta-analysis investigating the effects of “Third-Wave” psychotherapies in college students. The data is identical to the madata dataset we used in Chapter 4.

      Again, Bayesian analysis only seems to need the right summary stats, not the raw data

    1. using a sophisticated algorithm

      Is OLS such a sophisticated algorithm?

    1. call2() is often convenient to program with,

      why?

    2. lobstr::ast(f1(f2(a, b), f3(1, f4(2))))

      I'm having trouble seeing the point of this.

    3. f <- expr(f(x = 1, y = 2))
      # Add a new argument
      f$z <- 3
      f
      #> f(x = 1, y = 2, z = 3)

      You can 'add an argument' to an expression

    4. function specifically designed to capture user input in a function argument: enexpr()

      I think I need a more concrete example here

    5. expr() lets you capture code that you’ve typed

      but what do you do with it?

    1. Note that when you attach another package with library(), the parent environment of the global environment changes:

      Installed packages are 'between' the global and base environments. But when you create a new environment with the env command it is 'after' (a child of) the global environment?

    2. Unlike lists, setting an element to NULL does not remove it, because sometimes you want a name that refers to NULL. Instead, use env_unbind():

      setting a list element to null removes it
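
      A short sketch of the contrast between lists and environments, in base R. Base R's `rm()` is swapped in here for rlang's `env_unbind()`, so the example needs no packages; the names `l` and `e` are illustrative.

      ```r
      # List: assigning NULL drops the element
      l <- list(a = 1, b = 2)
      l$a <- NULL
      list_names <- names(l)

      # Environment: assigning NULL keeps the binding, which now points at NULL
      e <- new.env()
      e$a <- 1
      e$a <- NULL
      bound_after_null <- exists("a", envir = e, inherits = FALSE)

      # rm() (used here in place of rlang::env_unbind()) actually removes the binding
      rm("a", envir = e)
      bound_after_rm <- exists("a", envir = e, inherits = FALSE)
      ```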

    3. But you can’t use [[ with numeric indices, and you can’t use [:

      no 'element number'

    4. Only one environment doesn’t have a parent: the empty environment.

      poor guy

    5. The current environment, or current_env() is the environment in which code is currently executing. When you’re experimenting interactively, that’s usually the global environment, or global_env(). The global environment is sometimes called your “workspace”, as it’s where all interactive (i.e. outside of a function) computation takes place.

      this is super important

    6. env_print() which gives us a little more information:

      env print to see parent and 'bindings' of environment

    7. e1$d <- e1

      referring to or setting a list element with "$" ... it can also contain itself. mind blower

    1. Advanced R

      Is this book dynamically updated?

    1. Replication is testing the same claims using data that was not used in the original study. That required some changes from us. Starting in Round 6, Replication Markets will no longer distinguish between “data replication” and “direct replication.” 

      But what if it is impossible to find data 'not used in the original study' that is still a direct test of the claims?

    1. It has been argued that a good approach is to use weakly informative priors (Williams, Rast, and Bürkner 2018). Weakly informative priors can be contrasted with non-informative priors.

      !

    2. integrate prior knowledge and assumptions when calculating meta-analyses.

      including uncertainty over methodological validity?

    1. It can either be stored as the raw data (including the Mean, N, and SD of every study arm), or it only contains the calculated effect sizes and the standard error (SE).

      note that this process does not 'dig in' to the raw data, it just needs the summary statistics

    1. meta and metafor package which do most of the heavy lifting, there are still some aspects of meta-analyses in the biomedical field and psychology which we consider important, but are not easy to do in R currently, particularly if you do not have a programming or statistics background. To fill this gap, we developed the dmetar package, which serves as the companion R package for this guide. The dmetar package has its own documentation, which can be found here. Functions of the dmetar package provide additional functionality for the meta and metafor packages (and a few other, more advanced packages), w

      dmetar package

  3. Apr 2020
    1. set_variable_labels(s1 = "Sex", s2 = "Yes or No?")

      Adding variable labels with pipe

    2. Adding variable labels using pipe

    1. preview_chapter()

      when I try this I get

      Error in files2[[format]] : 
        attempt to select less than one element in get1index
      

      However, I'm also not able to use the knit function, only the 'build' function

  4. Mar 2020
    1. But if you end up with a very long series of chained if statements, you should consider rewriting. One useful technique is the switch() function. It allows you to evaluate selected code based on position or name.
      #> function(x, y, op) {
      #>   switch(op,
      #>     plus = x + y,
      #>     minus = x - y,
      #>     times = x * y,
      #>     divide = x / y,
      #>     stop("Unknown op!")
      #>   )
      #> }

      switch is great!
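
      A runnable version of the quoted function, plus the "by position" case the passage mentions. The function name `calc` is only illustrative; the body is the one from the quote.

      ```r
      # Same body as the quoted function; `calc` is just an illustrative name
      calc <- function(x, y, op) {
        switch(op,
          plus = x + y,
          minus = x - y,
          times = x * y,
          divide = x / y,
          stop("Unknown op!")
        )
      }

      by_name <- calc(6, 3, "times")   # 18
      quotient <- calc(6, 3, "divide") # 2

      # With a number, switch() picks by position instead of by name
      by_position <- switch(2, "first", "second", "third")  # "second"

      # An unmatched name falls through to the unnamed default, which raises the error
      err <- tryCatch(calc(6, 3, "power"), error = function(e) conditionMessage(e))
      ```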

    1. The second type of tutorial provides much richer feedback and assessment, but also requires considerably more effort to author. If you are primarily interested in this sort of tutorial, there are many features in learnr to support it, including exercise hints and solutions, automated exercise checkers, and multiple choice quizzes with custom feedback.

      full-blown course/learning materials

    2. There are two main types of tutorial documents: (1) Tutorials that are mostly narrative and/or video content, and also include some runnable code chunks. These documents are very similar to package vignettes in that their principal goal is communicating concepts. The interactive tutorial features are then used to allow further experimentation by the reader. (2) Tutorials that provide a structured learning experience with multiple exercises, quiz questions, and tailored feedback. The first type of tutorial is much easier to author while still being very useful. These documents will typically add exercise = TRUE to selected code chunks, and also set exercise.eval = TRUE so the chunk output is visible by default. The reader can simply look at the R code and move on, or play with it to reinforce their understanding.

      the easier kind of tutorial... just content with some code chunks (some pre-populated with code) the user can play with

    1. button “Run Document” in RStudio, or call the function rmarkdown::run() on this Rmd file

      Hitting the button worked for me; the script did not

    1. First, many health experts, including the surgeon general of the United States, told the public simultaneously that masks weren’t necessary for protecting the general public and that health care workers needed the dwindling supply. This contradiction confuses an ordinary listener. How do these masks magically protect the wearers only and only if they work in a particular field?

      exactly what I was thinking

    1. These results are in line with predictions, such that in those cases in which a consequentialist judgment does not clearly violate fairness-based principles about respecting others and not treating them as mere means, people do not infer that the agent is necessarily an untrustworthy social partner

      but isn't it still a consequentialist judgement?!

    2. We reasoned that if deontological agents are preferred over consequentialist agents because they are perceived as more committed to social cooperation, such preferences should be lessened if consequentialist agents reported their judgments as being very difficult to make, indicating some level of commitment to cooperation (Critcher, Inbar, & Pizarro, 2013). From the process dissociation perspective (Conway & Gawronski, 2013), a person who reports that it is easy to make a characteristically consequentialist judgment can be interpreted as being high in consequentialism

      I'm not sure I understand or like this approach. Couldn't it just be seen as merely a stronger consequentialism if they had no doubts? And is it even a meaningful distinction ... can I like the 'presence of cold' versus the 'absence of heat'?

    3. In contrast to the previous studies, for the switch dilemma, consequentialist agents were rated to be no less moral (Z = 0.73, p = .47, d = 0.10) or trustworthy (Z = 1.87, p = .06, d = 0.26) than deontological agents.

      To me, this seems to weigh against their main claim. In the one case in which a majority favored the consequentialist choice, the consequentialists are not disfavored! They are really playing this down. Am I missing something?

    4. Despite the general endorsement many people have that “ends do not justify means,” people do typically judge that sacrificing the one man by diverting the train is less morally wrong than sacrificing the man by using his body to stop the train (Foot, 1967; Greene et al., 2001).

      How is this 'despite'? It doesn't seem to be in contradiction.

    5. The switch case differs from the footbridge case in two critical ways

      But it is still in the domain of HARMING people (more versus fewer).

    6. The only difference is that Adam does not push the large man, but instead pushes a button that opens a trapdoor that causes the large man to fall onto the tracks.

      Meh. This difference hardly seems worth bothering with.

    7. The amount of money participants transferred to the agent (from $0.00 to $0.30) was used as an indicator of trustworthiness, as was how much money they believed they would receive back from the agent (0% to 100%)

      Note that this is a very small stake. (And was it even perhaps hypothetical?)

    8. However, the data did not support a mere similarity effect: Our results were robust to controlling for participants’ own moral judgments, such that participants who made a deontological judgment (the majority) strongly preferred a deontological agent, whereas participants who made a consequentialist judgment (the minority) showed no preference between the two

      But this is a lack of a result in the context of a critical underlying assumption. Yes, the results were 'robust', but could we really be statistically confident that this was not driving the outcome? How tight are the error bounds?

    9. However, the central claims behind this account—that people who express deontological moral intuitions are perceived as more trustworthy and favored as cooperation partners—has not been empirically investigated.

      Here is where the authors claim their territory.

    10. the typical deontological reason for why specific actions are wrong is that they violate duties to respect persons and honor social obligations—features that are crucial when selecting a social partner. An individual who claims that stealing is always morally wrong and believes themselves morally obligated to act in accordance with this duty seems much less likely to steal from me than an individual who believes that the stealing is sometimes morally acceptable depending on the consequences. Actors who express characteristically deontological judgments may therefore be preferred to those expressing consequentialist judgments because these judgments may be more reliable indicators of stable cooperative behavior.

      Key point.. deontological ethics signals stable cooperative behavior

    11. First, deontologists’ prohibition of certain acts or behaviors may serve as a relevant cue for inferring trustworthiness, because the extent to which someone claims to follow rule or action-based judgments may be associated with the reliability of their moral behavior. One piece of preliminary evidence for this comes from a study showing that agents willing to punish third parties who violate fairness principles are trusted more, and actually are more trustworthy (Jordan, Hoffman, Bloom, & Rand, 2016).

      But couldn't this punishment be seen as utilitarian... as it promotes the general social good?

    12. One approach to explaining why moral intuitions often align with deontology comes from mutualistic partner choice models of the evolution of morality. These models posit a cooperation market such that agents who can be relied upon to act in a mutually beneficial way are more likely to be chosen as cooperation partners, thus increasing their own fitness

      this is the key theoretical argument

    13. intriguingly

      let the reader decide whether it is intriguing, please.

    14. and recent theoretical work has demonstrated that “cooperating without looking”—that is, without considering the costs and benefits of cooperation—is a subgame perfect equilibrium (Hoffman, Yoeli, & Nowak, 2015). Therefore, expressing characteristically deontological judgments could constitute a behavior that enhances individual fitness in a cooperation market because these judgments are seen as reliable indicators of a specific valued behavior—cooperation

      Is this relevant to the idea that '(advocating) Effective giving is a bad signal'?

      Does utilitarian decision-making in 'good space' contradict this?

      I'm not convinced. An 'excuse not to do something' is not the same as a 'choice to be effective'.

    15. Across 5 studies, we show that people who make characteristically deontological judgments are preferred as social partners, perceived as more moral and trustworthy, and are trusted more in economic games.

      But this does NOT hold in the switching case/switching study

    1. Table 3 also suggests that conditional norm enforcement is more pronounced among the population with intermediate and high levels of education. This finding is consistent with the observation that conditional cooperation is particularly robust in lab experiments with student subject pools (see Gächter, 2007). The data further show that females tend to be more inclined to sanction, in particular deviations from the strong norms. In contrast, employed respondents are less engaged in sanctioning. All other socioeconomic characteristics do not show a clear

      demographic breakdown of survey responses ... evidence

    2. In a national survey conducted in Austria, respondents were confronted with eight different ‘incorrect behaviors’, including tax evasion, drunk driving, fare dodging or skiving off work. Respondents were then asked how they would react if an acquaintance followed such behavior. The response categories cover positive reactions – like approval (Rege and Telle, 2004) – as well as negative reactions like cooling down the contact or expressing disapprova

      below... targeted to be nationally representative.

  5. Feb 2020
    1. A dissertation or final-year project allows you to explore your aptitude for, and interest in doing economic research

      This should be a separate bullet point. This is big. If you are going to do postgraduate study it WILL involve research.

      Aside from the academic track, much professional work involves research.

    1. James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert. (2013) An introduction to statistical learning: with applications in R, New York: Springer. vol. Springer texts in statistics

      This would seem to overlap the ML module ?

    1. - construct factorial experiments in blocks;

      Did they get into power calculation and design efficiency? This seems more general statistics and less experimetrics. OK, it doesn't say 'design'

    1. Overleaf /LaTex 

      Not sure students need to know too much LaTeX anymore… markdown/r-md is a lot simpler and using it with css and html bits is very flexible. (although it still helps to know how to code maths in LaTeX)

    1. If you can avoid assigning subjects to treatments by cluster, you should.

      Sometimes clustered assignment is preferable if mixing treatments in a cluster --> contaminated treatments (e.g., because participants communicate)

    2. fit_simple <- lm(Y_simple ~ Z_simple, data=hec)

      'regress' the outcome on the treatment. Yields the ATE even with heterogeneity if treatment is equiprobable.
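
      A minimal simulation of the point in this note, assuming every unit has the same probability of treatment. All data are made up; the names Y0, Y1 and Z echo the guide's notation but this is not its `hec` dataset.

      ```r
      set.seed(1)
      N  <- 200
      Y0 <- rnorm(N)                      # untreated potential outcome
      Y1 <- Y0 + rnorm(N, mean = 2)       # heterogeneous unit-level effects, mean 2
      Z  <- sample(rep(c(0, 1), N / 2))   # half treated, half control
      Y  <- ifelse(Z == 1, Y1, Y0)        # outcome observed under the assignment

      fit <- lm(Y ~ Z)
      slope <- coef(fit)[["Z"]]
      diff_in_means <- mean(Y[Z == 1]) - mean(Y[Z == 0])

      # The coefficient on Z is exactly the difference in means, which
      # estimates the ATE, mean(Y1 - Y0), under random assignment
      c(slope = slope, diff_in_means = diff_in_means, true_ate = mean(Y1 - Y0))
      ```

      The first two numbers agree exactly; the third differs only by sampling noise from the particular assignment drawn.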

    3. This complication is typically addressed in one of two ways: “controlling for blocks” in a regression context, or inverse probability weights (IPW), in which units are weighted by the inverse of the probability that the unit is in the condition that it is in.

      I don't think these are equivalent. I believe only the latter recovers the ATE under heterogeneity... but this is just my memory.

    4. The gains from a blocked design can often be realized through covariate adjustment alone.

      I believe Athey and Heckman come out strongly in favor of blocking instead of covariate adjustment.

    5. Of course, such heterogeneity could be explored if complete random assignment had been used, but blocking on a covariate defends a researcher (somewhat) against claims of data dredging.

      A preregistration plan can accomplish this without any cost.

    6. In this simulation complete random assignment led to a -0.59% decrease in sampling variability. This decrease was obtained with a small design tweak that costs the researcher essentially nothing.

      This is not visible in the html. You specified too few digits.

      Also, the results would be more striking if you had a smaller data set.

    7. with(hec, mean(Y1 - Y0))

      ATE with heterogeneity?

    8. # Reveal observed potential outcomes

      He means 'the outcome observed given random assignment'

    9. when deploying a survey experiment on a platform like Qualtrics, simple random assignment is the only possibility due to the inflexibility of the built-in random assignment tools.

      That's not entirely true

    10. Since you need to know N beforehand in order to use simple_ra(), it may seem like a useless function.

      this is a confusing sentence

    11. depending on the random assignment, a different number of subjects might be assigned to each group.

      In large samples this won't usually matter much... but still worth avoiding, to make power as high as possible.

    12. Y0 <- rnorm(n = N,mean = (2*as.numeric(Hair) + -4*as.numeric(Eye) + -6*as.numeric(Sex)), sd = 5)

      linear heterogeneity of baseline and of TE

    13. hec <- within(hec,{

      why does he use 'within' rather than mutate?

    1. Re: Export To Excel (slipstream42, 2017-07-31 04:22 PM): another csv export link it is quite nice https://rawgit.com/watsonbox/exportify/master/exportify.html code on github

      works great

    1. As you can see, having one fewer child still comes out looking like a solid way to reduce carbon emissions — but it’s absolutely nowhere near as effective as it first seemed. It no longer dwarfs the other options. On this model, instead of having one fewer kid, you can skip a couple of transatlantic flights and you’ll save the same amount of carbon. That seems like a way more manageable sacrifice if you’re a young person who longs to be a parent.

      Even if I believed the highly optimistic predictions of very strong climate policy in the USA (which I don't), having one fewer child still reduces emissions each year more than twice as much as living car free or avoiding a trans-atlantic flight every year.

      And they state it as "instead of having one fewer kid, you can skip a couple of transatlantic flights and you’ll save the same amount of carbon." ... but this requires each parent to forgo 2 transatlantic flights they would have taken every year for the rest of their life, if I understand correctly.

    1. commas <- function(...) stringr::str_c(..., collapse = ", ")

      no braces needed for function on a single line

    1. it’s the same as the input!

      because we want to modify columns in place

    2. Compute the mean of every column in mtcars.
      output <- vector("double", ncol(mtcars))  # 1. output
      for (i in seq_along(mtcars)) {            # 2. sequence
        output[[i]] <- mean(mtcars[[i]])      # 3. body
      }
      
    1. The rational expectation and the learning-from-price literatures argue that equilibrium prices are accurate because they reveal and aggregate the information of all market participants. The Market Selection Hypothesis, MSH, proposes instead that prices become accurate because they eventually reflect only the beliefs of the most accurate agent. The Wisdom of the Crowd argument, WOC, however suggests that market prices are accurate because individual, idiosyncratic errors are averaged out by the price formation mechanism

      Three models (arguments for) drivers of market efficiency

    1. external fundraising page

      What is meant by 'external'?

    2. Fundraising Dashboard / Participant Center Visited When a person visits their fundraising dashboard or participant center

      who is the 'person' visiting the dashboard here?

    3. Fundraising Page Created / Registration Complete Upon completion of the last step of the registration flow that creates a fundraising page

      by whom? which ones can be detected?

    1. Contributing to the textbook Use Hypothes.is, an amazing tool for annotating the web. Go to Hypothes.is, and “get-started”

      To nudge people slightly towards this, you can add to the index.Rmd:

          includes:
            in_header: [header_include.html]
      

      And in that header_include.html file, include

      
      <script async defer src="https://hypothes.is/embed.js"></script>
      

      I do this here in my Writing Economics book

    1. head(as.data.frame(nasa))
      #>        lat   long month year cloudhigh cloudlow cloudmid ozone pressure
      #> 1 36.20000 -113.8     1 1995      26.0      7.5     34.5   304      835
      #> 2 33.70435 -113.8     1 1995      20.0     11.5     32.5   304      940
      #> 3 31.20870 -113.8     1 1995      16.0     16.5     26.0   298      960
      #> 4 28.71304 -113.8     1 1995      13.0     20.5     14.5   276      990
      #> 5 26.21739 -113.8     1 1995       7.5     26.0     10.5   274     1000
      #> 6 23.72174 -113.8     1 1995       8.0     30.0      9.5   264     1000
      #>   surftemp temperature
      #> 1    272.7       272.1
      #> 2    279.5       282.2
      #> 3    284.7       285.2
      #> 4    289.3       290.7
      #> 5    292.2       292.7
      #> 6    294.1       293.6

      unrolling a tbl_cube into 2 dimensions (data.frame)

    1. Now this can be simplified using the new {{}} syntax:

          summarise_groups <- function(dataframe, grouping_var, column_name){
            dataframe %>%
              group_by({{grouping_var}}) %>%
              summarise({{column_name}} := mean({{column_name}}, na.rm = TRUE))
          }

      Much easier and cleaner! You still have to use the := operator instead of = for the column name however. Also, from my understanding, if you want to modify the column names, for instance in this case return "mean_height" instead of height you have to keep using the enquo()–!! syntax.

      curly curly syntax

    1. (1) “How likely do you think it is that this hypothesis will be replicated (on a scale from 0% to 100%)?” (2) “How large do you think the standardized effect size (in terms of Cohen’s d) from the replication will be, relative to that in the original paper (on a scale from −50% to 200%)?”, and (3) “How well do you know this topic? (Not at all; Slightly; Moderately; Very well; Extremely well.)”

      pre-market survey Many labs 2

    2. 0.506 (0.532)

      can we find any measures of dispersion here?

    3. For the 12 studies with an original p < 0.005, 10 (83%) replicated. For the 12 studies with an original p > 0.005, only 1 (8%) replicated. Further work is needed to test if prediction markets outperform predictions based only on the initial p-value, to test if the market also aggregates other information important for reproducibility.

      p values may capture all the information?

    1. At least one of my un-named research co-authors will heartily agree with this statement.↩︎

      Hi Dave!

  6. Jan 2020
    1. ps f

      this doesn't run on my system. However ps -f seems to list processes started in the terminal and ps -ef lists all (?) processes

    2. List previous command history: history

      useful

    1. It’s worth noting that first line of the script starts with #!. It is a special directive which Unix treats differently.

      The 'hash tag' lines at the top of bash scripts are NOT comments... they are important

    1. Happy Git and GitHub for the useR

      Oska: can you see this note?

    1. Prediction in Policy

      Relevance to my own project: can we predict who has the most to gain from admission to an HE institution. (But I'm limited in what I can report)

    2. Suppose the algorithm chooses a tree that splits on education but not on age. Conditional on this tree, the estimated coefficients are consistent. But that does not imply that treatment effects do not also vary by age, as education may well covary with age; on other draws of the data, in fact, the same procedure could have chosen a tree that split on age instead

      a caveat

    3. These heterogenous treatment effects can be used to assign treatments; Misra and Dubé (2016) illustrate this on the problem of price targeting, applying Bayesian regularized methods to a large-scale experiment where prices were randomly assigned

      todo -- look into the implication for treatment assignment with heterogeneity

    4. Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, and Newey (2016) take care of high-dimensional controls in treatment effect estimation by solving two simultaneous prediction problems, one in the outcome and one in the treatment equation.

      this seems similar to my idea of regularizing on only a subset of the variables

    5. In particular, a set of papers has already introduced regularization into the first stage in a high-dimensional setting, including the LASSO (Belloni, Chen, Chernozhukov, and Hansen 2012) and ridge regression (Carrasco 2012; Hansen and Kozbur 2014

      worth referencing

    6. These same techniques applied here result in split-sample instrumental variables (Angrist and Krueger 1995) and “jackknife” instrumental variables

      some classical solutions to IV bias are akin to ML solutions

    7. Understood this way, the finite-sample biases in instrumental variables are a consequence of overfitting.

      traditional 'finite sample bias of IV' is really overfitting

    8. Even when we are interested in a parameter β̂, the tool we use to recover that parameter may contain (often implicitly) a prediction component. Take the case of linear instrumental variables understood as a two-stage procedure: first regress x = γ′z + δ on the instrument z, then regress y = β′x + ε on the fitted values x̂. The first stage is typically handled as an estimation step. But this is effectively a prediction task: only the predictions x̂ enter the second stage; the coefficients in the first stage are merely a means to these fitted values.

      first stage of IV -- handled as an estimation problem, but really it's a prediction problem!
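
      A hand-rolled sketch of the two-stage procedure in the quoted passage, in base R, to make the "only the predictions enter the second stage" point concrete. The data-generating process and all coefficients are made up for illustration.

      ```r
      set.seed(42)
      n <- 500
      z <- rnorm(n)                 # instrument
      u <- rnorm(n)                 # unobserved confounder
      x <- 0.8 * z + u + rnorm(n)   # endogenous regressor
      y <- 1.5 * x + u + rnorm(n)   # outcome; true beta is 1.5 by construction

      # Stage 1: treated purely as a prediction step, keeping only the fitted values
      x_hat <- fitted(lm(x ~ z))

      # Stage 2: regress y on the first-stage predictions
      beta_2sls <- coef(lm(y ~ x_hat))[["x_hat"]]

      # With a single instrument this equals the ratio cov(z, y) / cov(z, x)
      beta_ratio <- cov(z, y) / cov(z, x)

      beta_ols <- coef(lm(y ~ x))[["x"]]  # confounded by u, shown for comparison
      c(ols = beta_ols, tsls = beta_2sls, ratio = beta_ratio)
      ```

      The first-stage coefficients never appear in the second stage, only `x_hat` does. Note that the standard errors reported by the second `lm()` call are not valid two-stage least squares standard errors.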

    9. Prediction in the Service of Estimation

      This is especially relevant to economists across the board, even the ML skeptics

    10. New Data

      The first application: constructing variables and meaning from high-dimensional data, especially outcome variables

      • satellite images (of energy use, lights etc) --> economic activity
      • cell phone data, Google street view to measure wealth
      • extract similarity of firms from 10k reports
      • even traditional data .. matching individuals in historical censuses
    11. Zhao and Yu (2006) who establish asymptotic model-selection consistency for the LASSO. Besides assuming that the true model is “sparse”—only a few variables are relevant—they also require the “irrepresentable condition” between observables: loosely put, none of the irrelevant covariates can be even moderately related to the set of relevant ones.

      Basically unrealistic for microeconomic applications imho

    12. First, it encourages the choice of less complex, but wrong models. Even if the best model uses interactions of number of bathrooms with number of rooms, regularization may lead to a choice of a simpler (but worse) model that uses only number of fireplaces. Second, it can bring with it a cousin of omitted variable bias, where we are typically concerned with correlations between observed variables and unobserved ones. Here, when regularization excludes some variables, even a correlation between observed variables and other observed (but excluded) ones can create bias in the estimated coefficients.

      Is this equally a problem for procedures that do not assume sparsity, such as the Ridge model?

    13. If the variables are correlated with each other (say the number of rooms of a house and its square-footage), then such variables are substitutes in predicting house prices. Similar predictions can be produced using very different variables. Which variables are actually chosen depends on the specific finite sample.

      Lasso-chosen variables are unstable because of what we usually call 'multicollinearity.' This presents a problem for making inferences from estimated coefficients.

    14. Through its regularizer, LASSO produces a sparse prediction function, so that many coefficients are zero and are “not used”—in this example, we find that more than half the variables are unused in each run

      This is true but they fail to mention that LASSO also shrinks the coefficients on variables that it keeps towards zero (relative to OLS). I think this is commonly misunderstood (from people I've spoken with).
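
      A sketch of the shrinkage point in this note, for the special case where the regressors are orthonormal (X'X = I) and the objective is (1/2)·RSS + λ·Σ|b|. Under those assumptions the LASSO solution is the soft-thresholded OLS coefficient. `soft_threshold()` is a hypothetical helper and the coefficients are made up; in general a package such as glmnet would be used instead.

      ```r
      # Hypothetical helper: closed-form LASSO solution when X'X = I
      soft_threshold <- function(b_ols, lambda) {
        sign(b_ols) * pmax(abs(b_ols) - lambda, 0)
      }

      b_ols   <- c(3.0, -1.5, 0.4, -0.2)   # made-up OLS coefficients
      b_lasso <- soft_threshold(b_ols, lambda = 0.5)
      b_lasso  # 2.5 -1.0 0.0 0.0
      ```

      The two small coefficients are set to zero, and the two that survive are each pulled toward zero by λ relative to OLS.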

    15. One obvious problem that arises in making such inferences is the lack of standard errors on the coefficients. Even when machine-learning predictors produce familiar output like linear functions, forming these standard errors can be more complicated than seems at first glance as they would have to account for the model selection itself. In fact, Leeb and Pötscher (2006, 2008) develop conditions under which it is impossible to obtain (uniformly) consistent estimates of the distribution of model parameters after data-driven selection.

      This is a very serious limitation for Economics academic work.

    16. First, econometrics can guide design choices, such as the number of folds or the function class.

      How would Econometrics guide us in this?

    17. These choices about how to represent the features will interact with the regularizer and function class: A linear model can reproduce the log base area per room from log base area and log room number easily, while a regression tree would require many splits to do so.

      The choice of 'how to represent the features' is consequential ... it's not just 'throw it all in' (kitchen sink approach)

    18. Table 2: Some Machine Learning Algorithms

      This is a very helpful table!

    19. Picking the prediction function then involves two steps: The first step is, conditional on a level of complexity, to pick the best in-sample loss-minimizing function. The second step is to estimate the optimal level of complexity using empirical tuning (as we saw in cross-validating the depth of the tree).

      ML explained while standing on one leg.

    20. Regularization combines with the observability of prediction quality to allow us to fit flexible functional forms and still find generalizable structure.

      But we can't really make statistical inferences about the structure, can we?

    21. This procedure works because prediction quality is observable: both predictions \(\hat{y}\) and outcomes \(y\) are observed. Contrast this with parameter estimation, where typically we must rely on assumptions about the data-generating process to ensure consistency.

      I'm not clear what the implication they are making here is. Does it in some sense 'not work' with respect to parameter estimation?

    22. In empirical tuning, we create an out-of-sample experiment inside the original sample.

      remember that tuning is done within the training sample
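
      A minimal sketch of what 'an out-of-sample experiment inside the original sample' means for the index bookkeeping. The function name, sample sizes and fold count are hypothetical; the only point is that the validation folds are carved out of the training indices and the hold-out set is never touched during tuning.

```python
import random


def split_for_tuning(n_obs, n_holdout, n_folds, seed=0):
    """Return (training, holdout, folds); folds partition the training indices only."""
    rng = random.Random(seed)
    idx = list(range(n_obs))
    rng.shuffle(idx)
    holdout, training = idx[:n_holdout], idx[n_holdout:]
    # Every n_folds-th training index goes to the same validation fold.
    folds = [training[i::n_folds] for i in range(n_folds)]
    return training, holdout, folds


training, holdout, folds = split_for_tuning(n_obs=100, n_holdout=20, n_folds=5)

for k, validation in enumerate(folds):
    validation_set = set(validation)
    fit_on = [i for i in training if i not in validation_set]
    # A real tuning loop would fit on `fit_on` for each candidate complexity
    # level and score the predictions on `validation`.
    print(f"fold {k}: fit on {len(fit_on)} obs, validate on {len(validation)} obs")
```

      Only after the complexity level is chosen this way would the final model be evaluated once on `holdout`.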

    23. Performance of Different Algorithms in Predicting House Values

      Any reason they didn't try a Ridge or an Elastic net model here? My instinct is that these will beat LASSO for most Economic applications.

    24. We consider 10,000 randomly selected owner-occupied units from the 2011 metropolitan sample of the American Housing Survey. In addition to the values of each unit, we also include 150 variables that contain information about the unit and its location, such as the number of rooms, the base area, and the census region within the United States. To compare different prediction techniques, we evaluate how well each approach predicts (log) unit value on a separate hold-out set of 41,808 units from the same sample. All details on the sample and our empirical exercise can be found in an online appendix available with this paper at http://e-jep.org

      Seems a useful example for trying/testing/benchmarking. But the link didn't work for me. Can anyone find it? Is it interactive? (This is why I think papers should be html and not pdfs...)

    25. Making sense of complex data such as images and text often involves a prediction pre-processing step.

      In using 'new kinds of data' in Economics we often need to do a 'classification step' first

    26. The fundamental insight behind these breakthroughs is as much statistical as computational. Machine intelligence became possible once researchers stopped approaching intelligence tasks procedurally and began tackling them empirically.

      I hadn't thought about how this unites the 'statistics to learn stuff' part of ML and the 'build a tool to do a task' part. Well-phrased.

    27. Why not also use it to learn something about the “underlying model”: specifically, why not use it to make inferences about the underlying data-generating process?

      (they give reasons why not)

    28. Economic theory and content expertise play a crucial role in guiding where the algorithm looks for structure first. This is the sense in which “simply throw it all in” is an unreasonable way to understand or run these machine learning algorithms.

      At least we (Economists) hope this is the case ... motivated reasoning?

    29. available finite-sample guidance on its implementation—such as heuristics for the number of folds (usually five to ten) or the “one standard-error rule” for tuning the LASSO (Hastie, Tibshirani, and Friedman 2009)—has a more ad-hoc flavor.

      It sounds like there are big unknowns... a lot is still 'rules of thumb'

    30. Should out-of-sample performance be estimated using some known correction for overfitting (such as an adjusted \(R^2\) when it is available) or using cross-validation?

      Do people use \(R^2_{adj}\) for this? Would that fit under 'machine learning'?
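
      For reference, the usual textbook definition, where \(n\) is the number of observations and \(k\) the number of regressors excluding the constant (this notation is mine, not the paper's):

```latex
R^2_{adj} = 1 - \left(1 - R^2\right) \frac{n - 1}{n - k - 1}
```

      This is an in-sample correction that penalises additional regressors through \(k\); unlike cross-validation, it never scores predictions on held-out observations.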

    31. Kernel regression | Kernel bandwidth

      hadn't realised kernel can be done in the ML framework

    32. final category is in direct policy applications. Deciding which teacher to hire implicitly involves a prediction task (what added value will a given teacher have?), one that is intimately tied to the causal question of the value of an additional teacher.

      Academic Economists: this is usually the 'we also can do this' part of a paper rather than its core, no?

    33. In another category of applications, the key object of interest is actually a parameter β, but the inference procedures (often implicitly) contain a prediction task. For example, the first stage of a linear instrumental variables regression is effectively prediction. The same is true when estimating heterogeneous treatment effects, testing for effects on multiple outcomes in experiments, and flexibly controlling for observed confounders.

      This is the most relevant tool for me. Before I learned about ML I often thought about using 'stepwise selection' for such tasks... to find the best set of 'control variables' etc. But without regularisation this seemed problematic.

    34. Machine Learning: An Applied Econometric Approach

      Shall we use Hypothesis to have a discussion?

    1. David Chalmers: I think I may have introduced this actually years ago. I call it “The law of minimization of mystery”. I was making fun of some people who wanted to tie consciousness and quantum mechanics. But I do think there are interesting potential links between the two. I mean. The problem in quantum mechanics is not just that it’s mysterious. It’s a very specific problem. How do the standard dynamics of quantum mechanics which has these two processes: Schrodinger evolution and wave function collapse. Mostly the wave function evolves this way, except on certain times, when you make a measurement, the wave function changes in this special way, and that’s totally standard quantum mechanics.

      Some great points here.

      He is saying something like "consciousness collapses the quantum wave function"

      Bundling quantum mechanics and consciousness together… “The law of minimisation of mysteries” (a sarcastic point).

      “maybe these two weird things are just one weird thing”

      But it seems natural because there is this mysterious process, the collapse of a wave function, that happens on measurement. And what is measurement but conscious observation?

    2. I’ve argued you need something new in the story and the kind of view I’ve been drawn towards are views that take consciousness as something sort of fundamental and irreducible in our picture of the natural world in the same way that we take space and time and mass and charge as fundamental. We’re used to the idea that some things are fundamental. If you can’t explain electromagnetic phenomena in terms of your old laws, your old properties and laws, spacetime, mass, Newtonian dynamics, you bring in something else. Electric charge, Maxwell’s laws. Likewise, I think for consciousness. So I’ve been drawn towards views that take consciousness as fundamental and what that comes to in practice in philosophy is either you’ve got the choice between either a dualist view where you’ve got. You’ve got the physical world, which is one thing, and then you’ve got say the mind, you’ve got consciousness, which is another thing. They’re both fundamental properties distinct from each other. And then there are laws that connect them. That’s one view. And the other view is panpsychism, which says consciousness is somehow present at the very basis of the physical world and maybe the physics that we know and love basically somehow fundamentally involves consciousness somehow.

      Should we consider consciousness as a fundamental of the universe, like space, time, mass, and charge? I.e., irreducible.

      This leads to dualism and panpsychism. The latter apparently asserts that there is some element of consciousness that is part of the equation and physical processes, and maybe it's present everywhere.

    3. think the most interesting of which is that this whole idea of consciousness is an illusion. A pathology built up by our cognitive systems to believe we have these special properties of consciousness introspectively, even though we don’t.

      "Illusionism" (1.21:15) There is an argument that our brain and the physical processes trick us into thinking we have the special properties of consciousness.

      I don’t understand this argument: if we don’t have consciousness, then who is in there that can be tricked?

    4. David Chalmers: Yeah, but I also think there are these sociological effects where most people think… we got this on the PhilPapers survey that most people think that most people think a certain thing, even though most people think the opposite.

      Inconsistency between first- and second-order beliefs among philosophers... in the survey there are many cases where people think everyone thinks A when actually most people think B.

    5. I’m sure that many of them are among listeners potentially. It seems like there’s a bit of a stream of this among rationalists and I often find natural scientists, I just can’t get them to accept that there’s like anything strange about consciousness existing.

      people who deny that there is anything puzzling or special about consciousness. ... These people are just being JERKS.

    6. David Chalmers: Well, the big obvious thing that can be said in defense of philosophy here is the thing that I said already. Which is philosophy by its nature is the field where there’s disagreement, because once we obtain methods for producing an agreement on questions and reasonably decisive ways, we spin it off and it’s no longer philosophy. So from that perspective, philosophy has been this incredibly effective incubator of disciplines. Physics span out of philosophy. Psychology span out of philosophy. Arguably to some extent, economics and linguistics span out of philosophy. So what usually happens is not that we entirely solve a whole of a philosophical problem, but we come up with some methods of say, making progress experimentally or formally on a certain subquestion or aspects of that question and then that gets spun off. The part that we haven’t figured out how to think about well enough that remains philosophy.

      I hadn't appreciated this point before

    7. the highest correlation coefficient in this survey was 0.56 which is kind of only moderate and it was between moral realism and cognitivism which obviously have a lot to do with one another directly

      what were the other typical correlations?

    8. Robert Wiblin: Yeah. I guess in politics it seems that that brings out people’s tribal instincts, so they tend to group together for practical reasons, if not intellectual reasons, like kind of all sharing the same views or like wanting to fall into line and are particularly incentivized to do that. An interesting thing, I’ll provide a link to a study looking at how ideologically tightly grouped are people in politics, which found that uneducated people just like have views all over the place. Their views on one question don’t really predict their views on another.

      The Ezra Klein take on Donald Kinder and Nathan Kalmoe’s "Neither Liberal nor Conservative: Ideological Innocence in the American Public"?

      So are you saying that you think political people group together more in politics than philosophers do in philosophy? Hard to make an apples-to-apples comparison here, of course, as most people don't think deeply about these philosophical questions.

    9. David Chalmers: I don’t know. Where I come from, 0.56 is a pretty high correlation coefficient. So I don’t know. Maybe it depends on the area.

      seems high to me also

    10. And I think at that point, wherein something like in a verbal dispute, which can happen in these cases where you have different people mean different things by free will and well, I think that diagnosis is more apt for some of these questions than for others, but like this is the case again where your average philosopher uses free will, well the thing tied to moral responsibility. Many people outside philosophy may still think, ah, why use free will for that? I want to use free will for this other thing. This ability to fundamentally go against the laws of nature. And then we just have a difference about which one is worth caring about.

      Free will cop-out

      They get into the discussion of free will and they answer the question they think is easy -- moral responsibility is compatible with a lack of free-will (they say, they never presented the complete argument).

      This feels like somewhat of a cop-out, akin to a political spokesperson answering the question they wanted to answer and not the one you asked. (And then claiming they answered your question).

      Free will ... People aren't primarily interested in it for the 'moral responsibility' question.

      I want to know if I have free will to know if I'm in control, to know if anything I do matters or if I'm just an automaton/robot.

    11. Now someone interested in free will could say, well, that wasn’t what I cared about (moral responsibility).

      I'm that guy. I'm saying exactly this.

    12. What is it like to be you right now? You’re seeing this text on the screen, you smell the coffee next to you, feel the warmth of the cup, and hear your housemates arguing about whether Home Alone was better than Home Alone 2: Lost in New York. There’s a lot going on in your head — your conscious experiences

      First of all, this is a great podcast and a great episode. Lots of great exchanges. This actually 'adds value' (cringey term, sorry) in terms of your understanding of those topics that come up in deeper-than-deep late-night wistful hours conversations. If you are interested in what it means to live you should listen to this episode. Don't waste your life not listening to this episode.

    1. Update: Free online access to this text may be available through your university/library (this works at Exeter) via www.vlebooks.com

      I think you need to sign in via Shibboleth

    2. 1.2 Readings and resources

      Note that for 2019 The NS is the main one referred to!

  7. Dec 2019
    1. Bee2024 finishing insurance problem from sixth problem set

      May be helpful in revising

    1. Power rule

      I will give you the formula for the power rule on an exam, but it wouldn't hurt to practice it!

    1. Asymmetric information (Moral hazard, adverse selection, signaling) Separate notes/handouts, to be integrated in Readings tbd Behavioural economics – first lecture (limits to cognition, willpower, self-interest), second lecture (applications) [11] NS: Ch 17, plus supplements T1:11 Further readings tbd Eighth problem set: Behavioural economics Overall each reading is optional, but to do well on the exam you need to do at least some of these readings, and/or cover some of the more advanced material. Supplementary reading: theory Amos Tversky & Daniel Kahneman, 1979. “Prospect Theory: An Analysis of Decision under Risk” (Seminal) Supplementary reading: applications and empirical work DellaVigna, Stefano. “Psychology and economics: Evidence from the field.” Journal of Economic literature 47.2 (2009): 315-372. Benartzi, S. & Thaler, R.H., 2007. Heuristics and biases in retirement savings behavior. The journal of economic perspectives, pp.81-104. Farber, H., 2008. Reference-dependent preferences and labor supply: The case of New York City taxi drivers. The American Economic Review. Available at: http://www.ingentaconnect.com/content/aea/aer/2008/00000098/00000003/art00021 [Accessed November 19, 2015]. (Behavioural insights team) EAST: Four simple ways to apply behavioural insight Kellner, Reinstein and Riener, 2016. Conditional generosity and uncertain income: Field and lab evidence Material linked at giveifyouwin.org

      2019-20: This has not been covered

    1. The payoff from staying Silent (cooperating) in each period is: \(-2 \times (1 + g + g^2 + g^3 + \dots)\). Here I get -2 in each period, starting today. Discounting this, we add up \(-2\) (today), \(-2g\) (next period), \(-2g^2\) (the period after next), etc, as represented above. The payoff from Confessing right away (after which both players Confess always) is: \(-1 - 3 \times (g + g^2 + g^3 + \dots)\). Formula for a geometric series (where \(0 < g < 1\)): \(g + g^2 + g^3 + g^4 + \dots = g/(1-g)\). Note on Maths: The standard derivation of this, which is pretty neat, is in the text. This formula is an important one in economics (and beyond), particularly for discounting a constant stream of payoffs, e.g., stock dividends. Thus cooperation in a single period is ‘weakly preferred’ (at least as good) if \((-2) \times (1 + g + g^2 + g^3 + \dots) \geq (-1) + (-3) \times (g + g^2 + g^3 + \dots)\), or equivalently \(g + g^2 + g^3 + \dots \geq 1\). Note on the intuition for the second formula: the left side is loss of future payoffs (-3 vs -2 forever from next period, so a loss of 1 per period starting tomorrow). The right side is gain in ‘the present’ period (getting -1 rather than -2), so it is un-discounted. Hence \(g/(1-g) \geq 1\), i.e. \(g \geq 1/2\).

      2019-20: you will not be asked to do this computation on the final exam, but you should understand the general idea
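
      For the general idea, a small numerical check of the condition above. It only uses the payoffs in the quoted passage plus the geometric-series formula (so \(1 + g + g^2 + \dots = 1/(1-g)\)); the function names and the three trial values of \(g\) are my own.

```python
def payoff_cooperate(g: float) -> float:
    """-2 in every period starting today: -2 * (1 + g + g^2 + ...)."""
    return -2.0 / (1.0 - g)


def payoff_confess(g: float) -> float:
    """-1 today, then -3 in every later period: -1 - 3 * (g + g^2 + ...)."""
    return -1.0 - 3.0 * g / (1.0 - g)


for g in (0.4, 0.5, 0.6):  # illustrative discount factors
    stay_silent = payoff_cooperate(g) >= payoff_confess(g) - 1e-12
    print(f"g = {g}: cooperate {payoff_cooperate(g):.3f}, "
          f"confess {payoff_confess(g):.3f}, cooperation sustainable: {stay_silent}")
```

      Cooperation fails for the impatient player (g = 0.4), is exactly break-even at g = 0.5, and is strictly preferred at g = 0.6, matching \(g \geq 1/2\).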

    1. Nash proved that if we allow mixed strategies, then every game with a finite number of players in which each player can choose from finitely many pure strategies has at least one Nash equilibrium.

      It always has at least one Nash equilibrium (but it may only be a NE in mixed strategies).
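
      A rough sketch of why the 'mixed strategies' qualifier matters: a brute-force search for pure-strategy equilibria in two textbook 2x2 games. The games, payoff numbers and function name are illustrative choices of mine, not taken from the quoted text.

```python
from itertools import product


def pure_nash_equilibria(payoffs, rows, cols):
    """Cells (r, c) where neither player gains by deviating unilaterally."""
    equilibria = []
    for r, c in product(rows, cols):
        u_row, u_col = payoffs[(r, c)]
        row_ok = all(payoffs[(r2, c)][0] <= u_row for r2 in rows)
        col_ok = all(payoffs[(r, c2)][1] <= u_col for c2 in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


# Matching pennies: payoffs are (row player, column player).
matching_pennies = {
    ("H", "H"): (1, -1), ("H", "T"): (-1, 1),
    ("T", "H"): (-1, 1), ("T", "T"): (1, -1),
}
# A coordination ('battle of the sexes') game with hypothetical payoffs.
coordination = {
    ("A", "A"): (2, 1), ("A", "B"): (0, 0),
    ("B", "A"): (0, 0), ("B", "B"): (1, 2),
}

print(pure_nash_equilibria(matching_pennies, ["H", "T"], ["H", "T"]))
print(pure_nash_equilibria(coordination, ["A", "B"], ["A", "B"]))
```

      Matching pennies has no pure-strategy equilibrium, so Nash's theorem is satisfied only by the mixed equilibrium in which each player randomises 50/50. The coordination game has two pure equilibria, and also has a mixed one that this search does not look for.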

    1. More volatile underlying assets will translate to higher options premiums, because with volatility there is a greater probability that the options will end up in-the-money at expiration.

      That's interesting

    1. The option is European and can only be exercised at expiration.No dividends are paid out during the life of the option.Markets are efficient (i.e., market movements cannot be predicted).There are no transaction costs in buying the option.The risk-free rate and volatility of the underlying are known and constant.The returns on the underlying are normally distributed.

      Some of the assumptions underlying the Black-Scholes model. Do these limit its realism and predictive power?
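
      Under exactly these assumptions the price of a European call has the standard closed form, sketched below with only the Python standard library. The input numbers (spot 100, strike 100, 5% rate, one year) are hypothetical; the function names are mine.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(spot, strike, rate, vol, maturity):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * sqrt(maturity))
    d2 = d1 - vol * sqrt(maturity)
    return spot * norm_cdf(d1) - strike * exp(-rate * maturity) * norm_cdf(d2)


for vol in (0.1, 0.2, 0.4):  # illustrative volatilities
    price = bs_call(spot=100, strike=100, rate=0.05, vol=vol, maturity=1.0)
    print(f"volatility {vol:.0%}: call price {price:.2f}")
```

      The price rises with volatility, holding everything else fixed. Note that constant `rate` and `vol` are inputs here precisely because the model assumes them known and constant, which is one place where realism is limited.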

    1. In low-income countries the vast majority are unwilling to pay for effective drugs simply because they are unable to pay. Low-income nations need more price discrimination—and vastly lower prices—if they are ever to afford the world's most effective medicines.

      Does price discrimination help poor countries here? Which countries have more price-inelastic demand? Does PD increase social welfare for this case?

    1. She found a German seller offering packs of the same nappies she buys in Luxembourg for the same price she normally pays. Looking more closely at the unit price, however, Nadine realised that the German packs contained 140 nappies, whereas the packs in Luxembourg had only 90, making them much more expensive. She switched straight away to buying all her nappies from the German shop.

      If this was price discrimination... which country's consumers likely had the higher price elasticity?

    1. I think that the preservation of these documents could be seen as providing pure public good. We value that these have been preserved for posterity even if we don't visit the Magna Carta ourselves. What do you think?

  8. Nov 2019
    1. map_dbl(df, mean)

      it just performs this operation on all 'list elements'... here, vectors in the dataframe

    2. out[i] <- fun(df[[i]])

      this step is critical: we can apply the function referenced in the function argument

    1. Holt, C., and S. Laury (2002), Risk Aversion and Incentive Effects, American Economic Review, v. 92 (5): 1644-1655. Crosetto, Paolo, and Antonio Filippin. “A theoretical and experimental appraisal of four risk elicitation methods.” Experimental Economics 19, no. 3 (2016): 613-641. Pedroni, Andreas, Renato Frey, Adrian Bruhin, Gilles Dutilh, Ralph Hertwig, and Jörg Rieskamp. “The risk elicitation puzzle.” Nature Human Behaviour 1, no. 11 (2017): 803.

      These are worth looking at closely and discussing

    1. Find the pure-strategy Nash equilibrium or equilibria

      Tutorial - try to focus on how we know there will be 2 pure-strategy and 1 mixed-strategy NE here

    2. 3. Now draw the farmers’ ‘best response functions’ in a diagram.

      Worth covering in tutorial starting from here, focusing on intuition rather than algebra and calculus

    3. Do parts A and B; part C is optional enrichment

      Worth covering A-B in tutorial if time permits

    1. This (answer) should say: MR=53-2Q=MC=5... the "-" or dash is confusing here
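
      Written out step by step, to avoid the confusing dash:

```latex
\begin{aligned}
MR = MC \;&\Rightarrow\; 53 - 2Q = 5 \\
          &\Rightarrow\; 2Q = 48 \\
          &\Rightarrow\; Q = 24
\end{aligned}
```

      If this marginal revenue comes from a linear inverse demand \(P = 53 - Q\) (my assumption; the demand curve is not restated here), the corresponding price would be \(P = 53 - 24 = 29\).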

    2. zero marginal cost for one more user.

      We know this from the statement

      Suppose also that the intended pool is large enough so that whatever number of families come on any day will not affect what people are willing to pay for the pool. (I.e., no congestion)

    1. This second assumption, called diminishing marginal utility, will imply ‘risk aversion’!

      A student asked

      I want to ask why risk-averse has a decreasing marginal utility? Thank you.

      Response:

      If someone has a decreasing marginal utility of income and they maximise expected utility then they will be risk averse.

      This is something that takes a long time to fully explain, and I try to give an explanation in the web-book and in lecture (and again in tomorrow's lecture).

      One simple intuition: risk averse essentially means "I will never take any fair gamble".

      E.g., "I'll never accept a bet with an equal chance of losing or gaining some amount X." How does diminishing MU of income explain this? If I have diminishing MU of income then my utility is increasing in income at a decreasing rate.

      The first units of income (e.g., going from 0 income to 15k income) add more utility than the later units of income (e.g., going from 15k income to 30k income), which add more than even later increments (e.g., going from 30k to 45k), etc.

      So "an equal chance of losing or gaining X" would not be attractive to such a person. Why not? Because relative to any point "losing X" reduces my utility more than "gaining X" increases it.

      E.g., in the above example, if you started at 15K income you wouldn't want to have an equal chance of losing or gaining 15K in income. Having 0 income would be terrible, while having 30k income would be better, but not 'that much' better. As we said, the utility difference between 0 and 15K is much greater than the utility difference between 15k and 30k... because of the assumption of diminishing marginal utility. So it's better to have 15k for sure than to have a 50/50 chance of 0k or 30k.

      The 'utility loss from losing 15k' is greater than the 'utility gain from gaining 15k'. As expected utility weights the utility of each outcome by its probability and sums these, in considering a 1/2 chance of losing 15k and a 1/2 chance of gaining 15k these probabilities weight equally, so I only need to consider "does the utility cost of losing 15k exceed the utility gain from gaining 15k" in this example. Because of diminishing MU, we know it does. The same holds for any "equal chance of losing or gaining some amount X". Thus this person is risk-averse.

      I hope this helps. Looking at the 'utility of income' diagrams may also be helpful.
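
      A small numerical version of the 15k example. The square-root utility function is purely my illustrative choice of a function with diminishing marginal utility; nothing above depends on that particular form.

```python
from math import sqrt


def utility(income_k: float) -> float:
    """Hypothetical utility of income (in thousands): increasing, at a decreasing rate."""
    return sqrt(income_k)


sure_thing = utility(15)                          # keep 15k for certain
gamble = 0.5 * utility(0) + 0.5 * utility(30)     # expected utility of the 50/50 bet

loss_from_losing = utility(15) - utility(0)       # utility cost of dropping to 0
gain_from_gaining = utility(30) - utility(15)     # utility gain of rising to 30k

# Sure income that gives the same utility as the gamble (inverse of sqrt).
certainty_equivalent = gamble ** 2

print(f"utility of 15k for sure: {sure_thing:.3f}")
print(f"expected utility of gamble: {gamble:.3f}")
print(f"loss {loss_from_losing:.3f} vs gain {gain_from_gaining:.3f}")
print(f"certainty equivalent: {certainty_equivalent:.2f}k")
```

      With this utility function the person would accept as little as about 7.5k for sure in place of a gamble whose expected value is 15k, which is one way to see how strong the aversion to the fair bet is.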

    1. Two statistics about reducing your risk of an early death made headlines around the world recently. The first seems to be a great reason to add a four-legged friend to your life. It suggests that owning a dog is tied to lowering your chance of dying early by nearly a quarter. The second statistic claims that even a minimal amount of running is linked to reducing your risk of premature death by up to 30%. Ruth Alexander finds out what’s behind these numbers and we hear from epidemiologist, Gideon Meyerowitz-Katz.

      It's amazing that statistics like these... (seemingly without even minimal obvious controls for age etc.) get reported so naively in the media. Note that one of the interviewees suggests one approach that would provide evidence on the impact of pets on longevity ... random dog assignment. He seems to doubt the health benefits; I don't know, it seems plausible to me, but I'd like to see some real evidence.

    1. 2018 final exam with suggested answer guidelines

      I just put this up ... last year's final exam with suggested answer guidelines

    1. For problem 6 I'll award 1.5 marks for "CD" even though it's not correct. But I admit you need to look at the wording of this question carefully

    1. writing

      If you are going for 'fancy stuff' you might mention markdown, rmarkdown/knitr etc. This will replace latex imho

    2. LaTeX is a high-quality system equipped with special features for technical and scientific documentation. A great tool for thesis help due to its user-friendly interface and dozens of helpful features. For example, the tool automatically generates bibliographies and indexes.

      If you are going to mention latex you should mention Overleaf

    1. NS: Ch 16 – public goods section only (skip Lindahl equilibrium; sections on median voter and single-peaked preferences are optional)

      Note that I give the reading from the Nicholson and Snyder text at the top of each section.

  9. Oct 2019