175 Matching Annotations
  1. Apr 2024
    1. basics

      (#12) Go over the questions from the main google doc

    1. deviance

      (#34)

      *N2 (34) (Rebecca): I'm not sure if I understand deviance (from the lecture notes) -- is it something comparable to R2 in an OLS model? To what extent does it really "matter" in interpreting results from maximum likelihood estimation?

      Response: First, note that every model can be fit by maximum likelihood estimation. Once we move beyond OLS, R2 no longer works as a measure of fit. The deviance is just a transformation of the log likelihood (essentially -2 times it); see the equation in 3.7 of the notes.

    2. the likelihood function

      (#17)

      *N7 (17) (Rita): From the lecture note, when calculating the likelihood function (9:13), what is the difference between P and phat for the logit equation? Could you further explain the likelihood table too?

      Response: P would refer to the actual P for that case, and Phat is our prediction of it. Of course, we don't observe P, only Y = 1 or 0.

    3. Q2. p.571. (Group 2)

      The reverse problem

      • N5 (#14, 26) (Syl): Can you explain to me like I’m 5 what you mean by “we’re working backwards” on slide 4? I’ve never thought about logit models like that and for some reason it’s not clicking and maybe even confusing me a little.

      Response: This gets at Q2:

    4. maximum likelihood estimation

      (#14)

      *N1 (14) (Lily): There are many different phrases discussing “likelihood.” Can we go over the difference between them (i.e., log likelihood versus maximum likelihood)? (Osamudia)

      Response: Sure. Likelihood (L) = the probability that a specific event (or combination of events) will occur, given the set of coefficients. Log likelihood is just the log of L. Maximum likelihood estimation is the process of finding the coefficients that make L (equivalently, the log likelihood) as large as possible.
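
      To make these terms concrete, here is a minimal sketch in R with simulated data (the variable names are made up); it computes the log likelihood of a logit model by hand and shows how the reported deviance relates to it:

      set.seed(1)
      x <- rnorm(200)
      y <- rbinom(200, 1, plogis(-0.5 + x))        # simulate 0/1 outcomes
      m <- glm(y ~ x, family = binomial)           # logit, fit by maximum likelihood
      phat <- predict(m, type = "response")        # predicted probability that Y = 1 for each case
      sum(y * log(phat) + (1 - y) * log(1 - phat)) # the log likelihood, computed by hand
      logLik(m)                                    # matches the value R reports
      deviance(m)                                  # for 0/1 data, the deviance is just -2 times the log likelihood
      -2 * as.numeric(logLik(m))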

      *N3 (#14)(Savannah): Does the MLE get reported in a typical logit regression table in R, or do we have to do an extra command to see it? Additionally, can we compare MLEs across different model specifications to see which model has the best fit (like we do with R^2 in OLS)?

      Response: The MLE would mean the maximum likelihood estimates--those are the coefficients. The log likelihood gets reported--or the deviance--which is just another way of reporting it.

      *N4 (#14) (Osamudia)--could we contextualize this against probit and logit models? I'm having trouble understanding the scenarios for which this sort of analysis would be useful.

      Response: Maximum likelihood estimation is the process by which all models get estimated (aside from OLS). You can’t avoid it. The results (i.e. the coefficients) are what we interpret.

      • N5 (#14) (Syl): Can you explain to me like I’m 5 what you mean by “we’re working backwards” on slide 4? I’ve never thought about logit models like that and for some reason it’s not clicking and maybe even confusing me a little.

      Response: This gets at Q2:

      *N6 (#14) (Mia): I'm trying to wrap my head around calculating the best probability to give us similar data observations. Are there no implications to research when our predicted probability is a better fit for some demographic groups than others? How important is sample size in accurately getting to maximized likelihoodness?

      Response: Note that this is what the program does given the variables (and the parameterization--i.e., do you want interaction terms? Do you want to stratify by race? Do you want to include age squared? etc.)--those are the choices you make. Then, given those choices, the algorithm finds the best-fitting coefficients.

    5. Recap

      (#12)

      Start here

    1. marginal effects

      (#39) --note on line 12 what the correct answer is !!!

    2. marginal effects

      (#38) Marginal effects --cover this after slides 36/37

      Jump back to section 6.1.2 of Class L (Probit) to make a key point about coefficients and marginal effects ---> actually, the same point is coming up next

    3. baseline model

      (36) Baseline model

      --go over this, then predict the probability on the next slide

    4. calculating the probability

      (#14) After this, jump down to the baseline model on slide 36

    5. basic setup

      (#13) Basic setup

    6. where

      (#12) start here

    1. Q8. (G1)

      (24) The probit link

      *L5 (24) (Mia): We're getting z score values and I remember the Z table showed us the likelihood of getting what you got back in the first level of statistics. How is this different from what we were doing with Z before?

      Response: You need to use the Z table to go from the predicted latent scale to the predicted probability…we will do an example in class.
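
      For example (a small sketch; the coefficients are made up), the pnorm() function in R plays the role of the Z table:

      z_hat <- -0.25 + 0.40 * 2   # hypothetical probit prediction on the latent scale: b0 + b1*x, with x = 2
      pnorm(z_hat)                # predicted probability that Y = 1 (about .71)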

    2. marginal effects

      (31) Marginal effects

      *L6.(31) (Savannah) I am confused by the concept of marginal effects. I’m not sure I really understand what they are and why we need to do them on top of running the probit model.

      Response: The marginal effect of X is the impact of X on the predicted probability; it's different from the impact on the latent scale. The marginal effects are not constant.

      *L3.(31) (Mia): When we talk about the marginal effects and we run the probitmfx and saw that an additional unit increase in x resulted in a 0.122 increase is this in like a Z unit or the actually probabilistic increase?

      Response: The probability.
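
      A hedged sketch of how this looks in R, assuming the mfx package and a hypothetical data frame df with a 0/1 outcome y and predictor x:

      library(mfx)
      m <- glm(y ~ x, family = binomial(link = "probit"), data = df)  # coefficients are on the latent (Z) scale
      probitmfx(y ~ x, data = df)   # reports dP/dx: the change in the predicted probability, not in Z
      # By default the marginal effect is evaluated at the means of the variables; it equals dnorm(xb) * b.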

      *L8.(31)(Rebecca) In lecture, you reference that the marginal effects are calculating the change in odds based on a one-unit increase in the independent variable – how does this translate if you have a dichotomous independent variable? Would you just end up using a completely different model in that case?

      Response: It's the same--i.e., the impact of going from X=0 to X=1.

    3. Figure 1

      (#15) *L9. (15) (Osamudia): I’m struggling with understanding the overall organization of the topics covered in the reading–are linear probability models and generalized linear models both ways of dealing with dichotomous outcomes? And if there are so many problems with the former, why use it at all?

      Response: The LPM (linear prob. model) essentially gives us the marginal effect--the effect of X on the probability. The problem is that this depends on whether you are in the tail or in the middle (i.e., the marginal effects are not constant). (I’ll discuss this more)

      *L1.(15) (Lauren): I am still confused on the difference between logit and probit. I see that probit is derived from logit, but using a different “link.” What do they mean by “link?” How do you know when to use one versus the other? (Osamudia)

      Response: The link translates the latent variable into a probability. We predict a linear effect on Z (the latent variable), and translate this into a nonlinear effect on P (the probability).

      *L2.(15)(Rita): I think I will need further explanation to explicitly differentiate between probit and logit links, and to know which one to apply for a given data.

      Response: They are different links, but they accomplish the same task (see L1). I’ll discuss more in class.
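
      In R, the only thing that changes is the link argument (a sketch, with a hypothetical data frame df):

      m_logit  <- glm(y ~ x, family = binomial(link = "logit"),  data = df)
      m_probit <- glm(y ~ x, family = binomial(link = "probit"), data = df)
      # The latent-scale coefficients differ by a scale factor (logit coefficients are roughly 1.6-1.8 times
      # the probit ones), but the predicted probabilities and marginal effects are nearly identical.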

      *L10. (15) (Amy) Is there any relevant information lost in the process of converting an outcome value that is asymptotic to infinities into one that is bounded by zero and one? Further, why zero and one, instead of -1 and 1?

      Response: In the case of a logit/probit, we are dealing with 0/1 dichotomous variables--so the outcome is the probability that Y=1, which is bounded by 0 and 1.

    4. Figure 2

      *L4.(16)(Mia): Also I want to talk about why the variable Z is latent; in the example Z is not the explanatory variable (like how we use it in class)? So it's saying we actually can't observe likelihood or probability, so it's latent and measured through a dichotomous variable?

      Response: Yes---we are predicting Z, not P. (Discuss more)

      *L7.(16)(Syl): At this point I am a bigger fan of logit models than probit models, but only because I do not see the point in the probit transformation when the logit one seems to work just fine? I guess would you mind explaining why we would prefer one over the other?

      Response: In terms of results, they are the same. The reason to also understand the probit model is that it makes it easier to understand the impact of heteroskedasticity in the error term on the coefficients, which is the topic of Class S--it’s a big deal. In the case of categorical DV’s, heteroskedasticity affects the coefficients in logit/probit etc. models. That is why people often report the marginal effects, or elect to use LPM models.

  2. Mar 2024
    1. Q7

      (20)

    2. Q6

      (19) M0 vs M1

    3. Q5

      (18) descriptive table

    4. Q1

      (14) what's the connection?

    5. Q2

      (15) Log likelihood

      (Lily): I know an ongoing question in this class has been how do we know which model is best to use. The article for today talks about using log-likelihood instead of R2. However, I am confused how they are different in theory. The article says that LL is used to see if a new parameter added improves the model by a significant amount but aren’t we monitoring R2 in a similar way already?

      Response: R2 only works with OLS models. Now that we are using more complicated models, LL is the basic measure of fit. We are going to discuss this in detail when we talk about logit models--but we are introducing it today with these models. Let’s discuss this in class.

      (*15)(Savannah) I am confused by what we mean by Maximum Likelihood Estimation (MLE). Particularly, what did you mean by the following statement: “Which set of coefficients make it most likely to observe the data that we actually saw?” I am confused by this because I don’t understand how we can pick and choose coefficients?

      Response: Yes…let’s talk about that in class.

      (*15)(Syl) Are growth curve models only for data with outcomes from scales/indexes with whole number values? Why (mathematically) are they preferable to treating the outcomes as continuous variables? I think knowing this answer will help me better understand the rationale for loglik rather than R2 as a goodness of fit test

      Response: growth curve models are for continuous variables also. The use of LL is due to the fact that only OLS models use R2.
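
      To make the LL comparison concrete, here is a hedged sketch (hypothetical data frame d with outcome y, predictor age, and level-2 identifier id; assumes the lme4 package):

      library(lme4)
      m0 <- lmer(y ~ 1 + (1 | id), data = d, REML = FALSE)    # baseline model
      m1 <- lmer(y ~ age + (1 | id), data = d, REML = FALSE)  # adds one parameter
      logLik(m0); logLik(m1)   # the measure of fit that takes over from R2
      anova(m0, m1)            # likelihood ratio test: 2*(LL1 - LL0), compared to a chi-square distribution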

    6. Q3

      (16) LRT

    1. Q3

      (#16)

    2. Q2

      (#15)

    3. Q1

      (#14)

    4. Q8

      (#21) 9. (Rita) - I was thinking in any model you should have an equation that includes an independent and a dependent variable. Why did the author have the baseline model (Model 1) with only the intercept for the dependent variable? What is the intuition about that, and could this be applied to any similar data or idea?

      Response: in multilevel models this is important to show that there is variation in the random effects--this motivates the rest of the analysis.
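
      As an illustration (a sketch with hypothetical names, using lme4 in R), the intercept-only baseline looks like this:

      library(lme4)
      m1 <- lmer(y ~ 1 + (1 | country), data = d)  # Model 1: intercept only, plus a random intercept for country
      VarCorr(m1)                                  # is there variation across countries worth explaining?
      # The ICC = between-country variance / (between-country variance + within-country variance)
      # summarizes how much of the variation sits at level 2.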

    5. Q7

      (#20) (*20) 4. (Osamudia): Can we please walk through the interpretation of the model on p. 28 of the reading? (ETA: I see that this is actually the substance of Q7.)

      Response: Yes--we will definitely talk about Q7 in class.

      (*20) 5. (Mia): When conceptualizing cross-level interactions, I am wondering if it's the case that our moderators will always be the level two variables? When first reading the article it seemed like age and education took on the moderating effect of national affluence and globalization involvement--but it's actually the other way around. The level two is the moderator. Is it possible to have a cross-level interaction where the level 1 is moderating level 2?

      Response: Moderators work both ways, as they are just interaction terms.
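
      A sketch of what a cross-level interaction looks like in lmer (all variable names hypothetical: age is level 1, affluence is a country-level variable):

      library(lme4)
      m <- lmer(global_id ~ age * affluence + (1 + age | country), data = d)
      # The age:affluence coefficient asks whether the level-2 variable explains why the effect of age
      # varies across countries; read the other way, age "moderates" the effect of affluence.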

    6. Q9

      (#22)

      (*22) 6.(Mia): When we observe the interaction term between age and education with country level factors neither age nor country-level factors alone indicate a significant relationship with global self identity. If there was evidence of significance of the effect of age or education holding other factors constant does that change how we understand the relationship between the explanatory variable and the dependent variable?

      Response: great question. Yes, it does change how we interpret it. It means there is no “baseline” effect--i.e., the average effect is 0. But, some countries have + or - effects of these variables.

    1. What

      (#24) (*24 and 39) K5 (Savannah): Can we go over the difference between the estimate of the random intercept (sd(_cons)) and the standard error of the intercept? Why do we compare these to determine statistical significance? Additionally, the variable is labeled sd(_cons), so I'm wondering where standard deviation fits into this?

      Response: Yes, let's discuss this by going over slides 24 and 39.

      (Jacob) What is log likelihood?

      Response: it is the general equivalent of R-squared for the large majority of models. The algorithm solving the model maximizes the log likelihood. We will cover this in class later when we cover logit models.

    2. ed

      (#17)

      Two-stage versus one-stage (reduced form):

      (*17) K2 (Osamudia): Although I understand why multi-level models are useful conceptually, I don’t actually understand how the formulas speak to each other, or what is happening when each level is plugged into the preceding formula (as executed on slide 17).

      Response: Let's use the example of schools. The key shift is that things at the school level can be dependent variables at level 2. I.e., why do schools have different intercepts (i.e., average math scores net of explanatory variables)? Why does the gender gap vary across schools? Equations 1-3 describe this kind of setup. Equation 6 just combines equations 1-3 together in a single model.

      (*17) K9(Rita) I would like you to touch base with the equations on page 2 in the lecture note. I would like to understand the difference between stages 1 and 2.

      Response: Yes, let’s discuss this when we go over slide 17.

      (Natalie) Can you go over reduced form and two-stage form?

      Response: Yes--that is the key part of the first part of the lecture notes for today (class k). They are the same thing…the two stage form is much, much easier to understand, and the reduced form just is the result of substitution (as we will go over from the lecture notes).

      (Jacob) The authors discuss two methods – reduced form and two-stage form. They say two-stage form is common in some areas in social science, but reduced form is generally more popular. Why use one over the other?

      Response: See my response to #6 below--they are the same thing. The reduced form version is what we use to tell R or Stata what we are doing; that is why it is important to understand how to go from the two-stage version to the reduced-form version.
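
      To see why they are the same thing, here is the substitution in generic notation (a sketch; the letters are illustrative, not necessarily the exact ones in the notes):

      Two-stage form:
      Level 1: y_ij = eta_1j + eta_2j * x_ij + e_ij
      Level 2: eta_1j = g_1 + g_2 * w_j + u_1j
               eta_2j = g_3 + g_4 * w_j + u_2j

      Substituting the level-2 equations into level 1 gives the reduced form:
      y_ij = g_1 + g_2 * w_j + g_3 * x_ij + g_4 * (w_j * x_ij) + (u_1j + u_2j * x_ij + e_ij)

      --> the cross-level interaction w_j * x_ij and the composite error term appear automatically; this single reduced-form equation is what we hand to R or Stata.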

      (Anna) I feel silly about this but I am pretty lost based on the reading. Are the equations they are doing brand new or are they the same or rather comparable to the ones we have gone over?

      Response: it’s the same as the Luke reading, just different letters--same concept. Let’s check back in on this question as we go over the equations in lecture today. The key thing is to think about the random intercept (eta1) and the random coefficient (eta2) being the dependent variables for the second stage equations. My goal in explaining this is to make this concept intuitive.

      In thinking about this--i.e., why random intercepts and random coefficients, let's refer to the idea of estimating separate models for each school (see Q1 slide 21 below). I.e., the reality is that there is variation in the intercept and the slope across schools. Our models should have the flexibility to express that. (A multilevel manifesto).

      (Alissa) Similar to Anna’s question above, I am getting lost trying to follow what all of the seemingly random greek symbols mean, so I’m having a hard time deciphering what the equations are calculating. For instance, in the lecture notes, B denotes a coefficient in equation 1, but then y (zeta, not y) denotes a coefficient in equations 2 and 3. Is there an easier way to keep these straight, or any resources you have that outlines what all these letters mean?

      Response: Yes--In these equations, greek letters mean coefficients or error terms. Roman “regular” letters mean observed variables. I would start with the idea that eta1 (the random intercept) and eta2 (the random coefficient) are things that naturally vary across level 2 units (i.e. schools and countries) and that this variation is a worthy subject of investigation (at level 2).

      Level 2 things are a higher level of aggregation. In the schools analysis, students are level 1 and schools are level 2. Variables at the school level are level 2 variables.

    3. download the

      (#34) Let's discuss slides 34-37 as evidence that things vary across schools. Q: knowing this, what is the added benefit of an MLM?

      (*34-37) K1(Osamudia): The following language on slide 15 threw me: “we are going to allow the intercept and coefficient on SES to vary by school.” There is similar language at the introduction of the Rabe-Hesketh & Scrondal reading (p. 1). I realize I don’t completely understand what it means to let either the intercept or coefficient in these models vary; if we could discuss this in more detail, I would appreciate it.

      Response: Yes. What this means is that the multilevel model will let each school have its own intercept and its own coefficient on SES. This is similar to what would happen if we estimated separate models by school. (The empirical Bayes shrinkage factor adjusts these school-specific intercepts and coefficients to account for small sample sizes.) Let's discuss this when we go over slides 34-37.
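
      A sketch of what "letting the intercept and the coefficient on SES vary by school" looks like in R (assumes a data frame hsb with mathach, ses, and school; lme4 package):

      library(lme4)
      m_ri <- lmer(mathach ~ ses + (1 | school), data = hsb)        # each school gets its own intercept
      m_rc <- lmer(mathach ~ ses + (1 + ses | school), data = hsb)  # the intercept AND the SES slope vary by school
      coef(m_rc)$school   # the (empirical Bayes, shrunken) school-specific intercepts and SES coefficients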

    4. models

      (#19)

      (*19) K7 (Delaney): Can we use normal words to describe the differences between each of the MLMs?

      Response: Sure. I’ll do this when I go over slide 19. The basic idea is less complicated than it looks. Let’s take the example of schools: schools differ in their intercept (i.e., the average value of math scores controlling for level 1 variables) and the effect of individual variables varies across schools (i.e., the “effect” of gender will vary by school context). I think that makes a lot of sense--much more sense than forcing the intercept and the coefficient on gender to be the same in every school.

      (*19)K3 (Lauren): Can you talk a bit more about the differences in models 5-7? Looking at the table at minute 8 of the lecture recording.

      Response: Yes, this is where things get interesting. Let's start by looking at slide 19, then go to slide 41 (the random coefficient model M5), then discuss M6 and M7.

      (Jacob) I would love to go over an example output for multilevel modeling and interpret what each part means – in both R and Stata (Stata seems easier, so I’d like to go over R especially) (MR) (NCH)

      Response: Yes--we are going to go over lots of examples in the lecture notes (with the HSB example).

    5. m4 random coefficient

      (*41) K4(Lily): Similar to what we did last week, could we go through the formulas and use one example (i.e., plug in variable names to the letters)? Could we also talk about when to use these models compared to the others?

      Response: Sure. Let's jump down to model m4 in section 6.2.8 in slide 41. This is a model with a random coefficient for SES.

      (*41) K6 (Mia): Can we review the interpretive/written engagement with summarizing multilevel modeling? So for example in Model 4 we see the variance of ses for schools is 0.68; if I was doing homework or writing a paper, what's the correct way to describe this relationship?

      Response: Yes--let’s discuss this in class when we go over slide 41

    6. Looking

      (#26)

      (McKenna) Can you speak more to the situations in which using growth curve modeling may be most effective? (MR) (NCH)

      Response: Sure--when there is possible variation in the rates of growth over time. I.e., career trajectories, or learning growth. It allows you to model the process of change; life course dynamics.

    7. For the

      (#25)

      (Athena) On page 5 (3.4) what does it mean to “quantify the (co)variability of the intercepts and slopes”?

      Response: since the random intercepts and random slopes are random, it is possible that in the data they might vary in the same (or opposite) direction. These correlations may be substantively interesting: An example is in the growth curve models for babies discussed in the reading (section 4.0.7 of the notes)--there is a correlation between birthweight and the rate of growth. For the school example, there could be a relationship between the random intercept and the random coefficient on gender (i.e., schools that do better might have a smaller gender gap (?)).
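
      In R (a sketch, reusing the hypothetical schools model with a random coefficient on SES), VarCorr() reports exactly this (co)variability:

      library(lme4)
      m_rc <- lmer(mathach ~ ses + (1 + ses | school), data = hsb)  # hypothetical HSB-style setup
      VarCorr(m_rc)   # variances of the random intercept and random slope, plus their estimated correlation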

    8. ef

      (#17)

      (Alex) In multilevel models, why are cross-level interactions important? Can we review the key variables and subscripts in a model with a cross-level interaction?

      Response: Cross-level interactions occur when we try to explain (using a level-2 variable) why a random coefficient varies across level-2 units. A good example is the coefficient on gender in the analysis in class K. Why does it vary across schools? Add a level-2 explanatory variable, which creates an interaction term between the variable (call it z) and gender, z * gender. If you don’t have the interaction term, then z is just a variable for the random intercept.

    9. e 1

      (#15)

      (show clip on the google doc)

      (Katharina) In figure 3.10, the Empirical Bayes predictions of school-specific regression lines all seem to have the same slope for the random-intercept model, while the random-slope model has different slopes. This might be obvious because the one on the right is "random-slope," but to clarify, does this mean that the Empirical Bayes one keeps the slopes constant to a mean slope?

      Response: Empirical Bayes just refers to the process of estimating the random effects (i.e., it applies to both the random intercepts and the random coefficients). In terms of the difference between the two figures: yes…the model with just the random intercept forces the slopes to be constant. Only in the model with random coefficients do you get different slopes in the EB predictions.

      (Georgina) I am a little confused about the j subscript. I realize that we use it to vary across schools, but when we have eta_2j * x_ij it varies, while with B_3 * w_ij it does not necessarily. Why is it that the j subscript remains on the w? Does the j subscript need to be on both letters in the equation for varying across level-2 units?

      Response: Yes, the j subscript needs to be on the coefficient for it to be a random coefficient. I.e., eta2j in Equation 1 below is allowed to be different for each level two unit, while B_3 is the same for all level 2 units.

    10. pa

      (#15)

      (Meredith) How do the equations for random-intercept and random-coefficient relate to the two level equation 2.1 in Luke 2020?

      Response: they are directly related. In Luke 2.1, B_0j is the random intercept, and B_1j is the random coefficient.

  3. Feb 2024
    1. (2).

      Start by putting up equation 2 from slide 16

      (*13) J5. (Delaney) I don't think I understand how to tell when to include/control for a variable vs. when to create a multilevel model... in the example of the likelihood of couples getting divorced being affected by religion/culture, wouldn't that just mean we would control for religion?

      Response: Yes--good example. Let’s take a situation with 10,000 Catholic and 10,000 Protestant marriages. Treating religion as a level 1 variable, we can estimate differences in the rate of divorce by Catholic/Protestant--we will have good statistical power with 20,000 cases. We can’t, however, pinpoint what it is about the two religions that causes the difference, as we only have 2 cases at the level of religion. Think about this with respect to schools. We might be able to say school k does better than school m, but is it because they pay their teachers more or because the lunch is better? Solution: sample more schools, get more variation in the school-level variables.

    2. Q10

      (#31) The ICC

    3. Q6

      (#27) The error terms in the level 2 equations.

    4. Q4

      (#25) B_0j and B_1j in the level two equations

    5. Q1

      (#22) The ecological fallacy

    6. Q2

      (*23)J8. (Amy) So, the Luke reading lays out conditions under which it’s advisable to use a multi-level model… and then describes the three ways in which anyone decides to do any sort of analysis! The arguments used to justify an OLS model (that variables are monadic and independent of one another) seems to be one that is rarely going to be correct given that, per what we’ve learned in Barbara’s class, there’s always a confounder of some kind. Moreover, it seems like theoretical arguments could virtually always be deployed to defend the use of a multilevel model, but not for an OLS one.

      Response: Yes, anytime we have clustered data or we are using contextual variables then I think we really have a multilevel model. If we are just worried about the effect of clustering on the standard errors (i.e. the non-independence of the error terms), however, we can account for that using robust regression. I.e., using survey weights with Add Health to get population weighted estimates. Multilevel models take this a step further and let the contextual effects become objects of study themselves.

    7. (Equation 2)

      (#16) (*16) J2. (Lauren) Could we go over the big differences between multilevel models and fixed-effects models? FE Models also include the multiple levels, so I’m a bit confused on knowing when to use one over the other.

      Response: Yes--both use multilevel data. For FE models, we are using the data structure to estimate within effects (and get rid of the u_i). For multilevel models, we are using the data structure to estimate the moderating effect of context, i.e., let level 1 things vary at level 2 (and, have that variation potentially be explained by level 2 variables).

      (*16)J6. (Mia) Why are we so concerned with track variability? Can we review why the variance matters as we move between levels and focus within a level?

      Response: Yes---let’s focus on students sampled within schools. First, if our school sample sizes differ, they will have different levels of reliability. Second, evidence on school-level factors is based on how many schools we have. (Discuss by illustrating extreme cases--2 schools).

      (*16) J3. (Lily) There are so many equations and I find that I get confused without an example. On Thursday, could we write out the full equation with the school test score example (i.e., name the equation components based on variables)?

      Response: Let’s focus on equation 2 in slide 16--which is the baseline model.

      (*16) J4. (Lily) You mention that we use the RE model to indicate whether we need to run a multilevel model. Can you explain why this is the case or what these two have in common?

      Response: Yes, I think I can also answer this with slide 16--if there is variation across schools (u_j) then we have a multilevel model. The u_j is a (school level) random effect.

    8. level 1 vs level 2

      (#14) (*14)J11(Rita) Is there a specific sampling design for multilevel regression analysis? Are there any assumptions to take into consideration when looking at multilevel regression?

      Response: Anytime the data is clustered you have a multilevel design. I.e., Add Health data is clustered by schools. Longitudinal data like the NLSY is clustered at the individual level. The WVS is clustered by countries. Clusters provide the context--and context, for a sociologist, may actually be the point. (More on this in class).

      (*14)J12(Osamudia): A very broad question as I start the reading–is there ever any debate about how we define the levels in multilevel modeling? Reading table 1.1 of the reading, I found myself questioning whether the various levels were appropriately characterized and distinguished.

      Response: You are right to critique this. The levels depend on how we are conceptualizing context, and that depends on sociological theory. I.e., variation in social structure and context might be a level 2 variable, but the way we define that will affect how we think about the model. I.e., is religion a level 1 or level 2 variable?

    9. syntax

      (#19) (*19) J10. (Rebecca) How do we distinguish between when to use a multilevel model vs. a hybrid model? How do these differ, given that the hybrid model also utilizes different levels and has clustered or nested data?

      Response: The neat thing is that the hybrid model is built on the random effects framework. The connection becomes clearer when you look at the command in R that we used to estimate the hybrid model. This is line 2 of cb7 in class H, which estimates the hybrid model:

      And lines 2 and 4 from cb4e of class J estimate multilevel models of happiness in the WVS:

      -> they both use the lmer command. I.e., they are both “multilevel” models (including the hybrid model).

      (Jacob) Both readings reference ANOVA. Can you discuss this and how it relates to multilevel modeling? (AO*)

      Response: ANOVA is "analysis of variance"...which is what we are doing in the multilevel approach with the null model (no variables, just the constants and the random effects).

    10. empirical Bayes

      (#17) (*17) J9. (Sylvie) Feeling generally pretty lost about the empirical Bayes prediction: what is the statistical motivation for it, and what violated assumption about error terms does it purport to alleviate? More simply, why exactly does this work?

      Response: Good question. First, I would argue that in real life--in terms of how we make decisions-- you are either a Bayesian or someone will gladly take your lunch money from you repeatedly until you have none. The first paragraph of this post (http://varianceexplained.org/r/empirical_bayes_baseball/) provides an illustration that I think applies more generally.
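
      For the flavor of it (a sketch with made-up numbers), the empirical Bayes prediction of a school's intercept shrinks the raw school mean toward the overall mean, and shrinks harder when the school contributes few cases:

      tau2   <- 4                              # between-school variance (hypothetical)
      sigma2 <- 36                             # within-school variance (hypothetical)
      n_j    <- c(5, 50, 500)                  # three schools of different sizes
      lambda <- tau2 / (tau2 + sigma2 / n_j)   # reliability / shrinkage factor
      grand_mean <- 50
      ybar_j <- c(60, 60, 60)                  # same raw mean in each school
      grand_mean + lambda * (ybar_j - grand_mean)  # EB predictions: ~53.6, ~58.5, ~59.8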

      (Katharina) Page 20 of “Multilevel and Longitudinal Modeling Using Stata” talks about the relationship between the prior, likelihood, and posterior distributions. Where would these estimates show up in how we view the model? What is a “good” number for the posterior?

      Response--the prior and posterior distributions are Bayesian concepts. The prior is our knowledge about the underlying distributions before we know anything about a particular case. The posterior is how we update our understanding of things after we learn some information. emphasize the intuition we do this naturally all the time in dealing with the world. Prior contextual information--this is why voice recognition software is not as good (yet) as humans as parsing speech. (Go humans!).

      Sports gambling would be another illustration. A top team loses the first game of the season. Does that mean they are a bad team? Do you bet against them?

    11. Q7. (G7)

      (#28) (*28) Lily. The statement about treatment factor thereafter was confusing as well. I was also wondering if we could go over figure 2.1 about the outcomes being either intercepts or intercepts and slope. Why are these the outcomes? I don’t understand that either. I also was confused about the concept of ICC being brought in towards the end. Why is it important (I know you mention it in the video as well but I am still confused)?

      Response: Yes, let’s discuss figure 2.1

    12. Why migh

      (#24) J1. (Lily) I have a few questions from the Luke (2020) reading. Firstly, can we go over random factors? I was confused by this portion and think part of that confusion stems from the different usages of the word “levels.”

      Response: Yes--the random factors are anything that is allowed to vary--this happens at both level 1 and level 2. Let’s go over this with respect to slide 24.

      (Athena) Can you explain the syntax of the Yij equation on page 12 of Luke 2020? More specifically, what does Y00 mean? (UPDATE: I think it refers to the intercept.) In this case, are the subscripts for certain variables, e.g. W1j, random?

      Response: [This is Q5 in Section 2.2.5] Yes, let's talk about it in the context of Equation 2.1 first, which is discussed in section 2.2.3 of the lecture notes (Q3).

    13. mo

      (#12)

      Things to talk about at the start of class:

      (Natalie) Could you please provide more information about the midterm exam? What is the intended length of the exam? Will we be dedicating the class period on the 9th to the exam?

      Response: I'll talk about the exam in class on Tuesday. It will be a take-home exam. I'll distribute it on the 9th, and it will be due on Monday 3/20.

    1. qu

      (#13)

      Show the first page of the article.

      *I10 (Delaney) If you're using an FE model, how important is it to compare it to an OLS model in your results? Should we always do that?

      Response: Yes, to see if FE affects the results. If it does, it is likely due to a U_i. (But also note that we have subtracted out some of the signal--so significance may also be lower.)

      (Natalie) How do we derive the equations for fixed effects and random effects? How can we use those equations to explain the advantages and disadvantages of the random effects model compared to the fixed effects model?

      Response: See section 4.2 of class G http://www.tedmouw.info/soc709new/class-G_n.html#4_panel_data_methods Let’s go there for a moment at the beginning to discuss the big picture.

    2. Q8. (G1)

      (#21) Hausman test I9 (Savannah) Is the Hausman test ever used in practice? The concept seems kind of silly to me because it seems like it would be pretty obvious to tell whether coefficients in the RE & FE models are the same just by looking at it. Does the existence of this test imply that there is a certain level of variation in coefficients between the types of models that dictates whether a RE model should be used?

      Response: Yes--slide 21
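
      For reference, a sketch of how the test is run with the plm package (data frame and variable names hypothetical):

      library(plm)
      fe <- plm(lnwage ~ nchild + educ + exper, data = pdat, index = c("id", "year"), model = "within")
      re <- plm(lnwage ~ nchild + educ + exper, data = pdat, index = c("id", "year"), model = "random")
      phtest(fe, re)   # Hausman test: are the FE and RE coefficients systematically different?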

    3. by adding controls

      (#23)

      (+ open up the FE R results in section 5.3 cb8)

      I4(Mia) In a portion of the video you mention that the fixed effects is an “additive” model and operates as a function. I’m not sure what this means. What does interact mean here? An actual interaction term or a hypothetical association?

      Response: Yes--it just means that the effects are added together (as opposed to interaction/moderating effects)

      *I7 (Rebecca) Can we review table 2 in the wage penalty reading? I understand what the coefficients mean but am still unclear on why exactly the OLS model estimate goes down so much after including controls, compared to the FE model.

      Response: Yes, let's talk about it in slide 23.

      *I5 (Lily) I am confused about the question of the quiz that reads, "Adding controls for human capital variables (i.e., education and work experience) lowers the FE estimate of the MWP. This is not unexpected, and indicates that within-mother variation in these variables explains away some of the effect." I thought it was described that the between-effect was what was really going on here. Can you explain this?

      Response: Some of the effect of having a child can be explained by work interruptions (i.e., differences in experience)...in the FE model, this is a within mother difference. (Note that work interruptions can also be involuntary, which gets back to Acker’s note about gendered organizations).

      (Jessica) In the Budig and England paper, they state on pg. 214 “Controlling for the human capital variables shown in Table 1, reduces the child penalty by 36 percent, from about 7 percent to 5 percent.” I’m a bit confused about where these numbers come from, could you explain how they get them?

      Response: Yes, let's look at that passage (clipped below) and discuss in class. What they did, though, is add the human capital variables (see Table 1, slide 22), and some of the MWP effect is explained. As they note, this is entirely consistent with human capital theory--work disruptions and lower experience are part of the explanation. However, work disruption could also be involuntary job displacement caused by anti-mother discrimination at the worker's firm (that is revealed after childbirth). I.e., there are multiple possible interpretations of what the effect means.

      (Anna) Other than comparing them to see the impact of omitted variables, is there any benefit to the OLS model in this article? It really seems to be solely for that, but I normally expect a little bit more out of results so I'm not sure.

      Response: I think so, as a comparison. I.e., if they hadn’t included it, readers (reviewers) would have asked, what about the OLS results?

      (Athena) When interpreting the fixed model effects, why is age not included? I’m also curious what effect age might have when looking at table 1 (i.e. average age of childless and mother in never married). To me, I feel like age might be related to when in the life course events such as marriage and divorce might happen. It’s a study looking at ln of wage, but accounting for that might add another perspective.

      Response: Yes…the models do control for age. Age has a large positive effect on wages (as does work experience), and it is crucial to control for it. Table 1 in B&E presented models that controlled for age but didn’t present the results for age. See the footnote for Table 1, for example: http://www.tedmouw.info/soc709new/class-I_n.html#2110_Q10 Also see my results in section 5.2 of the lecture notes: http://www.tedmouw.info/soc709new/class-I_n.html#5_Analysis_of_NLSY79_data

      Then go to slide 14, but make sure to discuss 24 (Q11).

    4. Mod4.mwp

      (#32) *I6: Can we talk about the map() command in R?? I’ve never seen it before and don’t know how it differs from the apply() stuff in base R. I don’t know if I have anything more specific than that right now lol. (Syl)

      Response: Yes--let’s look at it on the R commands sheet. It allows rolling through 2 lists.
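
      A quick sketch of map() and its relatives (map2() is the version that rolls through two lists in parallel):

      library(purrr)
      map(1:3, function(k) k^2)              # like lapply(): apply a function to each element, return a list
      map_dbl(1:3, function(k) k^2)          # same, but return a numeric vector
      map2(1:3, 4:6, function(a, b) a + b)   # rolls through two lists/vectors at the same time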

    5. spurious

      (#14):

      Going back to the Q about whether motherhood is random, why does it make the counterargument about spuriousness valid?

      I1 (Lauren). I’m having some difficulty figuring out how we can state our conclusions from FE. For instance, in the motherhood wage gap example, they had 4-5 competing theories that could explain why mothers earn less than non-mothers. Did using FE give them the ability to state that one of the theories was most accurate?

      *Response: I think the key thing is that the FE can rule out a simple sorting/selection model based on U_i. As we go over the paper, let’s keep track of these competing explanations for why there would be a MWP and think about how you would test them. But in general, it is good to have multiple arguments why there could be an effect.

    6. going on

      (#25)

      I3 (Osamudia). Does the use of the word “moderate” in Q12 (slide28) mean impact, as opposed to “ease” or “ameliorate”? Unless I’m misreading the model, I thought that the wage penalty was worse for married women?

      Response: it means that it has an interaction effect, i.e., marital status X child (or running separate models)...i.e., with the occ segregation paper, the effect of occ_fem was moderated by gender.

      After this go to slide 31 (OLS results).

    7. limitations of FE models

      (#19)

      I2 (Osamudia). I still have trouble with questions about how coefficients are biased in different models. Could we go over the excerpt on Q6 (slide 21) addressing when bias is and is not eliminated?

      Response: Yes, let’s talk about this.

      (Jacob) In relation to Question 6 for Class I – “Person fixed-effects models have the limitation that if an unmeasured characteristic affects number of children and interacts with another variable in affecting wages, the models will not eliminate bias.” Why doesn’t the fixed effect model get rid of bias in this case? Shouldn’t unobserved time-constant fixed effects get canceled out? What about the interaction effect changes this?

      Response: Yes, this was a good point to bring up. The FE model controls for the additive effect of U_i, but not an interaction effect with other variables.

    8. key variables

      (#37)

      Let's inspect the code for the plm FE model here.

      (McKenna) In the HW6 code, what does “index=c(“id”, “year”) refer to?

      Response: This tells R the structure of the (longitudinal) data; “id” is the i subscript and “year” is the j subscript. I.e., the model will calculate level 2 means based on id. Level 1 is id and year.

    9. (G6)

      (#31): G6: Preethi & Alex

      Let's look at what is in ols2.mpw and ols3.mpw

      What is the effect of number of children? (nchild).

      I.e., in this analysis, the OLS results indicate no MWP. Is this true? Turn to FE models with career trajectories.

      After this, go to slide 37 (Q14, section 5.3)

    10. observations

      (#22) Discuss the importance of including the "human capital" variables.

      There are differences among the groups.

      Then go to Q10 slide 23.

    11. marital status

      (#20) Let's read over this section after the first highlight carefully. Why does it suggest that the MWP is not constant over time periods within marriage in the U.S. or across different types of couples (open question)? I.e., bargaining, power differentials, "performing" gender.

      After this, go to Q12.

    12. job

      (#15): after this, skip to Q5 (slide 18)

    13. ns

      (#13) Big picture: Why are we doing this? (Put Table 2 up in a different window)

      Specific for this topic (MWP): Theory: is motherhood random? (No, why not?)

      Draw a career trajectory and illustrate the benefits of FE approach.

      Broader issue: much of our data is clustered.

      (Alex) When working with multilevel data, does level 2 always represent a broader context compared to level 1, or is that up to the researcher’s discretion/structure of the dataset?

      Response: It just has to do with clustering; i.e., level 1 is clustered into level 2 units.

      (Preethi) What would be the pros/cons of using a random effects model rather than the fixed effects model in this scenario?

      Response: A random effects model wouldn’t control for the possibility that U_i is correlated with childbirth; i.e., that there could be selection effects.

      [To frame a critical take on it:] (Braxton) So the value of a FE model is that we can have two distinct years to compare the motherhood penalty, like two snapshots in time that control for all of the constant/unchanging things. This is unlike OLS where there is likely more omitted variable bias and it’s just at one moment. If there are variables that do change over time that FE models aren’t taking into consideration, then is a FE model actually much better? I ask because I’m not super convinced reading the paper that their FE model took everything into consideration, especially since they assert that all the other research before them missed things here and there. And the OLS results are slightly more negative but not very different.

      Response: Yes…it is really important to think about what is changing over the time period…those variables need to be in the model. In fact, you (collectively) will find out what happens if life-course/career variables are left out of the model (i.e., age and work experience).

      Jump to slide 22 (Table 1) then slide 23 (Q10).

    1. discusses

      (#15) Also, do you know of any diagrams or images that explain or display level 1 and level 2? I keep confusing within/between effects and it makes it hard to follow the conceptual parts of the model.

      Response: Here is an example of a three level model: (see the google doc)

      In terms of the within/between effects, the description depends on what the levels actually are. The higher the level, the more macro it is. In this case, a particular pregnancy is level 1, and mothers are level 2. In the NLSY data, level 1 is person-year, and level 2 is person. In the World Values Survey data, level 1 is person, and level 2 is country.

      H2 (Lauren) More of a clarification question, but it looks like Level 1 variables are variables that vary over time (have a t subscript) while Level 2 variables are variables that do not (only have an i subscript). Is that right? Also, I am still confused on the discussion of looking to see if the coefficients are the same on the B1 and B3 variables.

      Response: the key thing is that level 1 is nested within level 2. Person-time within people. Or, individual within country. Student within school, etc.

    2. exciting thing

      (#18) H1b. Lastly I was wondering if we could go over this statement in the conclusion, as it seemed pretty important to me (but I could be very wrong), “It allows us to estimate the effect of level 2 variables while providing effect estimates of level 1 variables that are unbiased by a possible correlation with the level 2 error.” (Osamudia)

      Response: The within estimates (for example of age) will be free of bias due to u_i, because they are the FE estimates.

      (Anna) They threw a lot of equations at us in this reading, and I really am not an equation minded person (the variables get all mixed up in my head and they are hard for me to decipher) can you say which ones are most important for this week and maybe briefly explain the math?

      Response: I think of the math here as a way of demonstrating the connections between the different models. You can put that in a mental footnote and focus on an intuitive understanding of the key things. Within variation: changes (over time) within level 2 units (mothers in this case), i.e., changes across pregnancies for the same person. Between variation: differences across mothers, where the variables are the average of the mother's variables. For something like mother's age or smoking behavior, this is the average at level 2. Then focus on why this is exciting: we can see, in the same model, whether within variation has the same effect as between variation. Circle back to why this is really important (it is telling us something about what is going on with fixed unobserved characteristics).
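
      Here is a hedged sketch of how those pieces get built and fit in R (variable names are hypothetical for the birthweight example; assumes the dplyr and lme4 packages):

      library(dplyr)
      library(lme4)
      d <- d %>%
        group_by(momid) %>%
        mutate(smoke_between = mean(smoke),                 # the mother's average across her pregnancies (level 2)
               smoke_within  = smoke - smoke_between) %>%   # deviation from her own average (level 1)
        ungroup()
      m_hybrid <- lmer(birthwt ~ smoke_within + smoke_between + (1 | momid), data = d)
      # The coefficient on smoke_within is the FE ("within") estimate; smoke_between captures differences
      # across mothers. Comparing the two is what makes the hybrid model interesting.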

    3. (G1)

      (#22) Refer to google doc *H4 (Rebecca) Could you go over the part from the reading on interactions that says a correct estimation can be done through a benchmark? I wasn’t sure if this was a STATA-specific issue and how relevant it is for our class.

      Response: Are you referring to this use of benchmark on p.72? Model 1 is the “benchmark” FE model, so we are comparing the results to it. (Let’s discuss).

    4. Q8.

      (#22) H8(Osamudia)--I’d appreciate going over some of the drawbacks as described in the conclusion–what does it mean that “within-effects estimates obtained through random effects are not as efficient as those obtained from fixed effects models”?--Do these hybrids necessarily use fixed effects? And what does “efficient” mean in this statement? Related, what is the benefit of these hybrid models other than giving you an indication of how much of the observed effects are due to unobserved heterogeneity?

      Response: In this context, efficient refers to making the best use of the information to improve the precision of the estimates (i.e., reduce the standard errors). The within estimates are the FE estimates. The benefit of the models is that we can estimate the z_i variables as well as include the FE estimates of the x_it variables.

    5. Q9

      (#31) *H10 (rita)- Can you go over the example analysis of within and between composition, lm4 package, and the within and between effects?

      Response: Sure. Let’s discuss this starting with Q9 in the notes (the within and between decomposition in the data)

    6. So

      (#14) (Alissa) In the HW 4 answer key, for Q2.4 you write “In this model, men have a larger negative effect as sex.ffemale:occ_fem is positive.” Can you explain the reasoning behind this? I understand what the positive interaction term means for women, but I am confused as to how we can extrapolate this result to mean a negative effect for men? (MR, GG)

      Response: In mod.fe1, men are the excluded category for gender, and we have an interaction term between gender (female) and occ_fem. What this means is that the baseline effect of occ_fem in mod.fe1 is actually estimating the effect for men (-.211). The interaction term sex.ffemale:occ_fem tells us how the effect of occ_fem differs for women compared to men. (I.e., importantly, it is not the overall effect of occ_fem for women, just how it differs.) The effect for women is the baseline effect (-.211) + the interaction term (how the effect differs for women compared to men, +.020) = -.191. The effect for men is the baseline effect (because they are the default category) = -.211.

    7. is the level 2 error

      (#16) (Athena) Through a simplified lens, is there a way to understand when we need to use a hybrid model? Would it be based on the unit of analysis of one’s experiment or elements of a “DAG”? Is there a flowchart that can help one understand when to use each method (FE, RE, Hybrid)?

      Response: In terms of a better approximation towards the truth, always (start with an FE model and compare to RE). Any time you have clustered data you should run an FE model as part of a check for robustness (let's discuss why). Clustered data = individuals within larger groups (i.e., Add Health or the WVS) or longitudinal data (i.e., the NLSY analysis on the effect of occupational segregation on wages, or the motherhood wage penalty (next class)). The hybrid model allows you to do both at the same time--i.e., a "normal" between model and an FE within model.

      (Preethi) How would you address potential correlation between a level 2 variable and level 2 error?

      Response: That is harder---we are back in omitted variable bias territory and thinking about IVs.

    8. key findings

      (#22). H9 (Savannah)--Can we go over why the coefficients on smoke for Models 2, 3, and 4 in Table 1 (Q8) are the same?

      Response: Yes, it’s because all three are estimating the within effect of smoking.

      (Georgina) If we are thinking about the estimated effects on birth weight of smoking during pregnancy as a within-mother effect, how exactly does this work? Are we only looking at moms who had multiple pregnancies? Does the within effect always matter?

      Response: It's just mathematical, in the sense that if there is variation in the smoking behavior across pregnancies for the same mother, then that variable will have within-person variation--it is a fixed effects ("within") estimate. Only moms with multiple pregnancies will be part of the within estimate. The within effect doesn't always matter--in the case of an effect driven entirely by unobserved heterogeneity (i.e., an unobserved factor U_i), the within effect could be 0 and the between effect could be large (in either direction).

    9. structure of the

      (#21) (Meredith) In the video lecture (14:18) and page 68 of the Schunck reading, if infants are considered level 1 nested within mothers (level 2), why are the variables for mothers smoking and mothers age said to be “level 1 variables” and race said to be a “level 2 variable”?

      Response: I think the way to think about it is level 1 are variables that vary with each childbirth--this can be characteristics of the child or the mother during the pregnancy, and level 2 are mother characteristics that are fixed over different pregnancies.

    10. random slopes

      (#19). Can we go over models with a random slope?

      H1. (Lily) I have a lot of questions, so apologies in advance! I was first wondering if we could go over the distinction between level 1 and level 2 variables. There were also mentionings of a “random slope” in the reading, which I did not understand what that was getting at.

      Response: The random slope allows a coefficient on a level 1 variable to vary across level 2 units. So, for example, the effect of age (level 1) could vary across people (level 2). Or, when we analyze the WVS, the effect of gender (level 1) could vary across countries (level 2). We'll talk a lot about this when we discuss multilevel models.

      (Jessica) Mathematically and theoretically, what does the “random slope” in the hybrid model represent, and why is it better to include it in a hybrid model vs. a correlated random-effects model?

      Response: It just means that we add a random term on the coefficient itself with a level two subscript to allow it to vary (in this case) across mothers. When we cover multilevel models in classes J, K, K2, and K3 we will see lots of examples of random coefficients (=random slopes). Allowing the coefficient to vary at level 2 allows us to consider how level 2 variables moderate the effect of the level 1 variable (and how much underlying variation there is in the level 1 variable across the level 2 contexts). In the case of the WVS data, we will allow the effect of gender, for example, to vary across countries, which will make a lot of sense.

      In terms of the correlated random effects model, Schunck was just noting that because of the slightly different specification the random coefficient will mean something different--not that you can't do it, but be aware of what it will pick up.

    11. comes at a cost

      (#17) (Katharina) On page 4 of the document (pg. 66), "But this comes at a cost. The subtraction also removes all variables that do not vary at level 1." To clarify, level 1 would be the more macro level, such as classrooms and level 2 is individual students in the classrooms. Thus, if all classrooms are, for example, the same temperature and same square footage, those variables (temp and sq. ft.) would be removed in equation 2? Then they say "Fixed-effects models therefore cannot estimate the effect of level 2 variables." So is this saying that for observed things that do not vary (such as temp. and sq. ft.) that they cannot be estimated or used to estimate, but that the fixed effect, u, still holds unobserved variables (at level 2) as fixed? (i.e. can you clarify this part of the article please?)

      Response: level 2 is the higher level (i.e. more macro level). So, if level 1 is person-time, then level 2 is person. FE can’t estimate level 2 variables in this context (things that don’t change within people over time).

    12. Q4

      (#18) Why is it exciting?

      H1b. Lastly I was wondering if we could go over this statement in the conclusion, as it seemed pretty important to me (but I could be very wrong), “It allows us to estimate the effect of level 2 variables while providing effect estimates of level 1 variables that are unbiased by a possible correlation with the level 2 error.” (Osamudia)

      Response: The within estimates (for example of age) will be free of bias due to u_i, because they are the FE estimates.

    13. discussion questions

      (#14) Let's start here, by describing the big picture

      H7 (Osamudia)--I would appreciate going over the difference between level 1 and level 2 variables–are the former those that vary by person over time (e.g. age), and the latter those that vary across but not within individuals (e.g. race)?

      Response: Yes, we will definitely talk about this.

    14. Q2. (G2)

      (#16). Assumptions about U_i for RE

      H5 (Rebecca) Could you also explain the notation of (μ_i | x_it, c_i)? I understand the assumption of normal distribution (N(0, σ²_μ)), but I'm still not quite clear on what the | entails.

      Response: it means conditional on. So, U_i conditional (controlling for) x_it and c_i.

    15. Q3. (G3)

      (#17). H3. Can we go over this statement from the quiz? What is wrong with it? “A drawback of the FE model is that we can’t estimate the effect of any variable that doesn’t change within clusters. This biases our results from FE models.” (*)

      Response: Sure. Trick question. It is a drawback of the FE that we can’t estimate Z_i variables, but it doesn’t bias the FE estimates.

    16. Q1

      (#15) The motivation behind the approach

    1. Fixed effects

      (#16) *G10: (Syl) I’m gonna re-ask questions from last week that I wasn’t sure about: Do FE models still have error terms attached to them anyway? What do we make of the residuals of an FE model–are they just expressions of the accuracy of our estimates re: our independent variables?

      Response: Yes, they certainly do have error terms. The fixed effect means that we are fitting the person-specific mean perfectly (that is what the fixed effect does). So we are “taking out” any concern about cross-person variation (discuss). The residual for each case indicates how imperfectly we are fitting the variation around the person-specific mean at a particular time for a particular person.

    2. Q6 twin study

      (#26) Twin studies, and clustered data in general. We can use FE with clustered data. We subtract the cluster specific average for each variable.

    3. q1-3

      (#28) EFKD article.

      Let's go over the logic of the diagram here and the motivation for the FE model.

      *G4. (Lily) In the reading, they mentioned that the instrumental variable for women was the predicted probability of employment. Are we able to go over their rationale for that at all (I know this not the focus of this week)?

      Response: This is an excellent question! This is on p.549 of the article. Note that, theoretically, they need some variable to act as the “instrument”, i.e., something that affects employment but not wages (refer back to the discussion of the IV approach in class C). (Unless they are relying on nonlinearity to identify the model, which in my opinion is problematic). Let’s look at the variables they use:

      Which of these variables aren’t in the regression models? Number of children under 6 and husband’s annual earnings (?). I don’t agree that these are good instruments, as they are likely to affect occupational choice and earnings.

    4. OLS, FE and RE models

      (#52). Let's jump down to this Q early. G6. (Osamudia) Especially after taking a look at the upcoming HW, I’d like more practice reading the model results when running a FE model. It’s not clear to me what the coefficients are conveying.

      Response: You read it the same as an OLS model, except that you have to keep track of how the variables have been transformed. Education, for example, is now the difference from each person’s average: if you got more education, are you earning more? Let’s look at the results in section 6.5 and discuss (note that hgc is highest grade completed):
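
      For concreteness, here is a minimal sketch of fitting and reading an FE model with the plm package (assumptions: the data frame nlsy has person and year identifiers id and year, and lnwage and age are illustrative variable names; hgc is highest grade completed):

      ```r
      library(plm)

      fe.mod <- plm(lnwage ~ hgc + age,
                    data  = nlsy,
                    index = c("id", "year"),  # person and time identifiers
                    model = "within")         # the fixed-effects (within) estimator

      summary(fe.mod)
      # Read the hgc coefficient like an OLS slope, but remember that the
      # variation used is within-person: each variable is a deviation from the
      # person's own average, so time-constant variables drop out.
      ```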

      *G8. (Lauren) How do you read the regression table differently in a FE model?

      Response: See my response to question G6 above. This is an important question, and let’s make sure that we go over it verbally in class.

    5. q13

      (#38) results from EFKD G7. (Rita) Can you please explain further about the ‘experience’ variable in England’s article. In table 4, we have two “experience variables,” one with weeks and the other with nothing. Also, the author mentioned that the measure of experience was only available from the beginning of the NLS survey. Does it mean that they didn’t measure over time, and how did that affect the fixed effect model?

      Response: Yes. The second measure is experience squared. Regarding the effect of experience, see footnote 5 on p.549, and we’ll talk about what it means in the context of an FE model:

    6. QA2

      (#19) Arbitrary correlation

      *G5. (Rebecca) Could you provide an example of a situation where ui/ai wouldn’t be correlated with explanatory variables? I understand when to use random effects vs. fixed effects in theory, but it would be helpful to see a real-life (or hypothetical) scenario where random effects models could be used.

      Response: Something random, like random fluctuations in weather, or how your city’s sports team did (even then it could be questioned).

    7. panel data methods

      (#14) Start here.

      (Lily) With all these different types of models, how do you know when the model you have selected is the best fit?

      Response: Good question. OLS, RE, and FE models have different baseline levels of fit, so they aren’t directly comparable in terms of R2. In the FE models, we are getting rid of much of the signal (the between effects--differences in cross-person averages), so the R2 will be lower.

    8. Q5 RE or FE?

      (#25)

      *G2. (Savannah) Is the easiest way to determine if a fixed-effects or random-effects model is better for your data just running both models and seeing if there is a difference? For example, if both models yielded the same results, could we comfortably rely on the random effects model?

      Response: You can eyeball it to see if they are different. If you need a formal test (e.g., to convince a reviewer), the Hausman test evaluates whether the FE and RE results are the same (phtest in the plm library; see https://libguides.princeton.edu/R-Panel). In the hybrid models in class H we will do both within (FE) and between (RE) at the same time.
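
      A minimal sketch of the Hausman test in R (the model formula and variable names are illustrative; the key piece is phtest comparing the same specification fit as FE and RE):

      ```r
      library(plm)

      fe.mod <- plm(lnwage ~ hgc + age, data = nlsy,
                    index = c("id", "year"), model = "within")
      re.mod <- plm(lnwage ~ hgc + age, data = nlsy,
                    index = c("id", "year"), model = "random")

      phtest(fe.mod, re.mod)
      # A small p-value indicates the FE and RE coefficients differ
      # systematically (i.e., u_i is correlated with the X's), which favors FE.
      ```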

      *G3. (Delaney) ^ Yes can we talk more about using different models and how to know which one is the best? I think it's interesting that FE models are better almost all the time but we learn about other models too.

      Response: With many problems, FE are definitely a step forward. However, when we consider multilevel models we will be dealing with a whole class of models that are built on a RE framework, so we will be considering lots of cases coming up where we use an RE approach. Also, we will cover hybrid models that use both RE and FE to look at within and between estimates.

    9. Q4 RE model

      (#24) How is RE different from FE?

    10. Random effects

      (#22) Random effects

    11. Q0.1

      (#18) The within transformation

    1. q0

      (#16). Distinguishing the two types of error terms. Focus on the subscripts.

      F10 (Delaney): are a_i and u_i entirely theoretical or can we get physical numbers for those? Are we just trying to think about whether u_i is correlated with x and z? Does it matter because those get dropped out of the FE equations anyway?

      Response: They are entirely theoretical. This is back to the world of class C and omitted variable bias and the role of theory.

    2. big picture

      (#14). Start here.

      F3. (Osamudia)--I would benefit from a very practical, real-life example of the problem we’re considering and addressing with panel data methods.

      Response: Yes, we are about to devote a whole class to the motherhood wage penalty estimates. We will use a fixed effects model to estimate the motherhood penalty by comparing women’s earnings before and after they have children (compared to those who didn’t have children). This is because cross-sectional data may be misleading (due to selection effects into parenthood).

      F6. (Savannah)--It would be helpful to go over a real world scenario where we would use a random effects model. Wooldridge says it can be useful if the key explanatory variable is constant over time, but what is an example of where this is the case? How common are RE models in practice?

      Response: Yes. We are about to cover multilevel models. These are all based on a RE approach as the basic framework. We are also going to cover hybrid models which allow us to use fixed effects for within person differences, and random effects for cross-person differences.

      F7C. (Mia). Also, what even is a fixed effect? Like, what is stagnant enough that we don’t expect it to change over time?

      Response: Examples might be personality characteristics or specific aspects of facial features not measured on surveys that relate to the lived experience of social distance (let’s talk about this in class).

      F11(Rita)- What is the difference between the fixed model and first difference models. I am a bit unclear. Also it mentioned that the equation will have to include a dummy variable, I am wondering why only dummy variables

      Response: The FE and FD models are very similar. The FE model can also be estimated with individual dummy variables for each person---it turns out that it is the same thing as doing the FE transformation.
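
      A small sketch (illustrative names) of that equivalence: the “dummy variable for each person” (LSDV) regression and the FE/within transformation give the same slope.

      ```r
      library(plm)

      lsdv   <- lm(lnwage ~ hgc + factor(id), data = nlsy)   # one dummy per person
      within <- plm(lnwage ~ hgc, data = nlsy,
                    index = c("id", "year"), model = "within")

      coef(lsdv)["hgc"]    # same estimate...
      coef(within)["hgc"]  # ...as the within (FE) estimator
      ```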

    3. Fixed Effects

      (#24) Fixed effects.

      *F2. (Lauren)-I’m having a difficult time distinguishing panel data methods from a diff-in-diff. I know we haven’t done diff-in-diff, but could you highlight the main differences?

      Response: A fixed effects model is almost identical to a diff-in-diff; the key thing is the effect of time, and how it varies by those groups that experienced a change in the IV versus those who didn’t. If we have two time periods, and we let the effect of time 2 vary by those who got the IV change (i.e., by interacting time with the change in the IV) versus those who didn’t, then we have a diff-in-diff.
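
      A minimal two-period diff-in-diff sketch (all names are illustrative, not from the course materials):

      ```r
      # treated = 1 for units whose IV changed, post = 1 in the second period
      dd <- lm(y ~ treated * post, data = two.period.data)
      summary(dd)
      # The coefficient on treated:post is the diff-in-diff estimate: the extra
      # change over time for the treated group relative to the untreated group.
      ```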

      F5. (Amy)--I feel like I grasp what the fixed effects approach is, but I’m struggling to answer Q4 on the lecture quiz confidently. Why is it that this works to remove unobserved effects?

      Response: when we do the fixed-effects transformation, anything that is constant drops out. The U_i term (which is the constant part of the unobserved variables) is subtracted out.

      F8. (Syl) If the error term’s variation over time might challenge the assumptions of the FE model, why use it as opposed to FD models, which can work in spite of this? IS there something about the FE model that makes the estimates more precise? Don’t FE models still have error terms attached to them anyway? What do we make of the residuals of an FE model–are they just expressions of the accuracy of our estimates re: our independent variables?

      Response: The FD has the same weakness, which is that if changes in the error term are associated with X then we have a problem of omitted variable bias. But it does give us a different window/angle on causality.

      *F9 (Rebecca) Could you clarify how to decide whether to use FE or FD? My understanding is that in general maybe both are used to compare results, but I wasn’t quite clear on that section of the reading.

      Response: It depends on the situation. In general, FE uses more complete information (i.e., it might get more precise/significant results). A key question is whether there is serial (temporal) correlation in the error term. We can also estimate FE models that allow for this (autocorrelation) of the error term over time.

    4. basic intuition

      (#36) R example of FD

      F7. (Mia) Can we go over the intuition of the R code where we know that death rate for time time is -2 for the states with open container laws but then we when regress it we get a 1.82**?

      Response: This is a question about the cb_basic-intuition example in section 4.2 of class F: http://www.tedmouw.info/soc709new/class-F_n.html#42_basic_intuition The 1.82 is the estimate we would get if we mistakenly ran an OLS model on the data in line 54, as shown in the stargazer command in line 56. The reason this is the wrong model is that the states that adopted an open container law are a selective sample: in lines 21-25 they are the 30 states with the highest death rates.

    5. lag operators

      (#39)

      *F7B. Also can we review the concept of what creating a lag does in R and how that creates the FD model?

      Response: Sure, great question. I’ll talk about it in class. Lags are discussed here: http://www.tedmouw.info/soc709new/class-F_n.html#431_lags_in_r
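
      A minimal sketch of building a lag and a first difference by hand with dplyr (the data frame and variable names are illustrative):

      ```r
      library(dplyr)

      nlsy.fd <- nlsy %>%
        arrange(id, year) %>%
        group_by(id) %>%                         # lag within person, not across people
        mutate(lag.lnwage = lag(lnwage),         # previous observation for the same person
               d.lnwage   = lnwage - lag.lnwage, # the first difference of y
               d.hgc      = hgc - lag(hgc)) %>%  # the first difference of x
        ungroup()

      fd.mod <- lm(d.lnwage ~ d.hgc, data = nlsy.fd)  # the FD regression
      ```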

    6. overview

      (#13). F4: (Osamudia)--what is the difference between first-differencing and fixed-effects?

      Response: they are very similar. First difference just looks at successive observations, while fixed effects compares individuals to variation around their individual-specific means.

    7. q4

      (#20). How can FD fail?

    8. q1

      (#17). Why is a_i the fixed effect?

    9. The First-difference (FD)

      (#19) F1. (Lily) – a_i is confusing me a little bit. Going over 13.20 equation would be helpful for me. Additionally, can a_i be quantified? How do you calculate something that is unobserved? (Osamudia)

      Response: Sure. We will go over 13.20 in class.

    1. q6b

      (#48) Start here for class E2

      E1 (Lily) for class E2: How big of a role does theory play in identifying multicollinearity? If the VIF is borderline (for instance), does theory play a role in deciding whether there is multicollinearity at all?

      Response: The VIF helps to diagnose when you might have a problem of multicollinearity. It is really just a mechanical (non-theoretical) thing; if your explanatory variables are very highly correlated you will have less independent variation to estimate your coefficients. Note that it is only a problem if it prevents you from getting significant results.
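
      A quick sketch of checking VIFs with the car package (the model and variable names are illustrative):

      ```r
      library(car)

      mod <- lm(y ~ x1 + x2 + x3, data = mydata)
      vif(mod)   # VIF_j = 1 / (1 - R_j^2), where R_j^2 is from regressing x_j
                 # on the other X's; values around 10 are a common warning sign
      ```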

    2. Robust regression

      (#35). E6. (Rebecca) Could you clarify what you meant at the very end of the lecture video when you said that a higher standard error makes it less likely that your statistical test will be passed? Are you referring to a lower chance of statistical significance of variables in your model? Relatedly, when you use a robust standard error, are you correcting for heteroskedasticity at the cost of potentially lower statistical significance?

      Response: Yes. We base our significance tests on the p-values. When the number of cases in your model is more than roughly 50, these are derived from the z-scores (the coefficient divided by the standard error). A z-score > 1.96 means that p < .05, so if the standard error goes up, the z-score goes down. On robust standard errors: yes--they can cost you significance if you apply them unnecessarily.
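
      A minimal sketch of heteroskedasticity-robust standard errors (illustrative model; the sandwich and lmtest packages are one standard route):

      ```r
      library(sandwich)
      library(lmtest)

      ols <- lm(y ~ x1 + x2, data = mydata)

      coeftest(ols)                                    # conventional standard errors
      coeftest(ols, vcov = vcovHC(ols, type = "HC1"))  # robust standard errors
      # The coefficients do not change; only the standard errors (and therefore
      # the z-scores and p-values) are adjusted.
      ```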

    3. q14

      (#59) add more cases

    4. q13

      (#58) What to do (add more cases).

    5. q11

      (#56) what to do?

    6. example 2:

      (#55) multicollinearity and rising standard errors

    7. example 1:

      (#54) an example of declining standard errors (no multicollinearity)

    8. q10

      (#53) rising standard errors

    9. q9

      (#52) signs of multicollinearity

    10. q8

      (#50) Explain what the VIF is.

    11. q7

      (#49). VIF formula

    12. heteroskedasticity

      Big picture: why are we doing this? Do we believe our results?

      (#18). Let's distinguish between accuracy and precision. Accuracy means on average--what is the average point. Precision refers to the degree of spread around the average point.

      Let's consider using a random sample to get the average math score for a school. What happens if the sample size is 1? Will it be biased? Will it be trustworthy? Why do surveys give an estimated error range? If we have a random variable x (i.e., math scores), the variance of the mean of a sample of size N is var(x)/N. This is as "real world" as it gets---we shouldn't trust information that has a high variance even if it is correct on average.
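
      A quick simulation sketch of var(mean) = var(x)/N (the numbers are illustrative):

      ```r
      set.seed(1)
      N <- 25
      # math scores with population sd = 10, so var(x) = 100
      means <- replicate(10000, mean(rnorm(N, mean = 500, sd = 10)))
      var(means)   # close to 100 / 25 = 4
      ```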

      William Tell --> both precision and accuracy matter.

      E5. The image on slide 18 about accuracy and precision have always thrown me–why is precision not just a form of accuracy? Does “precision” here actually mean “consistent”?

      Response: Great question. Precision and “consistent” are different. You can throw the darts very precisely to the wrong place on the dartboard (i.e., not the bullseye). (In class we’ll talk about trusting someone to shoot an arrow at an apple on one’s head). In contrast, you can throw the darts in a diffuse cloud that, on average, is centered on the bullseye--this would be unbiased.

    13. q5

      (#29) Good details about the robust approach

    14. q4

      (#28) solutions, robust standard errors

    15. q3d

      (#27) the BP test.

    16. q3c

      (#26) How do we detect it?

    17. Heteroskedasticity

      (#20). E9. (Rita)- Correct me if I am wrong: you mentioned that heteroskedasticity mostly happens when you are working with longitudinal data or across different states. So does that mean that without such data we will not encounter it? Also, at what stage of your research analysis do you have to check for that?

      Response: heteroskedasticity can happen in any data set. You check it as you are running your models. Note that (with OLS) you can always “solve” it by using robust regression.

    18. q3

      (#24) a visual illustration

    19. q3b

      (#25) detecting heteroskedasticity

    20. q2

      (#23) aggregate data

    21. does not bias

      (#22). E10. (Lily) - If heteroskedasticity does not bias the coefficients, why is it important to address?

      Response: it affects our inferences about the coefficients (i.e., whether or not we believe them to be greater than or less than 0).

    22. error term

      (#18). E8. (Savannah) It seems like sometimes the terms “residuals” and “standard errors” are used interchangeably, but I know they’re not the same thing. For example, when we test for heteroskedasticity, we square the residuals. Is this not the same as squaring the standard errors?

      Response: The standard error refers to the estimated sampling variability of a coefficient. The residuals are just the differences between the actual and predicted values of Y. They are connected: the larger the sum of squared residuals (divided by the number of cases minus the number of estimated parameters), the larger the standard errors. (In symbols, sigma-hat^2 = SSR/(n - k - 1), and the coefficient standard errors are proportional to sigma-hat.)

    23. q1

      (#22) defining heteroskedasticity.

      Go over the Gauss-Markov assumptions.

      E1. (Lauren) Could you explain again why heteroskedasticity and multicollinearity do not bias the coefficients?

      Response: Yes. We’ll go over the Gauss-Markov assumptions about the error term for OLS (see the above link). The key point is that the estimation of the coefficients doesn’t involve assuming anything about homoskedasticity or multicollinearity. They both affect the standard errors of the coefficients, but they don’t bias the (on average) point estimate of the coefficients. I’ll show examples using R that demonstrate this; a small sketch follows.
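
      A small simulation sketch of this point (illustrative numbers): the OLS slope is still right on average even when the error variance depends on x.

      ```r
      set.seed(709)

      sims <- replicate(1000, {
        x <- runif(500, 0, 10)
        e <- rnorm(500, mean = 0, sd = 1 + x)  # error spread grows with x
        y <- 2 + 3 * x + e                     # true slope = 3
        coef(lm(y ~ x))["x"]
      })

      mean(sims)   # about 3: heteroskedasticity does not bias the coefficient
      ```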

      E2. (Osamudia): Is inconsistent variance of standard errors problematic because it means the model is less consistently predictive?

      Response: It depends on what you mean by “consistently predictive”. It doesn’t mean that the actual model is wrong per se. It just means that our inferences about the model (using the estimated standard errors) may be wrong. I.e., our model for the error term is wrong, not our model for the point estimate of the coefficient. This is different than omitted variable bias (due to an omitted confounder--in that case the model is wrong).

    24. multicollinearity

      (#19). If you have multicollinearity, your model is still correct, you just need more variation in your X's. Add more data. Any significant effects you find are still correct.

      Having correlated X's is why we use multiple regression.

  4. Jan 2024
    1. Co

      (slide 13) 1-30-24: start here, slide 13

      Brief recap & overview

      key point: you want to make sure that your analysis is robust (to the inclusion/exclusion of a small number of cases and alternative specifications).

      influence = leverage x residual

      influence --> DFBETAS, Cook's D

      leverage --> the Hat value

      residual --> Y minus the predicted Y. --> better = studentized residual

      D1. (Lily) How do we know which methods to use when testing for leverage, outliers, and influence (i.e., for leverage it seems like there may be more than one method. Which do we pick)?

      Response: The key thing is influence; use the DFBETAS and Cook’s D. For outliers (i.e., large residuals), use the studentized residuals.

      D14 (Amy) I’m not sure I quite understand the genesis of influence itself–I get what it is and what it does (which is obvious from the name alone) but if,, as in the Fox reading, influence is a product of a regression outlier and the leverage it exerts, how is that a different property than leverage?

      Response: Yes, we’ll discuss this in slide 13

    2. q8 influence

      (#24) q8

      D11 (Savannah) - I am confused about the logic behind the DFBETA approach (like what exactly are we doing when we use that equation?). Could we go over this again (but explain it to me like I’m a fifth grader LOL)?

      Response: Yes--it is the effect of 1 case on 1 coefficient, which we discover by excluding that case and rerunning the model.
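
      A minimal sketch of the DFBETA logic in R (illustrative model): drop one case, re-fit, and see how much the coefficients move; dfbetas() does this for every case and standardizes the changes.

      ```r
      mod <- lm(y ~ x1 + x2, data = mydata)

      # By hand for case 1:
      mod.drop1 <- update(mod, data = mydata[-1, ])
      coef(mod) - coef(mod.drop1)   # the (unstandardized) DFBETA for case 1

      # All cases at once, standardized by the coefficient's standard error:
      head(dfbetas(mod))
      ```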

    3. q7

      (#23) Q7

    4. q5

      (#20) Go over this figure.

      The key point is that the miscoded case (reported weight=60) is an outlier in both graphs, but it has large leverage in (b) and smaller leverage in (a).

      influence = leverage x residual

    5. q4

      (#19) D2 (Osamudia): let’s please run through the model on slide 19, under Q4, as well as its graphical representation on slide 20. The theoretical concept makes sense but I’m having trouble locating the elements of this concept in the model.

      Response: Yes, let’s cover this in class

    6. fox p.268

      (#17) D11 (Savannah) - I am confused about the logic behind the DFBETA approach (like what exactly are we doing when we use that equation?). Could we go over this again (but explain it to me like I’m a fifth grader LOL)?

      Response: Yes--it is the effect of 1 case on 1 coefficient, which we discover by excluding that case and rerunning the model.

      *D12 (Rita) - Can you further explain what influential observations actually do, using the graphs in Gordon or Fox? Also, what does it mean to specify your models correctly to prevent influential observations?

      Response: Yes, we will discuss this in slide 17

    7. Q3.

      (#18) D8 (Lauren) Is even one outlier too many? Wondering if there is ever a context we can ignore the outlier(s) rather than adjusting for them.

      Response: You will always have outliers (by definition). You should consider whether they are flagging data errors or something missed in your theory. But you shouldn’t adjust for them or drop them (i.e., in my mind that would be scientific misconduct)...unless there is a clear indication of data entry errors, or you can find why your model is misspecified.

      D9 Syl: Is it disingenuous/bad research practice to drop influential points from our analysis when we find them? To me it seems “hack-y.”

      Response: see D8.

    8. q1

      (#14) q1

      D7 (Rebecca): What does log transformation actually do? I read about how it is used to normalize distributions a lot in stats (thought about it specifically because it was mentioned in 14.1.2 of Gordon) but I still don’t quite understand how/why it works.

      Response: it’s an alternative functional form for an X (or Y) variable. I.e., how we parameterize variables is up to us, and it should be related to what fits best. log(x) measures proportional effects in x; it’s a nonlinear functional form. (Discuss). Having x and x-squared also gets at nonlinear effects. If either of these fits better than a linear effect of x, then the data is telling you something about what is going on in reality. (The pattern of residuals should reflect this).
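
      A quick sketch of trying alternative functional forms for x (illustrative model; AIC is one rough way to compare fits on the same y):

      ```r
      m.lin  <- lm(y ~ x,          data = mydata)
      m.log  <- lm(y ~ log(x),     data = mydata)   # proportional effects of x
      m.quad <- lm(y ~ x + I(x^2), data = mydata)   # curvilinear effect of x

      AIC(m.lin, m.log, m.quad)   # lower is better, other things equal
      ```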

    9. Cook’s D

      (Slide 25) Let's look at Figure 11.10 on p.285 of Fox

      D4 (Osamudia): slide 25, if step 1 isolates the variation in y that can be explained by other variables, and step 2 identifies the variations in x1 not explained by other variables, when graphing the residuals, why are the variations in y not explained by other variables plotted on the x-axis? I replayed this section of the lecture several times and believe that’s what I heard, although if I misheard, do let me know.

      Response: It’s at minute 26:16--the residuals from Step 1 are on the Y axis, and the residuals from step 2 are on the X axis.

    10. es

      Hat values (slide 22)

      The easiest way to write the multiple regression coefficients is the matrix equation b = (X'X)^-1(X'Y). The fitted values are then Yhat = Xb = X(X'X)^-1X'Y, and the matrix H = X(X'X)^-1X' is called the hat matrix because it “puts the hat on Y.” The hat values are the diagonal elements of H; they depend only on the X’s, which is why they measure the leverage of each case.

    1. Q17

      Q17 (second stage)

    2. Q15

      Q15, 16 (first stage, slide 37)

    3. Q1

      Question: if we leave hlthlimit out of the model, does it become an omitted variable that biases the estimate of the coefficient on health?

      Follow-up Q: is the answer to this question based on "theory" or factual knowledge?

      1-25-24: Let's repeat questions 1-3 but use the example of education and motivation/ability from Wooldridge p.507. (note: this really is Q4, let's jump to that question).

      Let's also take another example: the effect of marital status on happiness. (Let's use this as a group discussion question later in class).

    4. ba

      1/25/24 question sequence:

      Q1-3

      Q4,

      Section 2.2.6 Instrumental variables,

      Q 5-8,10-11

      2sls example (section 4.1): Q15-17

    5. q2

      I think this table is really important to memorize.

      C3. (Sylvie) Are tables 13.10 and 13.11 both “multiplicative?” I don’t know why, but I get the feeling that table 13.11 is “additive/subtractive” but I can’t tell if this intuition is correct. Can we go over them together?

      Response: (clarify) → the key thing is the sign: (-) × (-) = + (multiplying the signs is a good way to remember the table).

      C8. (Delaney) Not a super crazy in depth question, just thinking about why it doesn’t matter that we can't quantify how much omitting a variable changes our results, it just matters in which direction (towards or away from zero). All theoretical, but could we talk about why this is?

      Response: if we don’t actually have a measure of the variable, trying to make an argument about the direction of the bias is likely the best that we can do. Sometimes it really does work (in terms of making a persuasive case about why you have estimated a conservative estimate of your key variable).

    6. q1

      C5. (Mia) I’m wondering about the logistical process of deciding when OLS is appropriate versus when there is reasonable determination of omitted variable bias. When a researcher develops a question what is the process of ensuring there is no omitted variable bias? Does it differ based on secondary or primary data?

      Response: unless you have experimental data, you will have some form of omitted variable bias. The only real question is how much and whether you can live with it. One of the things we will do this semester is look at how different types of data (longitudinal data and multilevel data) can give us a different angle on what might be going on with the data (and omitted factors and sources of variation). The key upshot is that the conversation about what the “true” effect is will never really be done, and can always be improved on. (added to Q1)

      C7. (Amy) Using theoretically-backed intuition to figure out the nature of omissions sounds great (really, totally my thing) but for all the hype quantitative analysis gets for its concrete empiricism, it seems a little ‘wishy-washy’ – is there any kind of testing or analysis that can be conducted to help us reach, if not certainty, at least satisfaction that our variables account for all major elements of a construct?

      Response: you can come up with a good research design (to give you some sort of experimental or quasi-experimental variation). However, if you really have an omitted variable you are only left with some sort of theoretical argument (about what is going on with respect to the effect of the omitted variable on your results). It is the revenge of the theory people. No amount of mathematical expertise can get you out of it. (The problem is that you don’t have data on the omitted variable; if you did, you would just put it in the model).

      Instrumental variables offer a solution for omitted variable bias, but they just shift the focus of the theoretical discussion to a discussion of what is going on with the instrumental variable. If you have a (theoretically) good instrument, then great. Most of the time, it just creates arguments. (When I was in graduate school at Michigan, the econ department wouldn’t let grad students write dissertation papers that included instrumental variables).

    7. q11

      Q11, then go over Q15-17

    8. q10

      Q10

    9. q6 and q7

      Q6,7

    10. q8

      Q8

    11. q5

      there are other critiques (social class...how would that work?)

    12. mu

      This is the regression equation: Y = 2 + 1*x + 3*w + e, where the true effect of x is 1.

      Discuss C6. (Savannah) I think I need a refresher from last semester on what an error term in a regression equation means. Like in the example below:

      Response: the error term is the collection of all things that aren’t in the model and affect the dependent variable. So, there are lots of potential omitted variables in there (factors that are correlated with x and affect y).

    13. in

      C2.(Rebecca) Does including an instrumental variable differ mathematically from just incorporating additional variables in a multiple regression? I’m not sure I understand how it conceptually differs from other independent variables in a multiple regression, aside from the fact that it is trying to account for an unmeasurable omitted variable.

      Response: Yes it does differ. When we use an IV, we begin with the first stage regression, which estimates the variation in X that is due to variation in Z (the instrument). We need to find a variable Z that affects X but isn’t correlated with the error term e. Then, we leave Z out of the main regression equation (except for its effect on the instrumented value of X). This is the exclusion restriction, and it is a theoretical question. If you have a good instrument, it really does work (but with real data, it depends on theory for justification). (added to instrumental variable in the slides)
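
      A minimal 2SLS sketch with the AER package (wage, educ, exper, and the instrument z are illustrative names, not the article’s data):

      ```r
      library(AER)

      # Structural equation: wage on educ (endogenous) and exper (exogenous),
      # with z as the instrument. Exogenous variables appear on both sides of |.
      iv.mod <- ivreg(wage ~ educ + exper | z + exper, data = mydata)
      summary(iv.mod)

      # The same idea by hand (first stage, then second stage on fitted values);
      # note that the manual second stage gives the right coefficients but not
      # the right standard errors.
      first  <- lm(educ ~ z + exper, data = mydata)
      second <- lm(wage ~ fitted(first) + exper, data = mydata)
      ```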

    1. q4c

      Start with Q4c, then 10, 2, 11, & 13

    2. Q11.

      I.e., what is the difference between the gender gap that exists in the organization and how we interpret the findings here.

    3. Q2

      How does this relate to what we have been talking about with respect to the "narrative" of looking at regression results in Q4c and Q10.

    4. Q10.

      Note how this relates to Q4c above.

  5. May 2023
  6. www.tedmouw.info
    1. e.f

      Note that "race.f" is the name of the variable (this indicates that it is the factor version of the race variable).

      Thus, "race.fBlack" is a 1/0 variable indicating a Black respondent.

    2. gender.f

      gender.f=1 is female, 0 is male.

  7. Apr 2023
  8. www.tedmouw.info
    1. ...add interaction terms between race x college and race x income

  9. Mar 2023
    1. 4 of RHS,

      34

      (Jessica) Can you explain the number subscripts for the Raudenbush & Bryk notation system? I’m having trouble understanding what they mean at the different levels.

      **Response- Sure. Let’s discuss the notation here on 1.3.2 in class.

    2. B_0j

      24

      (Meredith): Also, can we go over what makes the intercept “random” and what’s happening with maximum likelihood estimation?

      Response--having a random intercept allows for more flexibility in modeling the underlying complexity of the real world. Having a random intercept for the World Values Survey allows for countries to differ in their intercepts, which is an important possibility. Otherwise, we are forcing the intercept to be the same across all countries. A simple way to see this would be to estimate separate models for each country--the intercept would be different.

      Maximum likelihood estimation---this is how non-OLS models get "solved", by finding the most likely set of coefficients given the data (we will cover MLE in detail in the lecture on logit/probit models).
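
      A minimal random-intercept sketch with lme4 (the variable names for the WVS are illustrative):

      ```r
      library(lme4)

      # (1 | country) gives each country its own intercept, drawn from a normal
      # distribution, rather than forcing a common intercept across countries.
      m.ri <- lmer(satisfaction ~ female + age + (1 | country), data = wvs)

      summary(m.ri)
      ```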

      (Natalie) How does the multilevel model work with time as a level? (As in the NLSY79 data)

      Response--time is a level 1 variable. We will be covering growth curve models, where the goal is to model trajectories over time.

    3. ed

      17

      (Jacob) In section 2.1.4 (empirical bayes), you calculate the shrinkage factor and multiply it by the difference between the overall average and the school average to get 5.71. But what does 5.71 mean? That’s supposed to be the empirical bayes prediction right, but how should I interpret 5.71 in this context? What do we do with this number? Pages 14-23 in the Rabe-Hesketh reading was hard for me to follow.

      **Response-Let’s use this as a prompt to go over this calculation in section 2.1.4.

    4. mixed

      26

      (Jacob) So let’s say we have two levels and we do the substitution and we get an equation like this: Yij = [ Y00 + Y10Xij + Y01Wj + Y11WjXij ] + [ U0j + U1Xij + rij ] … How does this work in R? Do we actually combine multiple levels of equations, or does R do that for us? Basically – how do we translate this substitution into R?

      Response-Yes…start with the two equation setup. In the next couple of lectures we are going to go through lots of examples of translating the MLM concepts into what you actually do in R and Stata.

    5. is:

      17

      Shrinkage factor questions:

      (Preethi) Can you explain what a shrinkage factor is?

      **Response-Yes, let’s talk about this in class with the dice roll example. Given what we know (or think we know) about the variance of the individual error term and the level-2 error term, it is a way to provide a Bayesian estimate of the level 2 effect. This actually matches the intuition of what we do naturally in these situations (that’s the goal of the example).

      (Meredith) Is there an ideal value for Rhat (the “shrinkage factor”) that we’d want to reach? I know the closer to 1 the better, but that seems hard to reach. Is Rhat of like 0.7 good?

      Response--if you have a decent sample size within level 2 units, then the reliability will be pretty high. Having a “low” R isn’t bad…it just means that you are hedging your bet a bit about what the effect for that level 2 unit is. I.e., the empirical bayes estimates are the way we make use of level 2 estimates despite the fact that some may have small sample sizes.

      (Anna) Just to clarify if you have a shrinkage (R) of 1 then your estimated between effect would be what you get for Empirical Bayes? And if so, does that mean that the school itself was just in line with your estimates or that your data as a whole is ‘reliable’?

      Response--As noted above, reliable/unreliable in the context of estimating the shrinkage factor doesn’t mean that there is anything wrong with your data (i.e., it isn’t corrupted or damaged), it just refers to the sample size within a level 2 unit (i.e., a school) and how much confidence we place on the estimate taken from that sample. Smaller samples means less confidence, and we hedge a little towards the overall population mean.

    6. io

      17

      (Alissa) If ϵij is the within group error term for individual j in group i, why doesn’t this value change for every data point? For instance, in the lecture video, it seemed like Var(ϵij ) was 15 for all schools. I am confused as to how this value can stay constant if it is meant to be the specific within school error?

      **Response-the variance is constant…it refers to the shape of the distribution---the error term for a particular case is a draw from that distribution. Let’s discuss this in class as well with a die roll.

    7. pa

      12

      Alex) Can you explain Q2.3 in HW 6 from last week: “When we compare nlsy.fe2 to nlsy.fe3 in Table 3 we see that the MWP is now negative for all three child categories (compared to having 0 children). What is going on? To explain this, note that in nlsy.fe2, age and agesq are not in the model, so they are an “omitted variable”. Then, when we add them in nlsy.fe3 we can observe the effect they have on the estimate of the MWP.”

      Response--wages go up with age, and mothers tend to be older than non-mothers, so if we don’t control for age it looks like motherhood has a positive effect on wages.

      Also, in the final FE models, how should we interpret age having a positive coefficient while agesq has a negative coefficient? (NCH, JD)

      Response--age squared has a negative effect because in reality age has a curvilinear effect on wages. Including age and age squared in the model (i.e., in a linear model) allows this curvilinear effect to express itself.

      (Braxton) For some reason I had trouble on HW 6 grouping my models by race. I could create the tables and the FE models but it wouldn’t let me by race. Can you explain the syntax?

      Response--Yes, you just need to subset the data by race. The subsetting/filtering part is this: data=nlsy[nlsy$race=="white",] …this would run the model just for whites (see the sketch below).
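
      A minimal sketch of running the same FE model separately by race (the formula and the nchild variable are illustrative; the subsetting is the part to copy):

      ```r
      library(plm)

      fe.white <- plm(lnwage ~ nchild + age + agesq,
                      data  = nlsy[nlsy$race == "white", ],
                      index = c("id", "year"), model = "within")

      fe.black <- plm(lnwage ~ nchild + age + agesq,
                      data  = nlsy[nlsy$race == "black", ],
                      index = c("id", "year"), model = "within")
      ```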

  10. Jan 2023
  11. Apr 2022
  12. www.tedmouw.info
    1. filter

      Note: in this example I kept only cases that were not missing the dependent variable vote.f. I should go back and re-do this including these cases. (Q: why?)

  13. Feb 2022
    1. sex.f::occ_fem

      This should be sex.f:occ_fem (i.e., just 1 colon).

  14. www.tedmouw.info
    1. per interview), this

      occ_fem is proportion female in the occupation

  15. Jan 2022
    1. %poverty was 50, what is the effect of %white on p_dem

      i.e., because of the interaction term, the effect of %white depends upon %poverty: it is the main effect of %white + (%poverty)*(the coefficient on the interaction term)

    2. b_white_combined=b_white+poverty*b_wxpvt

      just focusing on p_dem, %white, %pov, and the interaction term pov*white,

      p_dem = constant + b_w*(%white) + b_pov*(%pov) + b_int*(%pov*%white)

      Grouping the %white terms:

      p_dem = constant + [b_w + b_int*(%pov)]*(%white) + b_pov*(%pov)

      If, for example, %pov=50 then the effect of %white = -.300 + (-.008)*50
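
      The same arithmetic in R (the coefficient values are the ones used in the example above):

      ```r
      b_white <- -0.300
      b_wxpvt <- -0.008
      pov     <- 50

      b_white_combined <- b_white + pov * b_wxpvt   # effect of %white when %pov = 50
      b_white_combined                              # -0.7
      ```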

    1. and

      use & for and

    2. libraries

      To do list:

      1. Make sure to go over how to upload and download files from Rstudio cloud.
      2. Give an example of the controlling for variables question from last class.
  16. www.tedmouw.info
    1. command

      Note that while the leaflet command looks complicated, you can adapt the code I have used here for your own leaflet maps and there isn't that much that you would have to change.

    2. Copy my Rmarkdown

      Note that the Rmarkdown code is in RStudio Cloud, and you should see it when you log on.

  17. Dec 2021
  18. Local file
    1. 1:10

      test comment...why is there a colon here?
