16 Matching Annotations
  1. May 2025
    1. There’s little overlap in the distributions of cost across private and public colleges, so we may want to rely upon Model #2

      It is unlikely that you will find a college that has the same exact cost, so relying on Model 2 is more useful, because it doesn't unnecessarily control for Cost.

      "Across all colleges, faculty salaries are expected to be $7800 higher for public colleges than for private colleges"

    2. For colleges of the same type, each $1 increase in cost is

      "After adjusting for the influence of College Type (public or private), a $1 increase in cost is expected to increase average faculty salary by 1.55"

  2. Apr 2025
    1. We could use an implied probability calculator to determine this is an implied probability of 67.11%

      Because: odds = Pr(y=1) / (1 - Pr(y=1))

      However, the calculation is tedious, so you could just use this sports betting calculator! Overall, though, it is good to know that there is a relationship between odds and probability, and if you know one, you can find the other...
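
The relationship between odds and probability noted above can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the odds are "odds in favour" as in the formula above, not the American or decimal odds formats a betting site would display.

```python
def probability_to_odds(p: float) -> float:
    """Odds in favour of the event: Pr(y=1) / (1 - Pr(y=1))."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def odds_to_probability(odds: float) -> float:
    """Invert the relationship: Pr(y=1) = odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


p = 0.6711  # the implied probability quoted in the highlight
odds = probability_to_odds(p)
print(f"odds in favour = {odds:.3f}")
print(f"back to probability = {odds_to_probability(odds):.4f}")
```

Knowing either quantity is enough to recover the other, which is all the calculator is doing behind the scenes.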

    1. oddsA = exp(β0)

      "Baseline odds" because this is our reference category.

      Meanwhile, exp(β1) is just the odds ratio. If being at home makes no difference, it would be 1. If being at home makes a positive difference, it would be greater than 1. If being at home makes a negative difference, it would be less than 1.
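
A minimal sketch of how the baseline odds and the odds ratio come out of logistic regression coefficients. The coefficient values below are hypothetical, chosen only for illustration, and `describe_odds_ratio` is a made-up helper name.

```python
import math

# Hypothetical coefficients, for illustration only (not from a fitted model).
beta0 = -0.2  # intercept: log-odds for the reference category
beta1 = 0.5   # change in log-odds when playing at home

baseline_odds = math.exp(beta0)      # odds for the reference category
home_odds = math.exp(beta0 + beta1)  # odds when playing at home
odds_ratio = math.exp(beta1)         # home odds divided by baseline odds


def describe_odds_ratio(ratio: float) -> str:
    """Translate an odds ratio into the three cases from the notes."""
    if math.isclose(ratio, 1.0):
        return "no difference"
    return "positive difference" if ratio > 1.0 else "negative difference"


print(f"baseline odds = {baseline_odds:.3f}")
print(f"home odds     = {home_odds:.3f}")
print(f"odds ratio    = {odds_ratio:.3f} ({describe_odds_ratio(odds_ratio)})")
```

Because exp(β0 + β1) = exp(β0) · exp(β1), multiplying the baseline odds by the odds ratio gives the odds at home.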

    1. appropriate

      We check the assumptions: normality, independence, and constant variance.

      The amount of variance should be constant: the "same normal distribution everywhere". We check this by looking at the spread of the residuals.

      The QQ plot deviates too much from the reference line, and the residuals show an almost quadratic pattern.

      There is a pattern of right skew in the QQ plot.

    2. Model 1: Tip ~ TotBill

      Based on the p-value, the drop in the sum of squares is not enough to justify adding more complexity. The data don't provide any evidence that smoking status adds valuable information in terms of prediction.

    3. 2.2e-16

      Based on the p-value, we should use Model 2 because the result is statistically significant. In other words, adding the additional information from Model 2 yields a statistically significant decrease in the sum of squared residuals.

      The F value tells us that the sum of squared residuals is decreasing a lot, and the p-value is small, meaning that you would not expect a decrease this large by chance alone.
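
As a rough sketch of where the F value comes from when comparing two nested models: it is the drop in the sum of squared residuals per added parameter, divided by the residual variance estimate of the larger model. The function name and the numbers below are hypothetical, not the values from the output being annotated, and the p-value step (an F-distribution tail probability) is left out because it needs a statistics library.

```python
def nested_f_statistic(rss_reduced: float, df_reduced: int,
                       rss_full: float, df_full: int) -> float:
    """F statistic for comparing a reduced model against a fuller nested model.

    rss_* are sums of squared residuals; df_* are residual degrees of freedom.
    """
    extra_params = df_reduced - df_full
    if extra_params <= 0:
        raise ValueError("the full model must use more parameters")
    drop_per_param = (rss_reduced - rss_full) / extra_params
    full_model_variance = rss_full / df_full
    return drop_per_param / full_model_variance


# Hypothetical values for illustration only.
f_value = nested_f_statistic(rss_reduced=100.0, df_reduced=98,
                             rss_full=80.0, df_full=97)
print(f"F = {f_value:.2f}")
```

A large F means the extra terms remove far more residual variation than noise alone would explain, which is what a small p-value reflects.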

    1. Checking these assumptions can also help us determine if we’re using an inappropriate model for our data

      They can also tell us if we have a model that doesn't fit the data points.

      When we look at the residuals versus fitted graph, we notice an underprediction, then an overprediction, and then an underprediction again. This tells us that our model was insufficient for the type of data we had. We tried to force the data into a model that can only capture a linear relationship, so we have a model that doesn't "fit" very well.

      We could use a polynomial / quadratic regression model to address this, which might fix the pattern in the residuals.
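
The under / over / under pattern can be reproduced with a small synthetic example: fit a straight line to data that is actually quadratic and look at the signs of the residuals. The data below is made up purely for illustration (y = x²), and the least-squares fit is written out by hand so the sketch needs no extra libraries.

```python
# Synthetic data with a purely quadratic relationship (illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [x ** 2 for x in xs]

# Simple linear regression by ordinary least squares.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed - fitted. Positive means the line underpredicts.
fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

for f, r in zip(fitted, residuals):
    label = "under" if r > 0 else "over" if r < 0 else "exact"
    print(f"fitted = {f:6.1f}  residual = {r:5.1f}  ({label}prediction)"
          if label != "exact" else
          f"fitted = {f:6.1f}  residual = {r:5.1f}  (exact)")
```

The residuals are positive at both ends and negative in the middle, which is the curved shape a residuals versus fitted plot shows when a linear model is forced onto curved data.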

    2. Questions: Is model 1 nested within model 3? Is model 2 nested within model 3? Why?

      We could specify values for coefficients associated with variables. When beta 1 = 0, beta 2 = 0 and beta 3 = 0, then model 3 IS model 1.

      So model 1 is a special case of model 3. Therefore, model 1 is nested within model 3.

      Models are nested when you can make one model equal to the other by fixing some of its coefficients at particular values (for example, setting them to 0).
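
The idea of nesting can be sketched with two prediction functions. The predictors, coefficient names, and values below are hypothetical (the actual variables in models 1 and 3 are not shown in these notes); the point is only that zeroing the extra coefficients of the larger model reproduces the smaller one.

```python
def small_model(x_a: float, b0: float, b_a: float) -> float:
    """Hypothetical smaller model: intercept plus one predictor."""
    return b0 + b_a * x_a


def large_model(x_a: float, x_b: float, x_c: float, x_d: float,
                b0: float, b_a: float,
                b_b: float, b_c: float, b_d: float) -> float:
    """Hypothetical larger model: the smaller model plus three extra terms."""
    return b0 + b_a * x_a + b_b * x_b + b_c * x_c + b_d * x_d


# With the three extra coefficients set to 0, the larger model
# gives the same prediction as the smaller model for any inputs.
small = small_model(2.0, b0=1.0, b_a=0.5)
large = large_model(2.0, 7.0, -3.0, 4.0,
                    b0=1.0, b_a=0.5, b_b=0.0, b_c=0.0, b_d=0.0)
print(small, large)
```

If no choice of coefficient values turns one model into the other, the two models are not nested.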

    1. Hint: Because there are a small number of cases that claimed zero dollars, you’ll need to add 1 to the outcome variable as was done in the example.

      We need to add 1; otherwise the Claim_Amount = 0 entries would mess up the analysis (for example, log(0) is undefined if the outcome is log-transformed).
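
A small sketch of why the shift matters, assuming the transformation in question is a natural log (the notes do not say which transformation the example used). The claim amounts below are hypothetical.

```python
import math

# Hypothetical claim amounts, including a zero-dollar claim.
claim_amounts = [0.0, 120.0, 4500.0]

# log(0) is undefined, so the raw outcome cannot be log-transformed.
try:
    math.log(claim_amounts[0])
    zero_claim_failed = False
except ValueError:
    zero_claim_failed = True

# Adding 1 first keeps every value strictly positive, so the log exists.
shifted_logs = [math.log(amount + 1.0) for amount in claim_amounts]

print("log(0) failed:", zero_claim_failed)
print("log(Claim_Amount + 1):", [round(v, 3) for v in shifted_logs])
```

A zero claim maps to log(1) = 0 after the shift, so those rows stay in the data instead of producing errors or missing values.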