52 Matching Annotations
  1. Sep 2020
    1. minor breaks that use thinner lines to distinguish them

      In the code, you put panel.grid.minor = element_blank(). I'm having a hard time understanding how putting a seemingly NULL element in the minor grid can make it thinner but still exist.
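
      For reference, a minimal sketch of the distinction (my own example, not the book's code): element_blank() removes the minor grid entirely, while a thinner-but-still-present minor grid would come from element_line() with a small size.

        library(ggplot2)

        p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

        # element_blank() removes the minor grid lines entirely:
        p + theme(panel.grid.minor = element_blank())

        # element_line() keeps them but draws them thinner than the major grid
        # (newer ggplot2 versions use `linewidth` in place of `size`):
        p + theme(panel.grid.minor = element_line(size = 0.25))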

    2. As the question noted, this is because the subcompact car class includes both small cheap cars, and sports cars with large engines.

      In the text, one of the subtitles notes that "2seater" indicates sports cars, and that they are an exception to the negative trend due to their light weight (despite large engines). (I don't think this exercise was meant to refer to subcompact cars.) Are some cars double-listed as 2seater and subcompact?
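
      A quick check one could run (a sketch of my own; mpg ships with ggplot2, and each row carries exactly one class value, so no car is listed under both):

        library(dplyr)

        # Tabulate models in the two classes; one class per row means
        # no car can be double-listed:
        ggplot2::mpg %>%
          filter(class %in% c("2seater", "subcompact")) %>%
          count(class, model)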

    1. ouptut

      output

    2. This contrasts with R markdown files, which show their output inside the console

      My RStudio R Markdown files also display output inside the editor. I'm not sure if this is due to an update.

    3. Alt

      Shift

    1. What does this code do? Why might it be useful?

      This exercise is outdated: both summarize_each() and funs() are deprecated (the deprecation message says to use across() instead).
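
      For reference, a hedged sketch of the replacement idiom, assuming the exercise's code summarised a few delay columns with summarise_each(funs(mean), ...) (the exact columns here are my guess):

        library(dplyr)
        library(nycflights13)

        # Deprecated: flights %>% group_by(dest) %>%
        #               summarise_each(funs(mean), arr_delay, dep_delay)

        # Current equivalent with across():
        flights %>%
          group_by(dest) %>%
          summarise(across(c(arr_delay, dep_delay), ~ mean(.x, na.rm = TRUE)))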

    2. #> All list-columns are now preserved.

      Not quite sure what this warning message means, but does it have something to do with why the output looks the same whether .drop is used or omitted?

      Confirmed by using all.equal() on the outputs of performing this operation with and without .drop.

      What was .drop supposed to do in the first place?

    1. if we use unnest() instead of unnest(.drop = TRUE)

      My suspicion that this exercise is outdated has now increased: whenever I try to use .drop, I get the warning "The .drop argument of unnest() is deprecated as of tidyr 1.0.0. All list-columns are now preserved.", and the output is the same as in the provided answer whether I include it or not.
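
      A sketch reproducing that check (toy data of my own, not the book's): since tidyr 1.0.0, .drop is deprecated and ignored, so the two calls return identical results.

        library(tidyr)
        library(tibble)

        df <- tibble(
          g     = c("a", "b"),
          data  = list(tibble(x = 1:2), tibble(x = 3:4)),
          extra = list(1:3, 4:6)  # a second list-column, which .drop used to remove
        )

        out_default <- unnest(df, data)
        out_drop    <- suppressWarnings(unnest(df, data, .drop = TRUE))

        all.equal(out_default, out_drop)  # TRUE: .drop is ignored now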

    2. How can you interpret the coefficients of the quadratic? (Hint: you might want to transform year so that it has mean zero.)

      Could you please explain this part too?

      i.e., why did we transform year to have mean 0? My graph and model-coefficient outputs look the same whether I include the - median(year) term or not...

      And is the resulting equation something like intercept = coef[1] * x^2 + coef[2] * x?
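
      A sketch of what centering changes, assuming this exercise models the gapminder data (centering on the mean or the median makes the same point): the fitted curve is identical either way, which would explain identical graphs, but the intercept's meaning changes.

        library(gapminder)

        mod_raw      <- lm(lifeExp ~ year + I(year^2), data = gapminder)
        mod_centered <- lm(lifeExp ~ I(year - mean(year)) + I((year - mean(year))^2),
                           data = gapminder)

        coef(mod_raw)       # intercept refers to the (extrapolated) year 0
        coef(mod_centered)  # intercept refers to the mean year (~1980)
        all.equal(fitted(mod_raw), fitted(mod_centered))  # TRUE: same curve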

    1. Try pointrange with mean and standard error of the mean (sd / sqrt(n)).

      This seems to be a typo, since this sentence appears twice in the text.

    2. regression standard error,

      higher regression standard error

    3. 14%

      I get a number similar to this (but with the opposite sign) if I do 2 ^ rmse, but a completely different number if I do the same summary calculation with resid instead of lresid. Why is it mathematically valid to back-transform a manipulated log residual, i.e., one that has been through sqrt() and similar functions, instead of first back-transforming the residuals (to either a ratio or an actual distance)?

      Why is this method with lresid better than using resid?

      Also, below, what are the units for 23-31? Percent? Dollars?
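
      For what it's worth, the log rule behind the back-transformation question above (a numeric check of my own): a log2 residual is a difference of logs, so exponentiating it yields a price ratio exactly, whereas 2 ^ rmse is only a rough "typical multiplicative error", because the squaring and square root happen on the log scale.

        price <- 1000
        pred  <- 800

        lresid <- log2(price) - log2(pred)
        2 ^ lresid    # 1.25
        price / pred  # 1.25 -- identical, since log2(a) - log2(b) = log2(a / b)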

    4. 40

      Could you please show how you got this number? I tried to follow Exercise 24.2.2: I figured that 0.5 must be equal to r^a1, so I calculated that if a1 = 3.2, then the residual must correspond to 0.81 dollars. Obviously, that doesn't make sense, so I'm wondering what I did wrong.

      Even trying to back-transform the residual by exponentiating, 2 ^ 0.5 yields 1.4, also a seemingly wrong number.

      When I look at individual observations in the data and plug them into (y1/y0) = (x1/x0)^a1, I get 5.78 for a1. Also, individual observations line up with your statement that predictions seem to be ~$40 below the actual price.

      It would be helpful if I knew the coefficients of the linear model, but when I use coef(mod), about 10 coefficients come out, which seems unhelpful.

      Update: I think I calculated resid wrong. I tried to back-transform it using 2 ^ lresid, just as we did with lpred and lcarat, but I couldn't seem to get it... Is it because lresid is calculated as a log minus a log, and thus represents the distance of the data from the model in a way that, because of log rules, comes out to resid = price / pred?

      Using this interpretation, which is my most confident one so far, I agree with the statement that a residual of +2 means the price was 4x lower than expected, but a residual of +/-0.5 would still not come out to be 40: +40% or -30%, maybe, but not +/-$40.
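
      For reference, the arithmetic in the last paragraph above:

        2 ^ 0.5   # ~1.41: price about 41% above the prediction
        2 ^ -0.5  # ~0.71: price about 29% below the prediction
        # i.e. ratios of roughly +40% / -30%, not a flat +/- $40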

    5. log

      logb

    6. y

      x

    1. It takes a data frame and a formula and returns a tibble that defines the model equation: each column in the output is associated with one coefficient in the model, the function is always y = a_1 * out1 + a_2 * out_2.

      Not sure how to correct this sentence.
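
      Assuming the quoted sentence refers to modelr::model_matrix(), a minimal sketch of what it returns (each output column pairs with one coefficient, with the intercept as a column of 1s):

        library(modelr)
        library(tibble)

        df <- tibble(x = c(1, 2, 3), y = c(2, 4, 6))
        model_matrix(df, y ~ x)  # columns: `(Intercept)` (all 1s) and x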

    1. a[1] = a1 and a[3] = a3, any other values of a[1] and a[3] where a[1] + a[3] == (a1 + a3)

      Not sure what this means. Is there any way you could provide a visualization? Or how could you avoid/address this problem?
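
      A sketch of the non-identifiability being described, assuming this refers to the exercise whose model is a[1] + data$x * a[2] + a[3]: a[1] and a[3] enter the model only through their sum, so any split of that sum fits equally well.

        model3 <- function(a, data) {
          a[1] + data$x * a[2] + a[3]  # a[1] and a[3] both act as intercepts
        }

        measure_distance <- function(a, data) {
          diff <- data$y - model3(a, data)
          sqrt(mean(diff ^ 2))
        }

        sim <- data.frame(x = 1:10, y = 3 + 2 * (1:10) + rnorm(10, sd = 0.1))

        # Identical distance for every split of the combined intercept
        # (3 + 0 = 1 + 2 = -5 + 8):
        measure_distance(c(3, 2, 0), sim)
        measure_distance(c(1, 2, 2), sim)
        measure_distance(c(-5, 2, 8), sim)
        # One fix: collapse a[1] and a[3] into a single intercept parameter.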

    2. you

      delete

    1. 325

      Where did you get this number? The mean and median times for the first function divided by those for the second are closer to 80 than to 325...
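
      For reference, one way to compute the ratio described above (slow_fn and fast_fn are hypothetical placeholders, not the book's functions):

        library(microbenchmark)

        slow_fn <- function(n) { out <- c(); for (i in seq_len(n)) out <- c(out, i); out }
        fast_fn <- function(n) { out <- integer(n); for (i in seq_len(n)) out[i] <- i; out }

        timings <- summary(microbenchmark(slow_fn(1e3), fast_fn(1e3)))
        timings$median[1] / timings$median[2]  # ratio of median run times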

    2. s

      is "lengths()" a function? could not find it in help

  2. Aug 2020
    1. Confirm

      This question is difficult to understand because it seems to employ circular logic: obviously, flights that leave early are caused by flights that leave early. Maybe it means to say that the hypothesis tests whether flights departing in minutes 20-30 and 50-60 are mostly early flights and not delayed ones.

    2. Explain your findings

      Is this exercise unfinished? There is no explanation of the results here. I tried filtering out the observations with differences divisible by 60 (%% 60 == 0) to account for discrepancies due to time zones, but that didn't seem to help...
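
      A sketch of the check described above, assuming the exercise compares air_time with arr_time - dep_time in nycflights13 (clock times are HMM-encoded, so they must be converted to minutes first):

        library(dplyr)
        library(nycflights13)

        to_minutes <- function(hmm) (hmm %/% 100) * 60 + hmm %% 100

        flights %>%
          mutate(
            clock_diff = to_minutes(arr_time) - to_minutes(dep_time),
            gap        = clock_diff - air_time
          ) %>%
          filter(!is.na(gap), gap %% 60 != 0) %>%  # drop whole-hour (time zone) gaps
          count()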

    3. We forgot to account for this

      This should mention whether it was our mistake in the analysis or a data-entry mistake that we can fix during analysis; it isn't clear from the text.

  3. Jul 2020
    1. character class

      It'd be helpful to explain that character classes are denoted by [].
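
      For example (a small stringr illustration of my own; the class in brackets matches any single character listed inside it):

        library(stringr)

        str_detect(c("grey", "gray", "groy"), "gr[ea]y")
        #> [1]  TRUE  TRUE FALSE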

    2. \bsum\b to avoid matching summarise, summ

      Can you give an example of what it does match, instead of what it doesn't?
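
      A small example of what it does match (my own illustration): \b anchors at word boundaries, so \bsum\b matches "sum" only when it stands alone as a word.

        library(stringr)

        str_detect(c("sum", "sum(x)", "summarise", "cumsum"), "\\bsum\\b")
        #> [1]  TRUE  TRUE FALSE FALSE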

    1. use

      us

    2. running that expression that there are only four airports in this lis

      Should it be "running that expression shows that there are only four airports in this list"? Not sure what this is referring to.

    3. Then I select the 48 observations

      How come the output is grouped even though you specified ungroup()? Does arrange() implicitly discard duplicate results if the variable being arranged on is the result of summarizing? Does this have something to do with the "summarize regrouping output" message, and if so, why does summarize() have an effect on subsequent functions? I'm not sure this was mentioned elsewhere in the book.
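
      For what it's worth, the regrouping comes from summarise() itself, not arrange(): summarise() peels off only the most recent grouping level, so its output stays grouped by the remaining variables. A minimal illustration:

        library(dplyr)

        mtcars %>%
          group_by(cyl, gear) %>%
          summarise(n = n()) %>%  # message: output grouped by 'cyl'
          group_vars()
        #> [1] "cyl"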

    4. hours

      We only have data from the year 2013 in this dataset, so why do we always group by year in the exercises? Isn't that redundant/useless?

    5. . S

      , s

    6. flight_weather

      Since the text says that an inner join is almost never used for analysis, could you also explain when it would be used? For example, are you using it in all the exercises because it is a convenient way to drop NA observations?
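
      A sketch of why inner_join() is convenient here, assuming flight_weather joins flights to weather on origin and hour as in the text: only rows with a match in both tables survive, so flights without a weather record are silently dropped.

        library(dplyr)
        library(nycflights13)

        flight_weather <- flights %>%
          inner_join(weather, by = c("origin", "year", "month", "day", "hour"))

        nrow(flights) - nrow(flight_weather)  # flights dropped for lack of a match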

    7. so I truncate

      I truncate

    8. used used

      used

    9. more

      more than one sex?

    10. In a foreign key relationship

      Some of the exercises for this subsection seem a little premature... for some readers (like me) it is difficult to understand exactly what's going on without first being introduced to basic knowledge of keys.

    1. Second, I should check whether all values for a (country, year) are missing or whether it is possible for only some columns to be missing.

      This part doesn't seem to do what it says it does, although I might just be interpreting everything wrong. It seems that you are trying to check whether every entry for a particular (country, year) combination is missing (e.g., no data was collected for that region that year). However, the section is described as checking whether multiple columns have missing values (as opposed to only one column for that observation having a missing value). Perhaps consider clarifying this section.
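
      One way to make that distinction concrete, assuming this refers to the who case study (a sketch; the column range is the standard pivot from the text):

        library(dplyr)
        library(tidyr)

        who %>%
          pivot_longer(new_sp_m014:newrel_f65,
                       names_to = "key", values_to = "value") %>%
          group_by(country, year) %>%
          summarise(
            n_missing   = sum(is.na(value)),
            all_missing = n_missing == n(),  # TRUE: the whole (country, year) is absent
            .groups = "drop"
          ) %>%
          count(all_missing)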

    2. For example, this will fill in the missing values of the long data frame with 0

      delete

    3. stocks <- tibble(
         year = c(2015, 2015, 2015, 2015, 2016, 2016, 2016),
         qtr = c(1, 2, 3, 4, 2, 3, 4),
         return = c(1.88, 0.59, 0.35, NA, 0.92, 0.17, 2.66)
       )
       stocks %>% pivot_wider(names_from = year, values_from = return, values_fill = 0)
       #> # A tibble: 4 x 3
       #>     qtr `2015` `2016`
       #>   <dbl>  <dbl>  <dbl>
       #> 1     1   1.88   0
       #> 2     2   0.59   0.92
       #> 3     3   0.35   0.17
       #> 4     4  NA      2.66

      Duplicated example, I think.

    4. complete()

      delete?

    5. set

      delete?

    6. [^ex-12.2.2]

      Are all the bracketed parts supposed to be links?