Hypothesis

426 Matching Annotations

Jan 2024
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

1
1. allisonschiltz 26 Jan 2024
  
  in Public
  
  Midnight is represented by 2400, which would correspond to 1440 minutes since midnight, but it should correspond to 0.
  
  This actually doesn't work, because sometimes 1440 means 0 and sometimes it means 2400, depending on whether the delay crosses over from one day to the next.
Visit annotations in context

Annotators

allisonschiltz

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
Dec 2023
jrnold.github.io jrnold.github.io

R for Data Science Solutions

1
1. ziangang 09 Dec 2023
  
  in Public
  
  hides the legend box
  
  where are the differences between the two pictures？？
Visit annotations in context

Annotators

ziangang

URL

jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html
Oct 2023
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

3
1. Vahidkiaa 28 Oct 2023
  
  in Public
  
  one_of()
  
  I think it should be "any_of()"
2. Vahidkiaa 28 Oct 2023
  
  in Public
  
  the value the value
  
  "the value" has been repeated twice after "of the" sth is missing
3. Vahidkiaa 28 Oct 2023
  
  in Public
  
  th
  
  the an "e" is missing
Visit annotations in context

Annotators

Vahidkiaa

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
May 2023
jrnold.github.io jrnold.github.io

14 Strings | R for Data Science Solutions

1
1. hazarara 18 May 2023
  
  in Public
  
  +
  
  don't believe this + character serves any purpose. It allows 1 or more spaces to between the matching words of group 1 and 2. But why would there ever be more than 1 space?
Visit annotations in context

Annotators

hazarara

URL

jrnold.github.io/r4ds-exercise-solutions/strings.html
Feb 2023
jrnold.github.io jrnold.github.io

13 Relational data | R for Data Science Solutions

1
1. Traumstrasse 13 Feb 2023
  
  in Public
  
  Master
  
  Please note that the table "Masters" in Lahman is recalled in "People".
Visit annotations in context

Annotators

Traumstrasse

URL

jrnold.github.io/r4ds-exercise-solutions/relational-data.html
Jan 2023
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

2
1. jo_geographer 20 Jan 2023
  
  in Public
  
  , since
  
  so (makes it clearer)
2. jo_geographer 20 Jan 2023
  
  in Public
  
  Second
  
  Should have a comma after this word
Visit annotations in context

Annotators

jo_geographer

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
Jul 2022
jrnold.github.io jrnold.github.io

11 Data import | R for Data Science: Exercise Solutions

1
1. zacpol 24 Jul 2022
  
  in Public
  
  #> Warning: 2 parsing failures. #> row col expected actual file #> 1 -- 2 columns 3 columns literal data #> 2 -- 2 columns 3 columns literal data #> # A tibble: 2 x 2 #> a b #> <dbl> <dbl> #> 1 1 2 #> 2 4 5
  
  Behavior differs - the last values in each row are concatenated, so I get:
  
  a b 1 23 4 56
  
  Same general pattern for the other answers.
Visit annotations in context

Annotators

zacpol

URL

jrnold.github.io/r4ds-exercise-solutions/data-import.html
Jun 2022
jrnold.github.io jrnold.github.io

16 Dates and times | R for Data Science: Exercise Solutions

1
1. cthawley 19 Jun 2022
  
  in Public
  
  The code would have had to check if the departure time is less than the scheduled departure time plus departure delay (in minutes).
  
  This error is addressed and corrected in 16.4.2 - maybe worth noting for readers who want to check out solution.
Visit annotations in context

Annotators

cthawley

URL

jrnold.github.io/r4ds-exercise-solutions/dates-and-times.html
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

1
1. ffs 10 Jun 2022
  
  in Public
  
  cancelled_prop = mean(cancelled),
  
  how about using proportions or percentages? Even plots would be more revealing with less cluttered points.
  
  cancelled_prop = (cancelled_num / flights_num),
  
  OR even better:
  
  cancelled_prop = (cancelled_num / flights_num) * 100,
  
  ggplot (cancelled_per_day, aes (x = cancelled_prop, y = avg_dep_delay)) + geom_point (color = 'blue') + geom_smooth (se = FALSE)
  
  ggplot (cancelled_per_day, aes (x = cancelled_prop, y = avg_arr_delay)) + geom_point (color = 'red') + geom_smooth (se = FALSE, color = 'red')
Visit annotations in context

Annotators

ffs

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io jrnold.github.io

11 Data import | R for Data Science: Exercise Solutions

5
1. cthawley 08 Jun 2022
  
  in Public
  
  "2003-01-01"
  
  I think this should be Feb 1 2003 (2003-02-01), based on: date_custom <- c("Day 01 Mon 02 Year 03", "Day 03 Mon 01 Year 01")
2. cthawley 08 Jun 2022
  
  in Public
  
  read_csv("a,b\n\"1")
  
  behavior is different now: "> read_csv("a,b\n\"1") Rows: 0 Columns: 2 <br /> -- Column specification --------------------------------------------------------------- Delimiter: "," chr (2): a, b
  
  i Use spec() to retrieve the full column specification for this data. i Specify the column types or set show_col_types = FALSE to quiet this message.
  
  A tibble: 0 x 2
  
  ... with 2 variables: a <chr>, b <chr>"
3. cthawley 08 Jun 2022
  
  in Public
  
  that value is dropped.
  
  As above, behavior has changed and final two values now combined to "34".
4. cthawley 08 Jun 2022
  
  in Public
  
  #> Warning: 2 parsing failures.
  
  read_csv behavior has changed, warning is now: "Warning message: One or more parsing issues, see problems() for details"
5. cthawley 08 Jun 2022
  
  in Public
  
  the two functions have the exact same arguments:
  
  This is becase read_csv() and read_tsv() are special cases of the more general read_delim().
Visit annotations in context

Annotators

cthawley

URL

jrnold.github.io/r4ds-exercise-solutions/data-import.html
jrnold.github.io jrnold.github.io

10 Tibbles | R for Data Science: Exercise Solutions

1
1. cthawley 08 Jun 2022
  
  in Public
  
  The n_extra argument determines the number of extra columns to print information for.
  
  According to ?tibble::print.tbl we should use max_extra_cols now: "max_extra_cols Number of extra columns to print abbreviated information for, if the width is too small for the entire tibble. If NULL, the max_extra_cols option is used. The previously defined n_extra argument is soft-deprecated."
Visit annotations in context

Annotators

cthawley

URL

jrnold.github.io/r4ds-exercise-solutions/tibbles.html
May 2022
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

2
1. cthawley 29 May 2022
  
  in Public
  
  (air_time_delay)
  
  This should be air_time_delay_pct not air_time_delay. As it is, the tibble below shows the same results as above so we can't see the data sorted to show highest percent first.
2. cthawley 27 May 2022
  
  in Public
  
  These are a few ways to select columns
  
  Another way to select these columns is by excluding all other columns (either by name as below or column number):
  
  select(flights, -(year:day), -sched_dep_time, -sched_arr_time, -(carrier:time_hour))
  
  This isn't a good coding approach, but as an example it does demonstrate how to exclude columns which was one of the techniques used in this section of the text.
Visit annotations in context

Annotators

cthawley

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
Apr 2022
jrnold.github.io jrnold.github.io

11 Data import | R for Data Science: Exercise Solutions

1
1. meilingsoh 04 Apr 2022
  
  in Public
  
  Day %d Mon %M Year %y
  
  Day %d Mon %m Year %y
Visit annotations in context

Annotators

meilingsoh

URL

jrnold.github.io/r4ds-exercise-solutions/data-import.html
Feb 2022
jrnold.github.io jrnold.github.io

15 Factors | R for Data Science: Exercise Solutions

1
1. adamnicholson 01 Feb 2022
  
  in Public
  
  And the hours of TV doesn’t look that surprising to me.
  
  The tvhours variable is the hours watched per day. It's unlikely, that respondents actually watch TV 24 hours a day. Could be a misunderstanding of the question, or they could be answering figuratively (e.g. "oh, I watch TV 24/7").
Visit annotations in context

Annotators

adamnicholson

URL

jrnold.github.io/r4ds-exercise-solutions/factors.html
Jan 2022
jrnold.github.io jrnold.github.io

19 Functions | R for Data Science: Exercise Solutions

1
1. Nancowill 19 Jan 2022
  
  in Public
  
  y[!(x %% 3)] <- "buzz"
  
  Should be x%%5 here not x%%3
Visit annotations in context

Annotators

Nancowill

URL

jrnold.github.io/r4ds-exercise-solutions/functions.html
Dec 2021
jrnold.github.io jrnold.github.io

28 Graphics for communication | R for Data Science: Exercise Solutions

1
1. DenisKazakov 21 Dec 2021
  
  in Public
  
  no relationship or only a small negative relationship
  
  This is not quite true, it is just appearance due to choice of scale. If you calculate correlation coefficient between displ and hwy for each classes, it is smaller than 0.5 only for 2-seaters and minivans. P value for linear models is also larger than 0.05 only for 2-seaters and minivans.
Visit annotations in context

Annotators

DenisKazakov

URL

jrnold.github.io/r4ds-exercise-solutions/graphics-for-communication.html
jrnold.github.io jrnold.github.io

R for Data Science Solutions

1
1. RunAlongSpot 21 Dec 2021
  
  in Public
  
  ..prop..
  
  Has this been updated to after_stat() instead of ..
Visit annotations in context

Annotators

RunAlongSpot

URL

jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html
jrnold.github.io jrnold.github.io

24 Model building | R for Data Science: Exercise Solutions

2
1. DenisKazakov 14 Dec 2021
  
  in Public
  
  better model
  
  We can add a new variable: price per carat. And we will see that it drops just before a "nice" carat weight and then sharply rises. So we can somehow try to include it in the model. Table and depth seem to influence price too.
2. Gandolf 09 Dec 2021
  
  in Public
  
  4 to 2
  
  I think this is a typo. It should be vice-versa "from 2 to 4" based on the code snippet below, shouldn't it?
Visit annotations in context

Annotators

DenisKazakov

Gandolf

URL

jrnold.github.io/r4ds-exercise-solutions/model-building.html
Nov 2021
jrnold.github.io jrnold.github.io

16 Dates and times | R for Data Science: Exercise Solutions

2
1. DenisKazakov 28 Nov 2021
  
  in Public
  
  discrepancies
  
  In all cases of discrepancy, the error is exactly 24 hours. It means that the flight was postponed till next day but dep_time erroneously gives the same day.
2. DenisKazakov 28 Nov 2021
  
  in Public
  
  doesn’t appear to much difference
  
  Another, completely different approach is to calculate distribution parameters for each day (I used quartiles) and plot them against time.
  
  The median very slowly declines over the year. Although the decline is slow linear regression analysis shows statistically significant correlation.
Visit annotations in context

Annotators

DenisKazakov

URL

jrnold.github.io/r4ds-exercise-solutions/dates-and-times.html
jrnold.github.io jrnold.github.io

14 Strings | R for Data Science Solutions

2
1. DenisKazakov 25 Nov 2021
  
  in Public
  
  replacements <- c("A" = "a", "B" = "b", "C" = "c", "D" = "d", "E" = "e", "F" = "f", "G" = "g", "H" = "h", "I" = "i", "J" = "j", "K" = "k", "L" = "l", "M" = "m", "N" = "n", "O" = "o", "P" = "p", "Q" = "q", "R" = "r", "S" = "s", "T" = "t", "U" = "u", "V" = "v", "W" = "w", "X" = "x", "Y" = "y", "Z" = "z")
  
  An easy way to do it without typing the entire alphabet:
  
  alphabet <- letters
  
  names(alphabet) <- LETTERS
2. DenisKazakov 25 Nov 2021
  
  in Public
  
  Strictly speaking, this code replaces forward slashes with double backslashes. There seems to be a bug in R that prevents replacing with single slash.
Visit annotations in context

Annotators

DenisKazakov

URL

jrnold.github.io/r4ds-exercise-solutions/strings.html
jrnold.github.io jrnold.github.io

13 Relational data | R for Data Science Solutions

3
1. DenisKazakov 21 Nov 2021
  
  in Public
  
  hard to say
  
  I can only add that in most (2/3) of these 48 hours either wind speed OR wind gust OR visibility were worse than the mean wind speed or mean visibility respectively.
2. DenisKazakov 20 Nov 2021
  
  in Public
  
  flights
  
  I would also filter out cancelled flights with filter(!is.na(arr_time)) Flights missing a tail number is filtered out automatically by count.
3. DenisKazakov 19 Nov 2021
  
  in Public
  
  What weather conditions
  
  Funny enough, the condition that makes it most likely to see a delay is normal atmospheric pressure!
  
  weather_delay_dep <- flights %>% select(dep_delay, origin, time_hour) %>% filter(!is.na(dep_delay)) %>% inner_join(weather) %>% filter(!is.na(pressure))
  
  pressure_delay <- weather_delay_dep %>% mutate(pressure = round(pressure)) %>% group_by(pressure) %>% summarise(delay = median(dep_delay))
  
  pressure_delay %>% ggplot(aes(x = pressure, y = delay))+ geom_line() + geom_point()
  
  You can see it even before calculating median delay for each pressure value (I use median rather than mean because delay distributions are skewed):
  
  weather_delay_dep %>% ggplot(aes(x = pressure, y = dep_delay)) + geom_point() + geom_smooth()
Visit annotations in context

Annotators

DenisKazakov

URL

jrnold.github.io/r4ds-exercise-solutions/relational-data.html
jrnold.github.io jrnold.github.io

12 Tidy Data | R for Data Science: Exercise Solutions

3
1. DenisKazakov 16 Nov 2021
  
  in Public
  
  year > 1995
  
  What happened in 1995? Global number of cases increased by two orders of magnitude.
2. DenisKazakov 16 Nov 2021
  
  in Public
  
  summarize
  
  Also, some countries have missing age groups in some years.
3. DenisKazakov 16 Nov 2021
  
  in Public
  
  for years prior to the existence of the country
  
  There are other missing years, not related to existence of the country. If your group by country followed by summarise(years = unique(year)) you will see that Albania has data for 1995 and 1997 but no data for 1996, Algeria has data for 1997 and 1999 but not for 1998, etc.
Visit annotations in context

Annotators

DenisKazakov

URL

jrnold.github.io/r4ds-exercise-solutions/tidy-data.html
jrnold.github.io jrnold.github.io

7 Exploratory Data Analysis | R for Data Science: Exercise Solutions

1
1. DenisKazakov 16 Nov 2021
  
  in Public
  
  First
  
  The dataset description says that it describes round diamonds so two dimensions should be almost identical. A good starting point could be plotting frequency distributions of all three dimensions together:
  
  dimensions <- diamonds %>% pivot_longer(cols = c(x, y, z), names_to = 'dimension', values_to = 'value' )
  
  dim_short <- dimensions %>% filter(value <= 10)
  
  ggplot(data = dim_short, mapping = aes(x = value, colour = dimension)) + geom_freqpoly(binwidth = 0.01)
  
  Lines for x and y are practically identical, while z is shifted to the left. It does not prove that these dimensions are equal for each diamond but is a good indicator.
Visit annotations in context

Annotators

DenisKazakov

URL

jrnold.github.io/r4ds-exercise-solutions/exploratory-data-analysis.html
Oct 2021
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

1
1. pooja.desai 27 Oct 2021
  
  in Public
  
  numer
  
  number
Visit annotations in context

Annotators

pooja.desai

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io jrnold.github.io

11 Data import | R for Data Science: Exercise Solutions

1
1. bkuettel 25 Oct 2021
  
  in Public
  
  will will
  
  duplicate word
Visit annotations in context

Annotators

bkuettel

URL

jrnold.github.io/r4ds-exercise-solutions/data-import.html
Sep 2021
jrnold.github.io jrnold.github.io

15 Factors | R for Data Science: Exercise Solutions

1
1. learner11 20 Sep 2021
  
  in Public
  
  "No answer", "Other", "Don't know", "Not applicable", "No denomination"
  
  "Refused"
  
  Why isn't it mentioned as well?
Visit annotations in context

Annotators

learner11

URL

jrnold.github.io/r4ds-exercise-solutions/factors.html
jrnold.github.io jrnold.github.io

14 Strings | R for Data Science Solutions

7
1. learner11 18 Sep 2021
  
  in Public
  
  that
  
  just that
2. learner11 18 Sep 2021
  
  in Public
  
  th
  
  the
3. learner11 17 Sep 2021
  
  in Public
  
  (123) 456-7890
  
  I don't get this match when passing the respective code to the console.
4. learner11 17 Sep 2021
  
  in Public
  
  (123) 456-7890
  
  I don't get this match when passing the respective code to the console.
5. learner11 17 Sep 2021
  
  in Public
  
  (123) 456-7890
  
  I don't get this match when passing the respective code to the console. Any ideas why? What does the s* do?
6. learner11 17 Sep 2021
  
  in Public
  
  )
  
  no closing curve here.
7. learner11 17 Sep 2021
  
  in Public
  
  at last
  
  at least
Visit annotations in context

Annotators

learner11

URL

jrnold.github.io/r4ds-exercise-solutions/strings.html
jrnold.github.io jrnold.github.io

12 Tidy Data | R for Data Science: Exercise Solutions

1
1. SLJ83 13 Sep 2021
  
  in Public
  
  tribble
  
  should be tibble
Visit annotations in context

Annotators

SLJ83

URL

jrnold.github.io/r4ds-exercise-solutions/tidy-data.html
jrnold.github.io jrnold.github.io

11 Data import | R for Data Science: Exercise Solutions

1
1. Bokelu 09 Sep 2021
  
  in Public
  
  the last column is dropped.
  
  outdated explanation. the last two columns were combined.
Visit annotations in context

Annotators

Bokelu

URL

jrnold.github.io/r4ds-exercise-solutions/data-import.html
Aug 2021
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

1
1. test_dummy 27 Aug 2021
  
  in Public
  
  So the most important column is arr_delay, which indicates the amount of delay in arrival.
  
  In the context of defining which flights are cancelled, shouldn't the dep_delay variable be considered more important here? As you go on to say, just because a flight as arr_delay as NA doesn't necessarily mean the flight was cancelled. Whereas every flight that has NA for dep_delay also has NA for dep_arr; these are the flights that have been cancelled.
Visit annotations in context

Annotators

test_dummy

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io jrnold.github.io

R for Data Science Solutions

2
1. learner11 21 Aug 2021
  
  in Public
  
  The variable cty, city highway miles per gallon, is a continuous variable.
  
  Minor mistake:
  
  The variable cty, city miles per gallon, is a continuous variable.
2. zettlchen 10 Aug 2021
  
  in Public
  
  ggplot(mpg, aes(x = displ, y = hwy)) + geom_point(size = 4, color = "white") + geom_point(aes(colour = drv))
  
  Another solution, that evades plotting the points twice, would be the following:
  
  ggplot(mpg, aes(x = displ, y = hwy, fill = drv)) + geom_point(color = "white", shape = 21, size = 3, stroke = 2)
Visit annotations in context

Annotators

zettlchen

learner11

URL

jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html
jrnold.github.io jrnold.github.io

19 Functions | R for Data Science: Exercise Solutions

1
1. omscortiol 15 Aug 2021
  
  in Public
  
  m <- mean(x, na.rm = TRUE)
  
  It should rather be as follows to properly pass on the parameter na.rm: m <- mean(x, na.rm = na.rm)
Visit annotations in context

Annotators

omscortiol

URL

jrnold.github.io/r4ds-exercise-solutions/functions.html
Jul 2021
jrnold.github.io jrnold.github.io

R for Data Science Solutions

1
1. jeho 11 Jul 2021
  
  in Public
  
  year
  
  Shouldn't year in this dataset be categorical?
Visit annotations in context

Annotators

jeho

URL

jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html
Jun 2021
jrnold.github.io jrnold.github.io

19 Functions | R for Data Science: Exercise Solutions

2
1. forstem6 28 Jun 2021
  
  in Public
  
  each element of the character vector nchar
  
  I guess the character vector should be string instead of nchar.
2. gabrivera 09 Jun 2021
  
  in Public
  
  [y == -Inf] <- 0 y[y == Inf]
  
  How does this work? How come the evaluation of the conditional y == (-)Inf happens in the index of y?
Visit annotations in context

Annotators

gabrivera

forstem6

URL

jrnold.github.io/r4ds-exercise-solutions/functions.html
jrnold.github.io jrnold.github.io

14 Strings | R for Data Science Solutions

3
1. gabrivera 16 Jun 2021
  
  in Public
  
  str_replace_all("past/present/future", "/", "\\\\")
  
  Why are there two backslashes replacing a single forward-slash?
2. gabrivera 12 Jun 2021
  
  in Public
  
  The \\1 pattern is called a backreference. It matches whatever the first group matched. This allows the pattern to match a repeating pair of letters without having to specify exactly what pair letters is being repeated.
  
  I wonder if this was written early in the editing of the text. The main text already describes this. If this is here as a reminder, maybe it could be worded differently so that it doesn't seem as if it was written early in the editing of the text and sort of forgotten here?
3. gabrivera 12 Jun 2021
  
  in Public
  
  [A-Za-z][A-Za-z]
  
  Could this notation be explained in the main text so that when we try to solve the problem we have the tool at our disposal?
Visit annotations in context

Annotators

gabrivera

URL

jrnold.github.io/r4ds-exercise-solutions/strings.html
jrnold.github.io jrnold.github.io

R for Data Science Solutions

13
1. petzi 10 Jun 2021
  
  in Public
  
  ggplot(data = diamonds) + geom_bar(aes(x = cut, y = ..count.. / sum(..count..), fill = color))
  
  from ?after_stat
  
  after_stat() replaces the old approaches of using either stat() or surrounding the variable names with ...
  
  so the (new) solution would be:
  
  ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, y = after_stat(count / max(count)), fill = color))
2. giorgitsu 10 Jun 2021
  
  in Public
  
  such st
  
  such as
  
  typo
3. giorgitsu 10 Jun 2021
  
  in Public
  
  lists
  
  Either the following table lists or the following tables list.
  
  typo
4. giorgitsu 08 Jun 2021
  
  in Public
  
  plots
  
  the plots
  
  typo
5. giorgitsu 06 Jun 2021
  
  in Public
  
  class
  
  class needs to be with different fonts
  
  font
6. giorgitsu 06 Jun 2021
  
  in Public
  
  values
  
  the values
  
  typo
7. giorgitsu 06 Jun 2021
  
  in Public
  
  color
  
  colors
  
  typo
8. giorgitsu 06 Jun 2021
  
  in Public
  
  only facets
  
  facets only on one variable.
  
  typo
9. giorgitsu 06 Jun 2021
  
  in Public
  
  within
  
  with
10. giorgitsu 06 Jun 2021
  
  in Public
  
  on the x-axis.
  
  in the column dimension.
  
  typo
11. giorgitsu 06 Jun 2021
  
  in Public
  
  of drv on the y-axis.
  
  in the row dimension.
  
  typo
12. giorgitsu 06 Jun 2021
  
  in Public
  
  facet
  
  facets
  
  typo
13. giorgitsu 06 Jun 2021
  
  in Public
  
  combinations
  
  the combinations
Visit annotations in context

Tags

typo

font

Annotators

giorgitsu

petzi

URL

jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html
jrnold.github.io jrnold.github.io

13 Relational data | R for Data Science Solutions

2
1. gabrivera 08 Jun 2021
  
  in Public
  
  What weather conditions make it more likely to see a delay?
  
  Is there a way of doing this more systematically through z-scores?
  
  flights2 <- flights %>% left_join(weather) %>% pivot_longer(temp:visib, names_to = "weather_condition", values_to = "weather_value") %>% group_by(weather_condition) %>% mutate(weather_value = (weather_value - mean(weather_value, na.rm = TRUE)/sd(weather_value, na.rm = TRUE)))
  
  flights2 %>% ggplot(aes(weather_value, dep_delay, fill = weather_condition)) + geom_point()
  
  I should warn that, although I tried this, not only does it take long to render, but it also simply didn't color my points. I'm not sure why.
2. gabrivera 06 Jun 2021
  
  in Public
  
  The following diagram shows the relations between the Batting, Pitching, and Fielding tables.
  
  Is there a reason why Batting in particular is in the middle? Or could any of the other two have been placed there instead?
Visit annotations in context

Annotators

gabrivera

URL

jrnold.github.io/r4ds-exercise-solutions/relational-data.html
May 2021
jrnold.github.io jrnold.github.io

19 Functions | R for Data Science: Exercise Solutions

1
1. Amaks 29 May 2021
  
  in Public
  
  we could a FizzBuzz function
  
  omission of write?
  
  Suggested edit: we could write a FizzBuzz function
Visit annotations in context

Annotators

Amaks

URL

jrnold.github.io/r4ds-exercise-solutions/functions.html
jrnold.github.io jrnold.github.io

12 Tidy Data | R for Data Science: Exercise Solutions

1
1. gabrivera 27 May 2021
  
  in Public
  
  ~key
  
  In the textbook, this is "~names", rather than "~key".
Visit annotations in context

Annotators

gabrivera

URL

jrnold.github.io/r4ds-exercise-solutions/tidy-data.html
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

3
1. gabrivera 25 May 2021
  
  in Public
  
  mean delay of a flight for all values of the previous flight
  
  It's not really clear to me why doing the means here makes sense... I'll come back to this and maybe later it'll click, but so far I've re-read this three times in three different days and it still doesn't click... Maybe there's another way of phrasing why this is being done the way it is?
2. gabrivera 17 May 2021
  
  in Public
  
  We could also use the | operator. However, the | does not scale to many choices. Even with only three choices, it is quite verbose.
  
  Could we use & too? Could that be written here?
3. JuliaMarchetti 11 May 2021
  
  in Public
  
  std()
  
  I think the correct name of the function is sd()
Visit annotations in context

Annotators

gabrivera

JuliaMarchetti

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io jrnold.github.io

10 Tibbles | R for Data Science: Exercise Solutions

1
1. gabrivera 24 May 2021
  
  in Public
  
  es
  
  Could this answer include when I would use it, which is what the question asks. I wasn't sure what to answer because I couldn't think of scenarios where a mutate wouldn't do the job just as well?
Visit annotations in context

Annotators

gabrivera

URL

jrnold.github.io/r4ds-exercise-solutions/tibbles.html
jrnold.github.io jrnold.github.io

7 Exploratory Data Analysis | R for Data Science: Exercise Solutions

3
1. threni 22 May 2021
  
  in Public
  
  Since the histogram bins have already been calculated, it is unaffected.
  
  What is the antecedent of the word "it"??
2. gabrivera 20 May 2021
  
  in Public
  
  affect
  
  affects?
3. Ali.A.A 11 May 2021
  
  in Public
  
  It’s usually better to use the categorical variable with a larger number of categories or the longer labels on the y axis.
  
  I thought it is better to use X axis for variables of larger number of categories.
Visit annotations in context

Annotators

gabrivera

Ali.A.A

threni

URL

jrnold.github.io/r4ds-exercise-solutions/exploratory-data-analysis.html
jrnold.github.io jrnold.github.io

14 Strings | R for Data Science Solutions

1
1. Amaks 17 May 2021
  
  in Public
  
  Finding all plurals cannot be correctly accomplished with regular expressions alone. Finding plural words would at least require morphological information about words in the language. See WordNet for a resource that would do that. However, identifying words that end in an “s” and with more than three characters, in order to remove “as”, “is”, “gas”, etc., is a reasonable heuristic.
  
  I agree with the statement and used that as a basis for my answer.
  
  sent_with_words_end_s <- str_subset(sentences, "\b[A-Za-z]{3,}s\b") # Focusing on only those sentences that meets the specified criteria
  
  words_end_s <- str_extract(sent_with_words_end_s, "\b[A-Za-z]{3,}s\b") # Words ending in s (contains both plural words like "planks" and non plural words like "Sickness"
  
  str_view(words_end_s, "\b[A-Za-z]+[^s]s$") #Extracts only words that end in s but not in ss.
Visit annotations in context

Annotators

Amaks

URL

jrnold.github.io/r4ds-exercise-solutions/strings.html
jrnold.github.io jrnold.github.io

R for Data Science Solutions

1
1. gabrivera 17 May 2021
  
  in Public
  
  factor(1)
  
  Even though this looks similar to the time in which we did "group = 1", this seems different enough to warrant an explanation? At least I don't really understand how this works. On a hunch, I tried replacing "factor(1)" with "identity", and I got the same graph. How did that work? I tried looking up factor in the documentation, and it clarified what that meant, but I still am not sure how that relates to ggplot.
Visit annotations in context

Annotators

gabrivera

URL

jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html
jrnold.github.io jrnold.github.io

13 Relational data | R for Data Science Solutions

1
1. Amaks 05 May 2021
  
  in Public
  
  Since is always good practice to have clear
  
  Typo: it omitted.
  
  Edit suggestion: Since (it) is always good practice to have clear
Visit annotations in context

Annotators

Amaks

URL

jrnold.github.io/r4ds-exercise-solutions/relational-data.html
Apr 2021
jrnold.github.io jrnold.github.io

7 Exploratory Data Analysis | R for Data Science: Exercise Solutions

1
1. Amaks 29 Apr 2021
  
  in Public
  
  mutate(cut = if_else(runif(n()) < 0.1, NA_character_, as.character(cut)))
  
  Can you explain this code to me? I've looked up the if_else function but I do not understand this code.
Visit annotations in context

Annotators

Amaks

URL

jrnold.github.io/r4ds-exercise-solutions/exploratory-data-analysis.html
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

2
1. Amaks 27 Apr 2021
  
  in Public
  
  (arr_delay <= 0))
  
  Why did you use filter arr_delay <= 0 and not arr_delay > 0 when we are looking for the plane with the worst on-time record? This sounds counterintuitive to me, what am I misunderstanding? Thank you.
2. Amaks 27 Apr 2021
  
  in Public
  
  this delay will not have those affects plans
  
  For better clarity, change "this delay will not have those affects plans nor does it affect the total time spent traveling." to this delay will not affect those plans nor would it affect the total time spent traveling.
Visit annotations in context

Annotators

Amaks

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io jrnold.github.io

30 R Markdown workflow | R for Data Science: Exercise Solutions

1
1. mibra 27 Apr 2021
  
  in Public
  
  No exercises
  
  Thanks ever so much for this amazing set of solutions. I learned as much from this as from the R4DS textbook. I just made a donation to the kākāpō recovery (as suggested by the r4ds online version) and would be happy to make a similar donation to any charity of your choice. Just let me know what cause you'd like me to support. Thanks again for your awesome work!
Visit annotations in context

Annotators

mibra

URL

jrnold.github.io/r4ds-exercise-solutions/r-markdown-workflow.html
jrnold.github.io jrnold.github.io

R for Data Science Solutions

3
1. Amaks 22 Apr 2021
  
  in Public
  
  geom_bar(width = 1)
  
  Please can you explain to me why you included the argument, width = 1 for geom_bar? Without it, the pie doesn't look different. I believe you must have specified for a reason?
  
  This was my attempt to answer the question. I'm not totally if the resulting plot makes much sense. Please take a look and let me know what you think. Thank you.
  
  ggplot(diamonds, mapping = aes(x = cut, fill = color)) + geom_bar() + coord_polar()
2. Amaks 22 Apr 2021
  
  in Public
  
  such
  
  Typo. I think should read, such as, not such.
3. Amaks 22 Apr 2021
  
  in Public
  
  ..count.. / sum(..count..
  
  Please, can you explain why you have dots in the code? Thanks.
Visit annotations in context

Annotators

Amaks

URL

jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html
jrnold.github.io jrnold.github.io

28 Graphics for communication | R for Data Science: Exercise Solutions

1
1. mibra 19 Apr 2021
  
  in Public
  
  "Terms of US Presdients", subtitle = "Roosevelth
  
  Nothing important, but a couple of typos here: "Presdients" and "Roosevelth". The latter should be Eisenhower (as in the y-axis).
Visit annotations in context

Annotators

mibra

URL

jrnold.github.io/r4ds-exercise-solutions/graphics-for-communication.html
jrnold.github.io jrnold.github.io

27 R Markdown | R for Data Science: Exercise Solutions

1
1. mibra 15 Apr 2021
  
  in Public
  
  I use arrange() and slice() to select the largest twenty diamonds
  
  I guess an alternative would be to drop "arrange()" and use the slice_max() function instead? That is, slice_max(carat, n = 20, with_ties = FALSE).
Visit annotations in context

Annotators

mibra

URL

jrnold.github.io/r4ds-exercise-solutions/r-markdown.html
Feb 2021
jrnold.github.io jrnold.github.io

21 Iteration | R for Data Science: Exercise Solutions

1
1. mibra 12 Feb 2021
  
  in Public
  
  The lyrics for Ten in the Bed are
  
  We're asked to convert the lyrics to a function that can be generalised to any number of people in the bed. Isn't the following closer to answering the question fully:
  
  library(english)
  
  ten_bed <- function(x) { lyric_1 <- c("There were ") lyric_2 <- c(" in the bed And the little one said, “Roll over! Roll over!” So they all rolled over and one fell out\n") number <- as.character(as.english(x:1))
  
  for (i in number) { if (i != "one") { cat(lyric_1) cat(i) cat(lyric_2) cat("\n") } else { cat("There was one in the bed And the little one said, “Alone at last!” “Good Night!”") } } }
  
  ten_bed(x)
Visit annotations in context

Annotators

mibra

URL

jrnold.github.io/r4ds-exercise-solutions/iteration.html
jrnold.github.io jrnold.github.io

16 Dates and times | R for Data Science: Exercise Solutions

1
1. danielBraddock 07 Feb 2021
  
  in Public
  
  Exercise 16.3.7
  
  Confirm my hypothesis that flights that departed in minutes 20-30 and 50-60 are more likely to have departed early than on time or late. Hint: create a binary variable that tells you whether or not a flight departed early.
Visit annotations in context

Annotators

danielBraddock

URL

jrnold.github.io/r4ds-exercise-solutions/dates-and-times.html
Jan 2021
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

5
1. alidahud 26 Jan 2021
  
  in Public
  
  flights %>% # sort in increasing order select(tailnum, year, month,day, dep_delay) %>% filter(!is.na(dep_delay)) %>% arrange(tailnum, year, month, day) %>% group_by(tailnum) %>% # cumulative number of flights delayed over one hour mutate(cumulative_hr_delays = cumsum(dep_delay > 60)) %>% # count the number of flights == 0 summarise(total_flights = sum(cumulative_hr_delays < 1)) %>% arrange(total_flights)
  
  Here you counted the opposite, meaning the flights AFTER the first delay of greater than 60 minutes the code should've looked something like this:
  
  flights %>% select(tailnum, year, month,day, dep_delay) %>% filter(!is.na(dep_delay)) %>% arrange(tailnum, year, month, day) %>% group_by(tailnum) %>% filter(dep_delay<=60 ) %>% summarise(total_flights = n()) %>% arrange(desc(total_flights))
2. tcroft 03 Jan 2021
  
  in Public
  
  NULL
  
  NA
3. tcroft 03 Jan 2021
  
  in Public
  
  ArrDelay
  
  this should be arr_delay
4. tcroft 03 Jan 2021
  
  in Public
  
  not_cancelled %>% group_by(tailnum) %>% tally()
  
  To match the first expression, this should be group_by(dest)
5. tcroft 03 Jan 2021
  
  in Public
  
  being arriving
  
  Should be just "arriving"
Visit annotations in context

Annotators

alidahud

tcroft

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io jrnold.github.io

16 Dates and times | R for Data Science: Exercise Solutions

1
1. danielBraddock 21 Jan 2021
  
  in Public
  
  Explain your findings.
  
  There doesn't seem to be a solution for this
Visit annotations in context

Annotators

danielBraddock

URL

jrnold.github.io/r4ds-exercise-solutions/dates-and-times.html
jrnold.github.io jrnold.github.io

14 Strings | R for Data Science Solutions

3
1. danielBraddock 03 Jan 2021
  
  in Public
  
  str_split(x, ", +(and +)?")[[1]]
  
  optionally drop + (both): str_split(x, ", (and )?")[[1]]
2. danielBraddock 03 Jan 2021
  
  in Public
  
  what a pattern
  
  what pattern
3. danielBraddock 03 Jan 2021
  
  in Public
  
  colour_match
  
  Maybe the intention here was colour_match2 (with the knock on impact on the str_view_all(...) such that only the 2 relevant strings are shown, not 3)
Visit annotations in context

Annotators

danielBraddock

URL

jrnold.github.io/r4ds-exercise-solutions/strings.html
Dec 2020
jrnold.github.io jrnold.github.io

13 Relational data | R for Data Science Solutions

2
1. danielBraddock 28 Dec 2020
  
  in Public
  
  Pitching
  
  Exercise 13.3.3 (final sub-task):
  
  The Pitching table is mislabeled as Fielding in the diagram (though it is clear from the listed PKs)
2. danielBraddock 28 Dec 2020
  
  in Public
  
  Master
  
  Exercise 13.3.3 Questions have you compare 3 data sets, one of which is call "People" while in Answers the same is called "Master" (verified in R: identical(Master, People) == TRUE) Posted 2020-12-28
Visit annotations in context

Annotators

danielBraddock

URL

jrnold.github.io/r4ds-exercise-solutions/relational-data.html
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

3
1. tcroft 28 Dec 2020
  
  in Public
  
  However, vars is not a column in flights, as is the case, then select will use the value the value of the , and select those columns.
  
  Several problems in this sentence. I think this is missing "if" before "vars"; "the value" is duplicated; "of the ," should probably be something like "of the variable vars".
2. tcroft 28 Dec 2020
  
  in Public
  
  th
  
  Should be "the"
3. gmilano 21 Dec 2020
  
  in Public
  
  the value
  
  typo - duplicate "the value"
Visit annotations in context

Annotators

gmilano

tcroft

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io jrnold.github.io

14 Strings | R for Data Science Solutions

1
1. ludsfer 18 Dec 2020
  
  in Public
  
  str_subset(words, "([A-Za-z][A-Za-z]).*\\1")
  
  this also seems to work
  
  str_subset(words, "(.)(.).*(\1\2)")
  
  Again, Thank you for your time. Really helpfull solutions (They are double backslash, only one is shown though)
Visit annotations in context

Annotators

ludsfer

URL

jrnold.github.io/r4ds-exercise-solutions/strings.html
jrnold.github.io jrnold.github.io

19 Functions | R for Data Science: Exercise Solutions

1
1. thwings 14 Dec 2020
  
  in Public
  
  var(1:10) #> [1] 9.17 variance(1:10) #> [1] 9.17
  
  Concordance with var() breaks down if you introduce an NA to the vector. As written, the function will return NA if there an NA is in the vector no matter what na.rm is set to.
  
  The commenter above asked if you need to add na.rm to the sum function. In that case it will return a number but not the same as var because length is still counting NA, which can't be overridden. E.g. If you have x <- c(1:10, NA), variance(x) will return 8.25, but var(x, na.rm = TRUE) will return 9.166667.
  
  To fix concordance with NA values in vectors, I wrote the function as such:
  
  variance <- function(x, na.rm = TRUE) { if (na.rm == TRUE) { x <- x[!is.na(x)] } n <- length(x) m <- mean(x) sq_err <- (x - m)^2 sum(sq_err) / (n - 1) }
  
  Then if you use x <- c(1:10, NA) both variance(x, na.rm = TRUE) and var(x, na.rm = TRUE) return 9.166667 and both variance(x, na.rm = FALSE) and var(x, na.rm = FALSE) return NA.
Visit annotations in context

Annotators

thwings

URL

jrnold.github.io/r4ds-exercise-solutions/functions.html
jrnold.github.io jrnold.github.io

20 Vectors | R for Data Science: Exercise Solutions

1
1. ivanonymous 02 Dec 2020
  
  in Public
  
  different
  
  *difference
Visit annotations in context

Annotators

ivanonymous

URL

jrnold.github.io/r4ds-exercise-solutions/vectors.html
Nov 2020
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

5
1. chaoh 28 Nov 2020
  
  in Public
  
  air_time_delay
  
  air_time_delay_pct
2. chaoh 28 Nov 2020
  
  in Public
  
  flight
  
  two "flight"
3. techmoerror220 26 Nov 2020
  
  in Public
  
  I think this alternative answers Exercise 5.7.8 with fewer lines of code. Note though that the answers don't coincide, i.e., I don't havet any tailnum with zero flights before the first 1-hour delay:
  
  flights %>% filter(!is.na(arr_delay)) %>% arrange(tailnum, arr_delay) %>% group_by(tailnum) %>% filter(cumall(arr_delay < 60)) %>% ## all until this condition is false summarise(n = n())
  
  Excercise 5.7.8
4. chaoh 25 Nov 2020
  
  in Public
  
  at least
  
  less than
5. Gandolf 01 Nov 2020
  
  in Public
  
  5.6.1
  
  I was under impression that we need to know how to code these 3 bullet points. Instead, I see only theoretically-philosophical answers.
Visit annotations in context

Tags

Excercise 5.7.8

Annotators

Gandolf

techmoerror220

chaoh

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io jrnold.github.io

13 Relational data | R for Data Science Solutions

3
1. Winsloww 25 Nov 2020
  
  in Public
  
  scale_colour_viridis()
  
  It might be scale_colour_viridis_b() or scale_colour_viridirs_c().I use the form scale_colour_viridis() then it shows it is not availble.
2. Winsloww 25 Nov 2020
  
  in Public
  
  There seems to be a stronger relationship between visibility and delay. Delays are higher when visibility is less than 2 miles.
  
  When I cut the interval into more than 40 slices, I get that the graphic shows fluctuation in the interval of [0,2].When n = 50, it confused me that the delay down and then up in [1.2,1.4].And it may get the different graphic in the short intervals nearing zero when using different n .
3. luke14 05 Nov 2020
  
  in Public
  
  datamodelr
  
  when installing the package datamodelr Using R studio version 1.3.1093 and R version 4.0.3, I get the message: Warning in install.packages : package ‘datamodelr’ is not available for this version of R
Visit annotations in context

Annotators

luke14

Winsloww

URL

jrnold.github.io/r4ds-exercise-solutions/relational-data.html
jrnold.github.io jrnold.github.io

12 Tidy Data | R for Data Science: Exercise Solutions

2
1. Zuosinan_Chen 19 Nov 2020
  
  in Public
  
  stocks <- tibble( year = c(2015, 2015, 2015, 2015, 2016, 2016, 2016), qtr = c( 1, 2, 3, 4, 2, 3, 4), return = c(1.88, 0.59, 0.35, NA, 0.92, 0.17, 2.66) ) stocks %>% pivot_wider(names_from = year, values_from = return, values_fill = 0) #> # A tibble: 4 x 3 #> qtr `2015` `2016` #> <dbl> <dbl> <dbl> #> 1 1 1.88 0 #> 2 2 0.59 0.92 #> 3 3 0.35 0.17 #> 4 4 NA 2.66 stocks <- tibble( year = c(2015, 2015, 2015, 2015, 2016, 2016, 2016), qtr = c( 1, 2, 3, 4, 2, 3, 4), return = c(1.88, 0.59, 0.35, NA, 0.92, 0.17, 2.66) ) stocks %>% pivot_wider(names_from = year, values_from = return, values_fill = 0) #> # A tibble: 4 x 3 #> qtr `2015` `2016` #> <dbl> <dbl> <dbl> #> 1 1 1.88 0 #> 2 2 0.59 0.92 #> 3 3 0.35 0.17 #> 4 4 NA 2.66
  
  Are these duplicated?
2. Zuosinan_Chen 17 Nov 2020
  
  in Public
  
  names_ptype
  
  should be names_ptypes?
Visit annotations in context

Annotators

Zuosinan_Chen

URL

jrnold.github.io/r4ds-exercise-solutions/tidy-data.html
jrnold.github.io jrnold.github.io

19 Functions | R for Data Science: Exercise Solutions

1
1. mibra 18 Nov 2020
  
  in Public
  
  sum(sq_err)
  
  Doesn't one also need to add "na.rm = TRUE" here?
Visit annotations in context

Annotators

mibra

URL

jrnold.github.io/r4ds-exercise-solutions/functions.html
jrnold.github.io jrnold.github.io

11 Data import | R for Data Science: Exercise Solutions

1
1. luke14 04 Nov 2020
  
  in Public
  
  Mon %M
  
  should be %m, since %m is about the month and %M is about minutes. Clearly it should be months here.
Visit annotations in context

Annotators

luke14

URL

jrnold.github.io/r4ds-exercise-solutions/data-import.html
Oct 2020
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

2
1. Gandolf 31 Oct 2020
  
  in Public
  
  th
  
  a typo, it should be "the" or "this"
2. DaXVs 03 Oct 2020
  
  in Public
  
  dep_delay_diff = dep_delay - dep_time_min + sched_dep_time_min
  
  Should not it be dep_delay_diff = dep_delay - dep_time_min - sched_dep_time_min ?
Visit annotations in context

Annotators

DaXVs

Gandolf

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io jrnold.github.io

12 Tidy Data | R for Data Science: Exercise Solutions

1
1. Ehab.H 29 Oct 2020
  
  in Public
  
  Exercise 12.2.3
  
  table2 %>% pivot_wider(names_from = type, values_from=count) %>% group_by(year,country) %>% summarise(cases=sum(cases)) %>% ggplot(aes(x=year, y=cases,colour=country))+ geom_line()
Visit annotations in context

Annotators

Ehab.H

URL

jrnold.github.io/r4ds-exercise-solutions/tidy-data.html
jrnold.github.io jrnold.github.io

16 Dates and times | R for Data Science: Exercise Solutions

1
1. hillad 21 Oct 2020
  
  in Public
  
  Why is there months() but no dmonths()
  
  I think this question should be rephrased. There is certainly a dmonths().
  
  dmonths(1)
  
  [1] "2629800s (~4.35 weeks)"
  
  months(1)
  
  [1] "1m 0d 0H 0M 0S"
  
  Maybe dmonths() was added since the initial publication of this section? Whether it is makes sense to use is another question, and you cover that in the description. Perhaps it should ask, "What is the difference in output between months(1) and dmonths(1)? Is it logical to ever use dmonths()?" I'm sure someone will come up with a creative use!
Visit annotations in context

Annotators

hillad

URL

jrnold.github.io/r4ds-exercise-solutions/dates-and-times.html
Sep 2020
jrnold.github.io jrnold.github.io

28 Graphics for communication | R for Data Science: Exercise Solutions

2
1. electricdinosaurs 19 Sep 2020
  
  in Public
  
  minor breaks that use thinner lines to distinguish them
  
  in the code, you put panel.grid.minor = element_blank(). I'm having a hard time understanding how seemingly putting a NULL element in the minor grid can make it thinner but still exist?
2. electricdinosaurs 17 Sep 2020
  
  in Public
  
  As the question noted, this is because the subcompact car class includes both small cheap cars, and sports cars with large engines.
  
  In the text, one of the subtitles notes that "2seater" indicates sports cars, and that they are an exception to the negative trend due to their light weight (despite large engines). (I don't think this exercise was meant to refer to subcompact cars.) Are some cars double-listed as 2seater and subcompact ?
Visit annotations in context

Annotators

electricdinosaurs

URL

jrnold.github.io/r4ds-exercise-solutions/graphics-for-communication.html
jrnold.github.io jrnold.github.io

27 R Markdown | R for Data Science: Exercise Solutions

3
1. electricdinosaurs 15 Sep 2020
  
  in Public
  
  ouptut
  
  output
2. electricdinosaurs 15 Sep 2020
  
  in Public
  
  This contrasts with R markdown files, which show their output inside the console
  
  My RStudio R Markdown files also display output inside the editor. Not sure if this is due to an update
3. electricdinosaurs 15 Sep 2020
  
  in Public
  
  Alt
  
  Shift
Visit annotations in context

Annotators

electricdinosaurs

URL

jrnold.github.io/r4ds-exercise-solutions/r-markdown.html
jrnold.github.io jrnold.github.io

25 Many models | R for Data Science: Exercise Solutions

2
1. electricdinosaurs 15 Sep 2020
  
  in Public
  
  if we use unnest() instead of unnest(.drop = TRUE)
  
  my suspicion that this exercise is outdated has now increased, since I get the warning The .drop argument of unnest() is deprecated as of tidyr 1.0.0. All list-columns are now preserved. whenever I try to use .drop, and the output is the same as in the provided answer whether I include it or not
2. electricdinosaurs 15 Sep 2020
  
  in Public
  
  How can you interpret the coefficients of the quadratic? Hint you might want to transform year so that it has mean zero.)
  
  Could you please explain this part too?
  
  ie why did we transform year to have mean 0? My graph and model coefficient outputs look the same whether I include the - median(year) or not...
  
  and is the resulting equation something like Intercept = coef[1]x^2 + coef[2]x?
Visit annotations in context

Annotators

electricdinosaurs

URL

jrnold.github.io/r4ds-exercise-solutions/many-models.html
jrnold.github.io jrnold.github.io

24 Model building | R for Data Science: Exercise Solutions

6
1. electricdinosaurs 14 Sep 2020
  
  in Public
  
  Try pointrange with mean and standard error of the mean (sd / sqrt(n)).
  
  this seems to be a typo since this sentence appears twice in the text
2. electricdinosaurs 14 Sep 2020
  
  in Public
  
  regression standard error,
  
  higher regression standard error
3. electricdinosaurs 13 Sep 2020
  
  in Public
  
  14%
  
  I get a number similar to this (but the opposite sign) if I do 2 ^ rsme, but a completely different number if I do the same summary calcuation with resid instead of lresid. Why is it mathematically valid to back transform a manipulated log residual, ie one that has been through sqrt() and similar functions, instead of back transforming the residuals (to either a ratio or actual distance) first?
  
  Why is this method with lresid better than using resid?
  
  Also below, what are the units for 23-31? Percent? Dollars?
4. electricdinosaurs 13 Sep 2020
  
  in Public
  
  40
  
  Could you please show how you got this number? I tried to follow Exercise 24.2.2: I figured that 0.5 must be equal to r^a1, so I calculated that if a1 = 3.2, then the residual must correspond to 0.81 dollars. Obviously, that doesn't make sense, so I'm wondering what I did wrong.
  
  Even trying to back transform the residual by exponentiating 2 ^ 0.5 yields 1.4, an also seemingly wrong number.
  
  When I look at individual observations in the data and plug them into (y1/y0) = (x1/x0)^a1, I get 5.78 for a1. Also, individual observations line up with yours, which is that predictions seem to be ~$40 below the actual price.
  
  It would be helpful if I knew the coefficients of the linear model, but when I use coef(mod), out comes like 10 coefficients, which seems unhelpful.
  
  Update: I think I calculated resid wrong--I tried to back transform it using 2 ^ lresid, just like we did with lpred and lcarat, but I couldn't seem to get it... Is it because resid is calculated from a log minus a log? thus lresid actually represents the distance of the data from the model in a way that, bc of log rules, comes out to be resid = price / pred.....
  
  Using this interpretation, which is my most confident one so far, I agree with the statement that a residual of +2 means the price was 4x lower than expected, but a residual of +/-0.5 would still not come out to be 40. +40% or -30%, maybe, but not +-40.
5. electricdinosaurs 13 Sep 2020
  
  in Public
  
  log
  
  logb
6. electricdinosaurs 13 Sep 2020
  
  in Public
  
  y
  
  x
Visit annotations in context

Annotators

electricdinosaurs

URL

jrnold.github.io/r4ds-exercise-solutions/model-building.html
jrnold.github.io jrnold.github.io

R for Data Science Solutions

2
1. electricdinosaurs 11 Sep 2020
  
  in Public
  
  a[1] = a1 and a[3] = a3, any other values of a[1] and a[3] where a[1] + a[3] == (a1 + a3)
  
  not sure what this means, is there any way you could provide a visualization? or how could you avoid/address this problem?
2. electricdinosaurs 11 Sep 2020
  
  in Public
  
  you
  
  delete
Visit annotations in context

Annotators

electricdinosaurs

URL

jrnold.github.io/r4ds-exercise-solutions/model-basics.html
jrnold.github.io jrnold.github.io

7 Exploratory Data Analysis | R for Data Science: Exercise Solutions

2
1. mcpinazo 10 Sep 2020
  
  in Public
  
  I would expect length to always be less than width, otherwise the length would be called the width.
  
  Why??
2. MonkeyOfWAR 10 Sep 2020
  
  in Public
  
  ggplot(diamonds, aes(x = carat, y = price)) + geom_hex() + facet_wrap(~cut, ncol = 1)
  
  Needs library("hexbin") to work
Visit annotations in context

Annotators

mcpinazo

MonkeyOfWAR

URL

jrnold.github.io/r4ds-exercise-solutions/exploratory-data-analysis.html
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

16
1. mcpinazo 10 Sep 2020
  
  in Public
  
  Exercise 5.7.8
  
  We noticed that there are observations with no record of delay exceeding 1 hour!
  
  filter(mean(cumulative_hr_delays) == 0) %>%<br> summarise(n = n())
  
  Results 677 "tailnum" without any record of delay greater than 1 hour!
  
  For example, "N103US" has 46 flights under 1(one) hour. However, 46 is the total number of flights of "N103US", and none of these flights exceeded 1 hour. So unless I'm wrong, we shouldn't include it (the aforementioned "N103US") as having 46 flights before the first flight that exceeds one hour, because flight number 47 (your next flight) may not exceed an hour either (considering extending the search to the year 2014 and so on)! It may never have delay of more than an hour... and if the scope is restricted to only 2013 the reasoning is the same, we should exclude the 677 flights without any delay more than one hour from the result presented.
2. mcpinazo 08 Sep 2020
  
  in Public
  
  Exercise 5.7.8
  
  test
3. mcpinazo 08 Sep 2020
  
  in Public
  
  Exercise 5.7.8
  
  We noticed that there are observations with no record of delay exceeding 1 hour!
  
  filter(mean(cumulative_hr_delays) == 0) %>% summarise(n = n())
  
  Results 677 "tailnum" without any record of delay greater than 1 hour!
4. mcpinazo 08 Sep 2020
  
  in Public
  
  Exercise 5.7.8
  
  We noticed that there are observations with no record of delay exceeding 1 hour!
  
  filter(mean(cumulative_hr_delays) == 0) %>% summarise(n = n())
  
  Results 677 "tailnum" without any record of delay greater than 1 hour!
5. mcpinazo 08 Sep 2020
  
  in Public
  
  For each plane, count the number of flights before the first delay of greater than 1 hour.
  
  We noticed that there are observations with no record of delay exceeding 1 hour!
  
  filter(mean(cumulative_hr_delays) == 0) %>% summarise(n = n())
  
  Results 677 "tailnum" without any record of delay greater than 1 hour!
6. mcpinazo 08 Sep 2020
  
  in Public
  
  For each plane, count the number of flights before the first delay of greater than 1 hour.
  
  We noticed that there are observations with no record of delay exceeding 1 hour!
  
  filter(mean(cumulative_hr_delays) == 0) %>% summarise(n = n())
  
  Results 677 "tailnum" without any record of delay greater than 1 hour!
7. mcpinazo 08 Sep 2020
  
  in Public
  
  For each plane, count the number of flights before the first delay of greater than 1 hour.
  
  We noticed that there are observations with no record of delay exceeding 1 hour!
  
  filter(mean(cumulative_hr_delays) == 0) %>% summarise(n = n())
  
  Results 677 "tailnum" without any record of delay greater than 1 hour!
8. mcpinazo 08 Sep 2020
  
  in Public
  
  For each plane, count the number of flights before the first delay of greater than 1 hour.
  
  We noticed that there are observations with no record of delay exceeding 1 hour!
  
  filter(mean(cumulative_hr_delays) == 0) %>% summarise(n = n())
  
  Results 677 "tailnum" without any record of delay greater than 1 hour!
9. mcpinazo 08 Sep 2020
  
  in Public
  
  For each plane, count the number of flights before the first delay of greater than 1 hour.
  
  We noticed that there are observations with no record of delay exceeding 1 hour!
  
  filter(mean(cumulative_hr_delays) == 0) %>% summarise(n = n())
  
  Results 677 "tailnum" without any record of delay greater than 1 hour!
10. mcpinazo 08 Sep 2020
  
  in Public
  
  For each plane, count the number of flights before the first delay of greater than 1 hour.
  
  We noticed that there are observations with no record of delay exceeding 1 hour!
  
  filter(mean(cumulative_hr_delays) == 0) %>% summarise(n = n())
  
  Results 677 "tailnum" without any record of delay greater than 1 hour!
11. mcpinazo 08 Sep 2020
  
  in Public
  
  For each plane, count the number of flights before the first delay of greater than 1 hour.
  
  We noticed that there are observations with no record of delay exceeding 1 hour!
  
  filter(mean(cumulative_hr_delays) == 0) %>% summarise(n = n())
  
  Results 677 "tailnum" without any record of delay greater than 1 hour!
12. mcpinazo 08 Sep 2020
  
  in Public
  
  For each plane, count the number of flights before the first delay of greater than 1 hour.
  
  We noticed that there are observations with no record of delay exceeding 1 hour!
  
  ... filter(mean(cumulative_hr_delays) == 0) %>%
  
  summarise(n = n())
  
  Results 677 "tailnum" without any record of delay greater than 1 hour!
13. mcpinazo 08 Sep 2020
  
  in Public
  
  For each plane, count the number of flights before the first delay of greater than 1 hour.
  
  We noticed that there are observations with no record of delay exceeding 1 hour!
  
  ... filter(mean(cumulative_hr_delays) == 0) %>%
  
  summarise(n = n())
  
  Results 677 "tailnum" without any record of delay greater than 1 hour!
14. mcpinazo 08 Sep 2020
  
  in Public
  
  For each plane, count the number of flights before the first delay of greater than 1 hour.
  
  We noticed that there are observations with no record of delay exceeding 1 hour!
  
  ... filter(mean(cumulative_hr_delays) == 0) %>%
  
  summarise(n = n())
  
  Results 677 "tailnum" without any record of delay greater than 1 hour!
15. mcpinazo 08 Sep 2020
  
  in Public
  
  Exercise 5.7.8 For each plane, count the number of flights before the first delay of greater than 1 hour.
  
  We noticed that there are observations with no record of delay exceeding 1 hour!
  
  ... filter(mean(cumulative_hr_delays) == 0) %>%
  
  summarise(n = n())
  
  Results 677 "tailnum" without any record of delay greater than 1 hour!
16. mcpinazo 08 Sep 2020
  
  in Public
  
  Exercise 5.7.8 For each plane, count the number of flights before the first delay of greater than 1 hour.
  
  We also notice cases where there are no records of delays longer than 1 hour!
  
  ..... filter(mean(cumulative_hr_delays) == 0) %>% summarise(n = n())
  
  results: 677 "tailnum" no record of delay exceeding 1 hour!
Visit annotations in context

Annotators

mcpinazo

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io jrnold.github.io

11 Data import | R for Data Science: Exercise Solutions

1
1. manhuawangca 09 Sep 2020
  
  in Public
  
  %M
  
  should be %m
Visit annotations in context

Annotators

manhuawangca

URL

jrnold.github.io/r4ds-exercise-solutions/data-import.html
jrnold.github.io jrnold.github.io

21 Iteration | R for Data Science: Exercise Solutions

2
1. electricdinosaurs 06 Sep 2020
  
  in Public
  
  325
  
  where did you get this number? the mean and median time for the first function divided by the second is closer to 80 than 325...
2. electricdinosaurs 06 Sep 2020
  
  in Public
  
  s
  
  is "lengths()" a function? could not find it in help
Visit annotations in context

Annotators

electricdinosaurs

URL

jrnold.github.io/r4ds-exercise-solutions/iteration.html
jrnold.github.io jrnold.github.io

20 Vectors | R for Data Science: Exercise Solutions

3
1. electricdinosaurs 06 Sep 2020
  
  in Public
  
  later
  
  latter
2. electricdinosaurs 06 Sep 2020
  
  in Public
  
  is:9
  
  seems to be missing
3. electricdinosaurs 04 Sep 2020
  
  in Public
  
  to_even = TRUE
  
  what does this argument do if it's not used in the function?
Visit annotations in context

Annotators

electricdinosaurs

URL

jrnold.github.io/r4ds-exercise-solutions/vectors.html
jrnold.github.io jrnold.github.io

19 Functions | R for Data Science: Exercise Solutions

1
1. electricdinosaurs 01 Sep 2020
  
  in Public
  
  str_length(title))
  
  do you mean str_length(pad)?
Visit annotations in context

Annotators

electricdinosaurs

URL

jrnold.github.io/r4ds-exercise-solutions/functions.html
Aug 2020
jrnold.github.io jrnold.github.io

R for Data Science Solutions

1
1. gbganalyst 25 Aug 2020
  
  in Public
  
  geom_sf() stat_sf()
  
  This is repeated twice.
Visit annotations in context

Annotators

gbganalyst

URL

jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html
jrnold.github.io jrnold.github.io

19 Functions | R for Data Science: Exercise Solutions

3
1. electricdinosaurs 24 Aug 2020
  
  in Public
  
  3
  
  5
2. electricdinosaurs 20 Aug 2020
  
  in Public
  
  TRUE
  
  na.rm = na.rm ?
3. inkish 02 Aug 2020
  
  in Public
  
  Exercise 19.4.3
  
  jrnlod, your answer to this question is particularly great, thank you!!
  
  I added
  
  if (x == 0) { print(x) }
  
  as I don't think 0 should be a fizz/buzz Is there a less manual way to do this?
Visit annotations in context

Annotators

inkish

electricdinosaurs

URL

jrnold.github.io/r4ds-exercise-solutions/functions.html
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

1
1. aaroncook 20 Aug 2020
  
  in Public
  
  <, ==
  
  should these parentheses contain operations like &, |, !, and "xor" rather than the logical comparisons like < and == ?
Visit annotations in context

Annotators

aaroncook

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io jrnold.github.io

12 Tidy Data | R for Data Science: Exercise Solutions

1
1. daitou 19 Aug 2020
  
  in Public
  
  vales
  
  values
Visit annotations in context

Annotators

daitou

URL

jrnold.github.io/r4ds-exercise-solutions/tidy-data.html
jrnold.github.io jrnold.github.io

16 Dates and times | R for Data Science: Exercise Solutions

3
1. electricdinosaurs 15 Aug 2020
  
  in Public
  
  Confirm
  
  this question is difficult to understand bc it seems to employ circular logic: obviously flights that leave early are caused by flights that leave early. maybe it means to say that the hypothesis tests whether flights departing in minutes 20-30 and 50-60 are mostly early flights and not delayed ones.
2. electricdinosaurs 12 Aug 2020
  
  in Public
  
  Explain your findings
  
  is this exercise unfinished? there is no explanation for the results here. I tried filtering out the ones with differences divisible by 60 (%% 60 == 0) to account for discrepancies due to timezone but that didn't seem to help...
3. electricdinosaurs 12 Aug 2020
  
  in Public
  
  We forgot to account for this
  
  should mention whether this was our mistake in analysis or a data entry mistake that we can fix during analysis. it isn't clear in the text
Visit annotations in context

Annotators

electricdinosaurs

URL

jrnold.github.io/r4ds-exercise-solutions/dates-and-times.html
jrnold.github.io jrnold.github.io

7 Exploratory Data Analysis | R for Data Science: Exercise Solutions

1
1. dahei 10 Aug 2020
  
  in Public
  
  summarise(diamonds, mean(x > y), mean(x > z), mean(y > z))
  
  Can you help me translate the command? I don't quite understand what the result of mean( x > y) represents.
Visit annotations in context

Annotators

dahei

URL

jrnold.github.io/r4ds-exercise-solutions/exploratory-data-analysis.html
Jul 2020
jrnold.github.io jrnold.github.io

14 Strings | R for Data Science Solutions

5
1. electricdinosaurs 31 Jul 2020
  
  in Public
  
  "[A-ZAa-z]+")
  
  [A-Za-z]+
2. electricdinosaurs 28 Jul 2020
  
  in Public
  
  case insensitive flag
  
  can you show what this is? it isn't listed under ?str_view
3. electricdinosaurs 27 Jul 2020
  
  in Public
  
  by dry
  
  this result is irrelevant to the question
4. inkish 25 Jul 2020
  
  in Public
  
  Exercise 14.5.2 question 2 We have not learned 'unlist' yet, we learn that only in Ch 21
  
  I got a list and ordered it using the following code, but it takes only the first word in each sentence (I guess), and str_extract_all doesn't work at all. Any other ideas?
  
  tibble(word = (str_extract(sentences, boundary("word")))) %>% mutate(word = str_to_lower(word)) %>% count(word, sort = TRUE) %>% head(5)
  
  A tibble: 5 x 2
  
  word n
  
  <chr> <int>
  
  1 the 262
  
  2 a 72
  
  3 he 24
  
  4 we 13
  
  5 it 12
5. inkish 25 Jul 2020
  
  in Public
  
  Exercise 12.4.3.2. We learn str_split only in the next section. My rather inelegant solution was: has_apo <- sentences %>% str_subset("\'") has_ap_sep <- has_apo %>% tibble(sentence = has_apo) %>% extract(sentence, c("before", "apo", "after"), "([A-Za-z]*)(\')([A-Za-z]+)", remove = FALSE)
  
  cf https://en.wiktionary.org/wiki/Category:English_contractions There are some contractions where the apostrophe appears as the first character, so use * not + in the first [A-Za-z] expression
Visit annotations in context

Annotators

inkish

electricdinosaurs

URL

jrnold.github.io/r4ds-exercise-solutions/strings.html
jrnold.github.io jrnold.github.io

13 Relational data | R for Data Science Solutions

12
1. electricdinosaurs 18 Jul 2020
  
  in Public
  
  use
  
  us
2. electricdinosaurs 18 Jul 2020
  
  in Public
  
  running that expression that there are only four airports in this lis
  
  running that expression shows that? not sure what this is referring to
3. inkish 15 Jul 2020
  
  in Public
  
  Instead of using year, month, day, hour, you can join on only 'origin' and 'time_hour'
4. electricdinosaurs 14 Jul 2020
  
  in Public
  
  Then I select the 48 observations
  
  how come the output is grouped even though you specified ungroup()? does arrange() implicitly discard duplicate results if the var to be arranged on is a result of summarizing? does this have something to do with the "summarize regrouping output" message, and if so why does summarize() have an effect on subsequent functions? I'm not sure this was mentioned elsewhere in the book.
5. electricdinosaurs 14 Jul 2020
  
  in Public
  
  hours
  
  For this data, we only have data from the year 2013. So why do we always group by year in the exercises? Isn't that redundant/useless?
6. electricdinosaurs 14 Jul 2020
  
  in Public
  
  . S
  
  , s
7. inkish 13 Jul 2020
  
  in Public
  
  The result shows that this is in fact not a primary key, as n > 1 is very high. Why?
  
  One of the variables is named 'n', so first rename that variable, then check:
  
  babynames %>% rename(no = n) %>% count(year, sex, name, prop) %>% filter(n > 1) %>% nrow()
  
  [1] 0
8. electricdinosaurs 12 Jul 2020
  
  in Public
  
  flight_weather
  
  since the text says that inner join is almost never used for analysis, could you also explain when it would be used? for example, is the reason you're using it in all the exercises because it is a convenient way to drop NA observations?
9. electricdinosaurs 12 Jul 2020
  
  in Public
  
  so I truncate
  
  I truncate
10. electricdinosaurs 08 Jul 2020
  
  in Public
  
  used used
  
  used
11. electricdinosaurs 08 Jul 2020
  
  in Public
  
  more
  
  more than one sex?
12. electricdinosaurs 07 Jul 2020
  
  in Public
  
  In a foreign key relationship
  
  some of the exercises for this subsection a little premature...for some readers (like me) it is difficult to understand exactly what's going on without being first introduced to basic content knowledge in keys.
Visit annotations in context

Annotators

inkish

electricdinosaurs

URL

jrnold.github.io/r4ds-exercise-solutions/relational-data.html
jrnold.github.io jrnold.github.io

12 Tidy Data | R for Data Science: Exercise Solutions

2
1. electricdinosaurs 07 Jul 2020
  
  in Public
  
  Second, I should check whether all values for a (country, year) are missing or whether it is possible for only some columns to be missing.
  
  This part doesn't seem to do what it says it does, although I might just be interpreting everything wrong. But it seems that you are trying to check whether every entry for a particular "country, year" combination would be missing (eg no data collected for that region that year). However, the section is described as checking if multiple columns have missing values (as compared to only one column for that observation having a missing value). At least, perhaps consider clarifying this section.
2. electricdinosaurs 07 Jul 2020
  
  in Public
  
  For example, this will fill in the missing values of the long data frame with 0
  
  delete
Visit annotations in context

Annotators

electricdinosaurs

URL

jrnold.github.io/r4ds-exercise-solutions/tidy-data.html

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

A tibble: 0 x 2

... with 2 variables: a <chr>, b <chr>"

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Tags

Annotators

URL

Annotators

URL

Annotators

URL

Annotators