Hypothesis

426 Matching Annotations

Jul 2020
jrnold.github.io

12 Tidy Data | R for Data Science: Exercise Solutions

4
1. electricdinosaurs 07 Jul 2020
  
  in Public
  
  stocks <- tibble(
    year   = c(2015, 2015, 2015, 2015, 2016, 2016, 2016),
    qtr    = c(   1,    2,    3,    4,    2,    3,    4),
    return = c(1.88, 0.59, 0.35,   NA, 0.92, 0.17, 2.66)
  )
  stocks %>%
    pivot_wider(names_from = year, values_from = return, values_fill = 0)
  #> # A tibble: 4 x 3
  #>     qtr `2015` `2016`
  #>   <dbl>  <dbl>  <dbl>
  #> 1     1   1.88   0
  #> 2     2   0.59   0.92
  #> 3     3   0.35   0.17
  #> 4     4  NA      2.66
  
  duplicated example I think
2. electricdinosaurs 07 Jul 2020
  
  in Public
  
  complete()
  
  delete?
3. electricdinosaurs 07 Jul 2020
  
  in Public
  
  set
  
  delete?
4. electricdinosaurs 05 Jul 2020
  
  in Public
  
  [^ex-12.2.2]
  
  are all the bracketed parts supposed to be links?

Annotators

electricdinosaurs

URL

jrnold.github.io/r4ds-exercise-solutions/tidy-data.html
Jun 2020
jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

2
1. wukong 27 Jun 2020
  
  in Public
  
  flights_delayed3
  
  slice_max is not found
2. Scottmeup 09 Jun 2020
  
  in Public
  
  The fastest flight is the one with the average ground speed,
  
  Should read:
  
  The fastest flight is the one with the fastest average ground speed,
  
  The sentence as currently worded describes the flight with the average ground speed, not the fastest.

Annotators

wukong

Scottmeup

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io

21 Iteration | R for Data Science: Exercise Solutions

1
1. 12379Monty 21 Jun 2020
  
  in Public
  
  numeric_cols <- vector("logical", length(df))
  # test whether each column is numeric
  for (i in seq_along(df)) {
    numeric_cols[[i]] <- is.numeric(df[[i]])
  }
  # find the indexes of the numeric columns
  idxs <- which(numeric_cols)
  
  Is there a reason not to use
  
  idxs <- which(sapply(df, is.numeric)) ?
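  A quick way to check the suggestion above is to run both versions side by side. This base R sketch uses the built-in iris data frame as a stand-in for df (an assumption, since the solution's df is not shown here) and also tries vapply(), which always returns a logical vector.

```r
# Stand-in for `df` (assumption): built-in iris, 4 numeric columns + 1 factor
df <- iris

# Loop version, as in the solution
numeric_cols <- vector("logical", length(df))
for (i in seq_along(df)) {
  numeric_cols[[i]] <- is.numeric(df[[i]])
}
idxs_loop <- which(numeric_cols)

# One-liner suggested in the annotation
idxs_sapply <- which(sapply(df, is.numeric))

# Type-stable alternative: vapply() must return one logical per column
idxs_vapply <- which(vapply(df, is.numeric, logical(1)))

idxs_loop    # 1 2 3 4 (unnamed)
idxs_vapply  # same positions, named by column
```

  Both give the same column positions; sapply() and vapply() also keep the column names. One reason to prefer vapply() is type stability: for a data frame with no columns, sapply() returns an empty list, which which() rejects.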

Annotators

12379Monty

URL

jrnold.github.io/r4ds-exercise-solutions/iteration.html
jrnold.github.io

20 Vectors | R for Data Science: Exercise Solutions

3
1. felipe_lopes 20 Jun 2020
  
  in Public
  
  NA
  
  NA and NaN. Note that 0.286 is equal to 2 out of 7.
2. shgoke 15 Jun 2020
  
  in Public
  
  in
  
  "an" ?
3. shgoke 15 Jun 2020
  
  in Public
  
  integer values
  
  numeric values not limited to integers?

Annotators

felipe_lopes

shgoke

URL

jrnold.github.io/r4ds-exercise-solutions/vectors.html
May 2020
jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

6
1. Branicek 31 May 2020
  
  in Public
  
  What other variables are missing?
  
  summary(is.na(flights)) will give the number of NAs in every variable.
2. Branicek 31 May 2020
  
  in Public
  
  >
  
  Should be >= ("at least" means 30 or more).
3. inkish 24 May 2020
  
  in Public
  
  Exercise 5.3.3
  
  Your solution works, but we have not been taught the mutate command yet. Given the commands we have already been taught, you can get the same results from: arrange(flights, desc(distance / air_time))
4. stewie 07 May 2020
  
  in Public
  
  ground speed
  
  This worked for me: arrange(flights, air_time / distance)
5. stewie 07 May 2020
  
  in Public
  
  because the value of the missing TRUE or FALSE, x
  
  I find this hard to follow. Why not phrase it like the next example? NA | TRUE is TRUE because anything OR TRUE is always TRUE.
6. cearow 02 May 2020
  
  in Public
  
  mean(on_time)
  
  If we were to calculate the proportion of flights not delayed or cancelled, we would need to adjust for NA values of the cancelled flights:
  
  summarise(n = n(), on_time = sum(on_time, na.rm=TRUE) / n)
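  The NA points raised in these comments can be checked with a small base R sketch. colSums(is.na(...)) on the built-in airquality data counts missing values per column, similar in spirit to summary(is.na(flights)); the on_time vector is made up (an assumption), with NA standing for a cancelled flight.

```r
# Count missing values in every column
na_counts <- colSums(is.na(airquality))
na_counts  # Ozone 37, Solar.R 7, all other columns 0

# Three-valued logic: NA means "unknown"
NA | TRUE   # TRUE: anything OR TRUE is TRUE
NA | FALSE  # NA: the result depends on the unknown value
NA & FALSE  # FALSE: anything AND FALSE is FALSE
NA & TRUE   # NA

# Hypothetical on-time indicator; NA marks a cancelled flight
on_time <- c(TRUE, FALSE, TRUE, NA, TRUE)

# Share among flights with a known outcome (NA dropped from both parts)
mean(on_time, na.rm = TRUE)                   # 3 / 4 = 0.75

# Share among all scheduled flights (cancelled ones count as not on time)
sum(on_time, na.rm = TRUE) / length(on_time)  # 3 / 5 = 0.6
```

  The two proportions answer different questions, which is why the choice of denominator matters.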

Annotators

stewie

cearow

Branicek

inkish

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io

10 Tibbles | R for Data Science: Exercise Solutions

3
1. HappyMutant 30 May 2020
  
  in Public
  
  Exercise 10.2
  
  Exercise 10.5.2
2. HappyMutant 30 May 2020
  
  in Public
  
  Exercise 10.1
  
  Exercise 10.5.1
3. mibra 02 May 2020
  
  in Public
  
  on
  
  only

Annotators

mibra

HappyMutant

URL

jrnold.github.io/r4ds-exercise-solutions/tibbles.html
jrnold.github.io

14 Strings | R for Data Science Solutions

2
1. over 12 May 2020
  
  in Public
  
  The words that have seven letters or more are
  
  If we later need to view or extract the words that have at least 7 characters, we can use:

  str_view(stringr::words, "........*", match = TRUE)

  Adding "*" after the eighth "." means that the last "." can be repeated zero or more times.

  or

  str_view(stringr::words, ".{7,}", match = TRUE)

  In both cases we match the whole of every word with at least 7 characters, not just up to the seventh letter.
2. mibra 10 May 2020
  
  in Public
  
  Words that contain only consonants
  
  An alternative: str_view(words, "[aeiou]", match = FALSE)
  
  Or, using str_subset: str_subset(words, "[aeiou]", negate = TRUE)
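  Both suggestions can be tried without stringr, using base R's grepl() and grep() on a small made-up word list standing in for stringr::words (an assumption). Note that "[aeiou]" treats "y" as a consonant.

```r
# Hypothetical stand-in for stringr::words
words_demo <- c("apple", "rhythm", "absolute", "by", "dry",
                "achieve", "environment", "try")

# At least 7 characters: the two patterns from the comment are equivalent
long1 <- words_demo[grepl(".{7,}", words_demo)]
long2 <- words_demo[grepl("........*", words_demo)]
long1  # "absolute" "achieve" "environment"

# Words with no vowel: negate the match, as str_subset(..., negate = TRUE) does
no_vowels1 <- words_demo[!grepl("[aeiou]", words_demo)]
no_vowels2 <- grep("[aeiou]", words_demo, value = TRUE, invert = TRUE)
no_vowels1  # "rhythm" "by" "dry" "try"
```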

Annotators

mibra

over

URL

jrnold.github.io/r4ds-exercise-solutions/strings.html
jrnold.github.io

13 Relational data | R for Data Science Solutions

2
1. mibra 10 May 2020
  
  in Public
  
  foreign
  
  If one adds "%>% distinct(dest)" to the expression, one sees that there are four airports that are not in the FAA list: BQN, SJU, STT, and PSE. Three of these are in Puerto Rico, one (STT) is in US Virgin Islands.
2. mibra 09 May 2020
  
  in Public
  
  precipitation
  
  Perhaps the association with visibility is even stronger.

Annotators

mibra

URL

jrnold.github.io/r4ds-exercise-solutions/relational-data.html
jrnold.github.io

R for Data Science Solutions

2
1. chova32 08 May 2020
  
  in Public
  
  ggplot(data = diamonds) + geom_pointrange( mapping = aes(x = cut, y = depth), stat = "summary", fun.ymin = min, fun.ymax = max, fun.y = median )
  
  I could not generate the same stat_summary() plot with that code. After a little research, I found two solutions suggested on Stack Overflow. The first is to use geom_line():

  ggplot(data = diamonds, mapping = aes(x = cut, y = depth)) + geom_line() + stat_summary(fun.y = "median", geom = "point", size = 3)

  The second is to reduce the amount of data by grouping it:
  
  data = diamonds %>% group_by(cut) %>% summarise(min = min(depth), max = max(depth), median = median(depth))
  
  ggplot(data, aes(x = cut, y = median, ymin = min, ymax = max)) + geom_linerange() + geom_pointrange()
  
  Source: https://stackoverflow.com/questions/41850568/r-ggplot2-pointrange-example
2. stewie 06 May 2020
  
  in Public
  
  width controls the amount of vertical displacement, and height controls the amount of horizontal displacement.
  
  I think you flipped these labels: width is horizontal and height is vertical.

  It would also be helpful to emphasize that unless height and/or width are explicitly set to zero, there will be jitter. When I first used geom_jitter(), I did not realize this.

Annotators

stewie

chova32

URL

jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html
Apr 2020
jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

7
1. cearow 26 Apr 2020
  
  in Public
  
  combination
  
  is a combination
2. cearow 26 Apr 2020
  
  in Public
  
  contained
  
  contained in
3. cearow 26 Apr 2020
  
  in Public
  
  a particular trip by aircraft from a particular
  
  the sentence is not complete.
4. cearow 26 Apr 2020
  
  in Public
  
  few
  
  flew
5. mibra 26 Apr 2020
  
  in Public
  
  They may be combining different flights?
  
  It seems to refer to diverted flights. The original BTS data also has diverted airport information, including a variable DivArrDelay with the following description: "Difference in minutes between scheduled and actual arrival time for a diverted flight reaching scheduled destination. The ArrDelay column remains NULL for all diverted flights." I browsed this data quickly and, indeed, those missing arr_delay observations have values in the DivArrDelay variable.
6. testlum 19 Apr 2020
  
  in Public
  
  is column
  
  is a column.
7. ClaireLin 18 Apr 2020
  
  in Public
  
  There is one remaining issue. Midnight is represented by 2400, which would correspond to 1440 minutes since midnight, but it should correspond to 0. After converting all the times to minutes after midnight, x %% 1440 will convert 1440 to zero while keeping all the other times the same.
  
  %% 1440 is brilliant
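  A minimal base R sketch of the conversion described above, assuming times are stored as HHMM numbers as in the flights data (e.g. 517 means 5:17). The function name time_to_mins is hypothetical.

```r
# Convert HHMM clock times to minutes after midnight.
# 2400 (midnight at the end of the day) would give 1440, so %% 1440 maps it to 0
# while leaving every time from 0 to 1439 unchanged.
time_to_mins <- function(x) {
  (x %/% 100 * 60 + x %% 100) %% 1440
}

time_to_mins(c(517, 1200, 2359, 2400, 0))
# 317 720 1439 0 0
```

  Missing times stay NA, since every arithmetic step propagates NA.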

Annotators

cearow

testlum

mibra

ClaireLin

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io

7 Exploratory Data Analysis | R for Data Science: Exercise Solutions

2
1. MadMedDz 16 Apr 2020
  
  in Public
  
  5.7
  
  Did you mean 6.5?
2. ashokjswl 09 Apr 2020
  
  in Public
  
  library("viridis")
  
  What is the viridis library?

Annotators

ashokjswl

MadMedDz

URL

jrnold.github.io/r4ds-exercise-solutions/exploratory-data-analysis.html
jrnold.github.io

21 Iteration | R for Data Science: Exercise Solutions

1
1. rohum 13 Apr 2020
  
  in Public
  
  function that directly calculates the number of unique values in a vector.
  
  n_distinct is a faster and more concise equivalent of length(unique(x)).

Annotators

rohum

URL

jrnold.github.io/r4ds-exercise-solutions/iteration.html
jrnold.github.io

16 Dates and times | R for Data Science: Exercise Solutions

1
1. Eric12 12 Apr 2020
  
  in Public
  
  December
  
  Pretty sure December has 31 days.

Annotators

Eric12

URL

jrnold.github.io/r4ds-exercise-solutions/dates-and-times.html
jrnold.github.io

R for Data Science Solutions

1
1. fale 02 Apr 2020
  
  in Public
  
  model
  
  manufacturer missing (it is a categorical variable as well)

Annotators

fale

URL

jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html
Mar 2020
jrnold.github.io

R for Data Science Solutions

1
1. Vary 30 Mar 2020
  
  in Public
  
  cyl
  
  How is cylinder a continuous variable?

Annotators

Vary

URL

jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html
jrnold.github.io

7 Exploratory Data Analysis | R for Data Science: Exercise Solutions

1
1. Eric12 27 Mar 2020
  
  in Public
  
  Note
  
  Is anyone getting the above plot? Even when I copy and paste the code, the boxplots are horizontal instead of vertical. The tidyverse documentation shows the same plot as I get: https://ggplot2.tidyverse.org/reference/geom_boxplot.html
  
  If I understand the aes correctly, group should make carat the "categorical" variable and plot vertically as seen here.
  
  If I replace y = price with y = depth it plots vertically as expected.
  
  Can anyone help me understand what's happening?

Annotators

Eric12

URL

jrnold.github.io/r4ds-exercise-solutions/exploratory-data-analysis.html
jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

2
1. anjelicam 15 Mar 2020
  
  in Public
  
  What does the one_of() function do? Why might it be helpful in conjunction with this vector?
  
  Retired Selection Helpers
  
  one_of() is retired in favour of the more precise any_of() and all_of() selectors.
  
  https://www.rdocumentation.org/packages/tidyselect/versions/1.0.0/topics/one_of
2. sjmazer 01 Mar 2020
  
  in Public
  
  will
  
  Change "will the difference" to "is the difference" or "will be the difference"

Annotators

anjelicam

sjmazer

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io

12 Tidy Data | R for Data Science: Exercise Solutions

1
1. dionisius 05 Mar 2020
  
  in Public
  
  Why are gather() and spread() not perfectly symmetrical? Carefully consider the following example:
  
  Shouldn't we update these functions to the newer pivot_longer() and pivot_wider()? (Check ?gather, and also https://r4ds.had.co.nz/tidy-data.html#exercises-24 )

Annotators

dionisius

URL

jrnold.github.io/r4ds-exercise-solutions/tidy-data.html
Feb 2020
jrnold.github.io

12 Tidy Data | R for Data Science: Exercise Solutions

1
1. stirlingschneider 04 Feb 2020
  
  in Public
  
  Tidy the simple tibble below. Do you need to spread or gather it? What are the variables?
  
  Thank you for writing/maintaining r4ds! Great resource.
  
  I delivered this example in a lecture introducing tidy data, and the solution with three variables doesn't make sense to me. Why are there not only two variables (sex and pregnant)? Sure, I could argue that count is a variable in this data, but it doesn't seem as natural.

Annotators

stirlingschneider

URL

jrnold.github.io/r4ds-exercise-solutions/tidy-data.html
jrnold.github.io

R for Data Science Solutions

2
1. jhoeffler 04 Feb 2020
  
  in Public
  
  Exercise 3.7.5
  
  This exercise makes more sense AFTER introducing Position adjustments in part 3.8.
2. jhoeffler 04 Feb 2020
  
  in Public
  
  na.rm:
  
  from the documentation: If FALSE, the default, missing values are removed with a warning. If TRUE, missing values are silently removed.

Annotators

jhoeffler

URL

jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html
Jan 2020
jrnold.github.io

R for Data Science Solutions

4
1. torio 24 Jan 2020
  
  in Public
  
  As that example shows,
  
  Using geom_count() with position ='jitter' may help a bit with overplotting:
  
  ggplot(data = mpg) +
    geom_count(mapping = aes(x = cty, y = hwy, color = class), position = "jitter")
2. torio 23 Jan 2020
  
  in Public
  
  stat_violin()
  
  The ggplot2 v3.2.1 R documentation shows geom_violin() paired with stat_ydensity()
3. torio 23 Jan 2020
  
  in Public
  
  The default stat of geom_bar() is stat_bin(). The geom_bar() function only expects an x variable. The stat, stat_bin(), preprocesses input data by counting the number of observations for each value of x. The y aesthetic uses the values of these counts.
  
  The ggplot2 v3.2.1 R documentation states "geom_bar() uses stat_count() by default". Was ggplot2 updated since this answer was published? My understanding is stat_count() is used for discrete x data and stat_bin() for continuous x data.
4. postylem 16 Jan 2020
  
  in Public
  
  The following list contains the categorical variables
  
  missing the variable manufacturer, which is also a categorical (<chr>) variable.

Annotators

torio

postylem

URL

jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html
jrnold.github.io

14 Strings | R for Data Science Solutions

1
1. anans 17 Jan 2020
  
  in Public
  
  Write a function that turns (e.g.) a vector c("a", "b", "c") into the string "a, b, and c". Think carefully about what it should do if given a vector of length 0, 1, or 2.
  
  Can we replace the whole code with: ifelse(length(x)==1, x, str_c(str_c(x[-length(x)], collapse = ", ")," and ",str_c(x[length(x)], collapse = ",")))
  
  of course in the form of a function
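  For comparison, here is a base R sketch that handles each length the exercise mentions explicitly. The name comma_join is hypothetical, and returning "" for an empty vector is a design choice (raising an error would also be reasonable).

```r
# Join a character vector as "a, b, and c", with special cases for short inputs
comma_join <- function(x) {
  n <- length(x)
  if (n == 0) {
    ""                                  # nothing to join
  } else if (n == 1) {
    x                                   # a single item needs no separator
  } else if (n == 2) {
    paste(x[1], "and", x[2])            # "a and b" (no comma)
  } else {
    # all but the last item comma-separated, then ", and " before the last
    paste0(paste(x[-n], collapse = ", "), ", and ", x[n])
  }
}

comma_join(c("a", "b", "c"))  # "a, b, and c"
comma_join(c("a", "b"))       # "a and b"
```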

Annotators

anans

URL

jrnold.github.io/r4ds-exercise-solutions/strings.html
jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

1
1. IvanM26 12 Jan 2020
  
  in Public
  
  how how
  
  Repeated word

Annotators

IvanM26

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io

7 Exploratory Data Analysis | R for Data Science: Exercise Solutions

1
1. anans 09 Jan 2020
  
  in Public
  
  Distribution of last digit
  
  Why do we do that? Just to see whether each digit appears, or something else? Thanks

Annotators

anans

URL

jrnold.github.io/r4ds-exercise-solutions/exploratory-data-analysis.html
Dec 2019
jrnold.github.io

13 Relational data | R for Data Science Solutions

1
1. srpnr 03 Dec 2019
  
  in Public
  
  What is meant by “48 hours over the course of the year”? This could mean two days, a span of 48 contiguous hours, or 48 hours that are not necessarily contiguous hours. I will find 48 not-necessarily contiguous hours. That definition makes better use of the methods introduced in this section and chapter.
  
  Thank you for the book! Could you provide a hint as to how you might go about with this if 48 hours here meant any contiguous 48 hours? My guess is using a windowing function, but I couldn't figure it out.
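  One possible approach to the contiguous-48-hours reading is a rolling (moving-window) sum over an hourly series. This is a sketch under stated assumptions, not the solution's method: it uses base R cumsum(), the function rolling_sum and the toy data are hypothetical, and it expects a complete hourly series with no gaps or NAs, so hours with no flights would first need to be filled in (for example with tidyr::complete()).

```r
# Sum of every window of k consecutive values, via cumulative sums:
# sum(x[i:(i + k - 1)]) == cs[i + k] - cs[i], where cs = cumsum(c(0, x))
rolling_sum <- function(x, k) {
  stopifnot(k >= 1, length(x) >= k)
  cs <- cumsum(c(0, x))
  cs[(k + 1):length(cs)] - cs[seq_len(length(cs) - k)]
}

# Toy hourly delays (hypothetical); on real data k would be 48
hourly_delay <- c(1, 5, 9, 2, 0)
window_sums <- rolling_sum(hourly_delay, k = 2)
window_sums            # 6 14 11 2
start <- which.max(window_sums)
start:(start + 2 - 1)  # hours 2 and 3 form the worst 2-hour window
```

  Dividing the window sums by k gives rolling averages instead, if average delay is the measure of interest.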

Annotators

srpnr

URL

jrnold.github.io/r4ds-exercise-solutions/relational-data.html
Nov 2019
jrnold.github.io

R for Data Science Solutions

4
1. CL_BZH 23 Nov 2019
  
  in Public
  
  the an
  
  Either it is "the" or "an", but not both.
2. CL_BZH 23 Nov 2019
  
  in Public
  
  with with
  
  "with" written 2 times.
3. arboc 01 Nov 2019
  
  in Public
  
  geom_bin()
  
  this should be "stat_bin()"
4. arboc 01 Nov 2019
  
  in Public
  
  no plots
  
  I think this should be "no points" (i. e. no points on the scatter plot).

Annotators

CL_BZH

arboc

URL

jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html
Oct 2019
jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

4
1. egurtzegi 29 Oct 2019
  
  in Public
  
  )
  
  Typo.
2. akuczynski 15 Oct 2019
  
  in Public
  
  A warning is provided since often, but not always,
  
  Sentence syntax unclear. Maybe this:
  
  If a warning is provided often, but not always, there may be a bug in the code.
3. akuczynski 15 Oct 2019
  
  in Public
  
  vectors recycles
  
  vectors, R recycles
  
  typeo
4. wxshlh 12 Oct 2019
  
  in Public
  
  min_rank()
  
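  min_rank() numbers the values in order of size. Tied values get the same rank, but each value still takes up a position (so the following rank is skipped), and missing values are not ranked.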
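  The ranking rule described above can be reproduced in base R with rank(), using ties.method = "min" for shared ranks and na.last = "keep" to leave missing values unranked. As far as I know, dplyr's min_rank() follows the same rule; this sketch uses only base R.

```r
x <- c(30, 10, 20, 20, NA)

# Ties share the smallest rank, the next rank is skipped, and NA stays NA
r <- rank(x, ties.method = "min", na.last = "keep")
r  # 4 1 2 2 NA
```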

Tags

typeo

Annotators

akuczynski

egurtzegi

wxshlh

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io

R for Data Science Solutions

4
1. akuczynski 14 Oct 2019
  
  in Public
  
  how these parameters affects
  
  affect
  
  typeo
2. akuczynski 14 Oct 2019
  
  in Public
  
  use already
  
  already use
  
  typeo
3. akuczynski 14 Oct 2019
  
  in Public
  
  The benefits encoding
  
  Missing "of" between "benefits" and "encoding"
  
  typeo
4. akuczynski 13 Oct 2019
  
  in Public
  
  hwy vs. cyl
  
  I think common notation is dependent variable (Y) versus independent variable (X), so if you're asking for a y = cyl and x = hwy plot, then I'd say that's cyl vs. hwy (cyl as a function of hwy), not vice versa as written.

Tags

typeo

Annotators

akuczynski

URL

jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html
Sep 2019
jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

1
1. milorad22 28 Sep 2019
  
  in Public
  
  Unusually fast flights are those flights with the smallest standardized values.
  
  Question: why would it be unusual for flights with air times on the left side of the distribution to be the fastest?

Annotators

milorad22

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io

R for Data Science Solutions

1
1. Leiard 18 Sep 2019
  
  in Public
  
  the method used to
  
  Is there not something missing here?

Annotators

Leiard

URL

jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html
jrnold.github.io

13 Relational data | R for Data Science Solutions

2
1. WindSnowflake 13 Sep 2019
  
  in Public
  
  not tail
  
  Replace by "not ALL tail"?
2. WindSnowflake 13 Sep 2019
  
  in Public
  
  of the both the origin and
  
  Should be "of both the origin and..." :)

Annotators

WindSnowflake

URL

jrnold.github.io/r4ds-exercise-solutions/relational-data.html
jrnold.github.io

14 Strings | R for Data Science Solutions

1
1. malaradi 05 Sep 2019
  
  in Public
  
  Exercise 14.4.2.2
  
  Exercise 14.4.2.2 Q 2: this misses words at the end of a sentence that are followed by a period. I use this: str_extract(sentences, "[A-Za-z]+ing[ .]")
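  A word boundary (\\b) is another way to handle words followed by a period, a comma, or the end of the string, and it keeps the trailing character out of the match. Note also that a range written as [A-za-z] includes the ASCII punctuation between "Z" and "a" ([ \ ] ^ _ `), so [A-Za-z] is the intended class. This base R sketch uses regmatches() with made-up sentences (an assumption); str_extract_all(sentences, "\\b[A-Za-z]+ing\\b") should behave the same way in stringr.

```r
sentences_demo <- c("The sky is glowing.", "She kept singing and dancing", "Stop.")

# \\b matches the boundary between a letter and anything else (space, ".", end)
pattern <- "\\b[A-Za-z]+ing\\b"

ing_words <- unlist(regmatches(sentences_demo,
                               gregexpr(pattern, sentences_demo, perl = TRUE)))
ing_words  # "glowing" "singing" "dancing"
```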

Annotators

malaradi

URL

jrnold.github.io/r4ds-exercise-solutions/strings.html
jrnold.github.io

21 Iteration | R for Data Science: Exercise Solutions

2
1. Amaks 02 Sep 2019
  
  in Public
  
  we could use map() followed by flatten_dbl(),
  
  The way this is written could be somewhat confusing to a reader, in my opinion, although the code makes the order of the functions clearer.
  
  Suggestion:
  
  If we wanted a numeric vector, we could combine the map() followed with the flatten_dbl(),
2. Amaks 02 Sep 2019
  
  in Public
  
  like so,
  
  Edit suggestion.
  
  like shown:

Annotators

Amaks

URL

jrnold.github.io/r4ds-exercise-solutions/iteration.html
Aug 2019
jrnold.github.io

28 Graphics for communication | R for Data Science: Exercise Solutions

1
1. BBang 29 Aug 2019
  
  in Public
  
  x = "Highway MPG Relative to Class Average", y = "Engine Displacement"
  
  Hi. Is "Highway MPG Relative to Class Average" the name of the y-axis? And is the name of the x-axis "Engine Displacement", because we put displ on the x-axis? Thanks :)

Annotators

BBang

URL

jrnold.github.io/r4ds-exercise-solutions/graphics-for-communication.html
jrnold.github.io

20 Vectors | R for Data Science: Exercise Solutions

4
1. Amaks 27 Aug 2019
  
  in Public
  
  at
  
  Minor mistake: a not at
2. Amaks 26 Aug 2019
  
  in Public
  
  not
  
  "Not" repeated. Delete?
3. Amaks 26 Aug 2019
  
  in Public
  
  Neither NaN nor Inf are not numbers, and so they aren’t even numbers
  
  Edit suggestion,
  
  Neither NaN nor Inf is a number
  
  Or
  
  Both NaN and Inf are not even numbers.
  
  If accepted, then "and so they aren’t even numbers" may be deleted as it becomes superfluous.
4. Amaks 25 Aug 2019
  
  in Public
  
  This is not the same as what you See the value of looking at the value of
  
  Could you please check the sentence? There seems to be some ambiguity.

Annotators

Amaks

URL

jrnold.github.io/r4ds-exercise-solutions/vectors.html
jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

1
1. twarczak 26 Aug 2019
  
  in Public
  
  Okay, I’m not sure what’s going on in this data.
  
  Looks like a New York issue.
  
  filter(flights, !is.na(dep_delay), is.na(arr_delay)) %>%
    count(origin)
  #> # A tibble: 3 x 2
  #>   origin     n
  #>   <chr>  <int>
  #> 1 EWR      469
  #> 2 JFK      337
  #> 3 LGA      369

Annotators

twarczak

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io

19 Functions | R for Data Science: Exercise Solutions

2
1. Amaks 24 Aug 2019
  
  in Public
  
  is
  
  redundant. delete?
2. Amaks 24 Aug 2019
  
  in Public
  
  (!(x %% 3) && !(x %% 5))
  
  Hi,
  
  Please can you explain to me how to read this line of code and what it means. I'm having difficulty understanding it. Individually I do understand what each symbol means but put together as it is, I'm unable to. Moreover, it looks more efficient than my effort, as shown below.
  
  Thank you.
  
  fizzbuzz <- function(n) {
    x <- n
    if ((x %% 3 == 0) && (x %% 5 == 0)) {
      print("fizzbuzz")
    } else {
      if (x %% 3 == 0) {
        print("fizz")
      } else {
        if (x %% 5 == 0) {
          print("buzz")
        } else {
          print(x)
        }
      }
    }
  }
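  The condition (!(x %% 3) && !(x %% 5)) can be read piece by piece. %% gives the remainder, and ! treats the number 0 as FALSE and any other number as TRUE before negating it, so !(x %% 3) is TRUE exactly when x is divisible by 3. A short sketch (the helper names both and both_explicit are hypothetical) checks that it matches the explicit == 0 form used in the function above.

```r
# %% returns the remainder: 15 %% 3 is 0, 10 %% 3 is 1
15 %% 3        # 0
10 %% 3        # 1

# ! coerces a number to logical (0 -> FALSE, nonzero -> TRUE), then negates it
!(15 %% 3)     # TRUE: remainder 0, so divisible by 3
!(10 %% 3)     # FALSE: remainder 1

# && combines two single TRUE/FALSE values
both <- function(x) !(x %% 3) && !(x %% 5)
# The same test written out explicitly
both_explicit <- function(x) (x %% 3 == 0) && (x %% 5 == 0)

both(15)       # TRUE
both(9)        # FALSE
```

  The compact form relies on number-to-logical coercion; the explicit == 0 form is longer but arguably easier to read.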

Annotators

Amaks

URL

jrnold.github.io/r4ds-exercise-solutions/functions.html
jrnold.github.io

R for Data Science Solutions

1
1. BBang 22 Aug 2019
  
  in Public
  
  function(.data)
  
  Hi Jeffrey. First of all, I really appreciate these exercise solutions! And I need your help: why did you write ".data"? Is there any difference from "data"? When I wrote "data", the results were the same as with ".data". I tried to find a difference between them but couldn't. Please let me know: is there no difference?

Annotators

BBang

URL

jrnold.github.io/r4ds-exercise-solutions/model-basics.html
jrnold.github.io

16 Dates and times | R for Data Science: Exercise Solutions

3
1. Amaks 15 Aug 2019
  
  in Public
  
  is that
  
  Change word order perhaps? that is
2. Amaks 15 Aug 2019
  
  in Public
  
  which as a
  
  which has
3. Amaks 14 Aug 2019
  
  in Public
  
  if
  
  Minor mistake. Should read if it does not

Annotators

Amaks

URL

jrnold.github.io/r4ds-exercise-solutions/dates-and-times.html
jrnold.github.io

13 Relational data | R for Data Science Solutions

1
1. Amaks 04 Aug 2019
  
  in Public
  
  What does it mean for a flight to have a missing tailnum
  
  This seems to be a tad long so I apologise in advance. This is a whole new field for me and I would really like to understand.
  
  Could it be AA and MQ use different values to represent tailnum?
  
  filter( planes, tailnum == 0 )
  
  A tibble: 0 x 9
  
  length(is.na( planes$tailnum ))
  
  [1] 3322
  
  nrow(planes)
  
  [1] 3322
  
  filter( flights, tailnum == 0 )
  
  A tibble: 0 x 19
  
  length(is.na( flights$tailnum ) )
  
  [1] 336776
  
  nrow( flights )
  
  [1] 336776
  
  Yet the anti_join() in your code clearly shows that there are some tailnum values in flights that are not in the planes dataset. How could that be? The one explanation I could come up with is that the two datasets use different tailnum values, so I tried to investigate for AA and MQ.
  
  tailnum_flights <- flights %>% filter( carrier == 'AA'| carrier == 'MQ' ) %>% select ( carrier, tailnum )
  
  tailnum_planes <- planes %>% select( tailnum )
  
  tailnum_planes %in% tailnum_flights
  
  [1] FALSE
  
  So, it looks like the tailnum values are not missing for the ten airlines but are represented with values different in the two datasets (flights and planes).
  
  What are your thoughts? Thank you.
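  Two details in the checks above can be illustrated with a small base R sketch on made-up data (the flights_demo and planes_demo tables are hypothetical). is.na() returns one TRUE/FALSE per element, so length(is.na(x)) always equals length(x); sum(is.na(x)) counts the missing values. Likewise, %in% on two whole data frames compares them column by column rather than value by value, so comparing the tailnum columns themselves is more informative.

```r
# Hypothetical toy data: one missing tail number, and one (N9) not in planes
flights_demo <- data.frame(tailnum = c("N1", NA, "N2", "N9"),
                           stringsAsFactors = FALSE)
planes_demo  <- data.frame(tailnum = c("N1", "N2", "N3"),
                           stringsAsFactors = FALSE)

# length() of a logical vector is just the number of elements checked
length(is.na(flights_demo$tailnum))  # 4
# sum() counts the TRUEs, i.e. the number of missing values
sum(is.na(flights_demo$tailnum))     # 1

# Compare the tailnum columns themselves, value by value
in_planes <- flights_demo$tailnum %in% planes_demo$tailnum
in_planes                            # TRUE FALSE TRUE FALSE

# Non-missing tail numbers with no match in planes
unmatched <- unique(flights_demo$tailnum[!in_planes & !is.na(flights_demo$tailnum)])
unmatched                            # "N9"
```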

Annotators

Amaks

URL

jrnold.github.io/r4ds-exercise-solutions/relational-data.html
Jul 2019
jrnold.github.io

19 Functions | R for Data Science: Exercise Solutions

1
1. jeffboichuk 30 Jul 2019
  
  in Public
  
  a is missing

Annotators

jeffboichuk

URL

jrnold.github.io/r4ds-exercise-solutions/functions.html
jrnold.github.io

14 Strings | R for Data Science Solutions

4
1. Amaks 28 Jul 2019
  
  in Public
  
  In the full English language, no
  
  Erm, full stop omitted after no.
  
  And thanks, for the link. It is an interesting read.
2. Amaks 28 Jul 2019
  
  in Public
  
  Words that end with “-ed” but not ending in “-eed”
  
  This worked for me as well:
  
  str_view(stringr::words, ".*[^e]ed$", match = TRUE)
3. Amaks 28 Jul 2019
  
  in Public
  
  "ab$^$sfas"
  
  Please, could you explain why you included this in the code? I replicated the answer without it. Thanks.
4. fbriody 16 Jul 2019
  
  in Public
  
  str_view(words, "([[:letter:]]).*\\1", match = TRUE)
  
  Does this work? E.g. "achieve" does not have a matching PAIR of letters.
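  The "-ed but not -eed" pattern from the comments above can be checked in base R with grepl() on a made-up word list (an assumption). The pattern requires at least one character before "ed", so the bare word "ed" is not matched; adding the alternative "^ed$" is one way to include it if it should count.

```r
words_demo <- c("bed", "feed", "need", "played", "ed", "red", "speed")

# Ends in "ed", and the character just before "ed" is not "e"
ed_words <- words_demo[grepl(".*[^e]ed$", words_demo)]
ed_words   # "bed" "played" "red"

# Also accept the bare word "ed"
ed_words2 <- words_demo[grepl("^ed$|[^e]ed$", words_demo)]
ed_words2  # "bed" "played" "ed" "red"
```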

Annotators

fbriody

Amaks

URL

jrnold.github.io/r4ds-exercise-solutions/strings.html
jrnold.github.io

13 Relational data | R for Data Science Solutions

9
1. Amaks 27 Jul 2019
  
  in Public
  
  avg_dest_delays <-
    flights %>%
    group_by(dest) %>%
    # arrival delay NA's are cancelled flights
    summarise(delay = mean(arr_delay, na.rm = TRUE)) %>%
    inner_join(airports, by = c(dest = "faa"))
  
  Please, could you explain to me why the colour aesthetic is not applied if I pipe this directly to ggplot? See the code below; it's basically a replication of yours, but with the pipeline continuing directly into ggplot:
  
  avg_dest_delays <- flights %>% group_by( dest ) %>% summarise( delay = mean( arr_delay, na.rm = TRUE )) %>% inner_join(airports, by = c( dest = "faa" ) ) %>% ggplot( aes(lon, lat, colour = delay ) ) + borders("state" ) + geom_point( ) + coord_quickmap( )
  
  Thanks
2. Amaks 27 Jul 2019
  
  in Public
  
  any any
  
  repetition?
3. Amaks 27 Jul 2019
  
  in Public
  
  join
  
  This also seems to be the case when the by = argument is not used in the code. In that case, it seems, semi_join() keeps only the rows where both datasets match, for example:
  
  fueleconomy::vehicles %>% semi_join(fueleconomy::common)
  
  produces the same output as:
  
  fueleconomy::vehicles %>% semi_join(fueleconomy::common, by = c("make", "model"))
  
  Or is that a coincidence?

  But fueleconomy::vehicles %>% semi_join(fueleconomy::common, by = "make")
  
  will produce a different output as R will match only by "make" in this example.
4. Amaks 27 Jul 2019
  
  in Public
  
  arr_delay
  
  The code doesn't affect the output but I thought you might mean, sum( !is.na( dep_delay ))
5. Amaks 27 Jul 2019
  
  in Public
  
  There are few planes older than 30 years, so I combine them into a single category.
  
  The code in the solution book totally dropped the data for planes age > 25. How might we combine them into a single row? I didn’t think it could be done without first defining a second tibble that contains all the planes age >25, then merge it with the first tibble that contains the data for planes age <= 25, before carrying out the summarise actions on them after applying the group_by = age argument. older_plane_cohorts <- inner_join( flights, select( planes, tailnum, plane_year = year ), by = "tailnum" ) %>% mutate(age = year - plane_year) %>% filter(!is.na(age)) %>% mutate(age = pmin(46, age) - pmin( 25, age ) ) %>% filter( age != 0 )
  
  Then I got stuck. I can't figure out how to proceed after that, and frankly, I can't say with any certainty whether my approach is sensible. I'm not even sure it's possible to combine the 17 rows into a single row. Your help will be appreciated.
  
  Thank you in advance and for your help so far. Truly appreciated.
6. Amaks 27 Jul 2019
  
  in Public
  
  mutate(age = pmin(25, age))
  
  I think you used this line of code to limit the selection to planes not older than 25 years. But in the text above, you stated, "There are few planes older than 30 years, so we combine them into a single category." So, I was expecting the selection to be age <= 30, or using your notation pmin(30, age) and not pmin(25, age). Perhaps an edit of the text may be required unless I'm wrong in my supposition?
7. Amaks 26 Jul 2019
  
  in Public
  
  This however, this default
  
  Edit suggestion: However, this default...
8. Amaks 26 Jul 2019
  
  in Public
  
  If we needed a unique identifier for our analysis, could add a surrogate key.
  
  Hi, I hope this doesn't come across as nitpicking:
  
  If we needed a unique identifier for our analysis, we could add a surrogate key, perhaps?
9. siddharthabingi 20 Jul 2019
  
  in Public
  
  faa$airports
  
  Shouldn't this be "airports$faa" since airports is a data frame and faa is a variable?
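  Two behaviours discussed in this thread can be sketched in base R on made-up tables (vehicles_demo, common_demo and the helper row_key are hypothetical). First, when by is not given, dplyr's join functions join on every column name the two tables share (a natural join, and dplyr prints a message naming those columns); that is why the semi_join() calls with no by and with by = c("make", "model") agree, while by = "make" keeps more rows. Second, pmin(25, age) caps ages at 25 rather than dropping older planes, so they all end up in the 25 group.

```r
# --- 1. Joining on all shared columns vs. on "make" only ---
vehicles_demo <- data.frame(make  = c("A", "A", "B"),
                            model = c("m1", "m2", "m1"),
                            year  = c(2001, 2002, 2003))
common_demo <- data.frame(make = c("A", "B"), model = c("m1", "m2"))

shared <- intersect(names(vehicles_demo), names(common_demo))  # "make" "model"

# One key string per row built from the given columns
row_key <- function(df, cols) do.call(paste, c(df[cols], sep = "|"))

# Semi-join on all shared columns: keep rows whose (make, model) pair appears
semi_all <- vehicles_demo[row_key(vehicles_demo, shared) %in%
                            row_key(common_demo, shared), ]
# Semi-join on make only: any row whose make appears is kept
semi_make <- vehicles_demo[vehicles_demo$make %in% common_demo$make, ]

nrow(semi_all)   # 1 (only A/m1 matches on both columns)
nrow(semi_make)  # 3

# --- 2. pmin() caps values instead of dropping them ---
age <- c(3, 12, 25, 30, 46)
pmin(25, age)    # 3 12 25 25 25: planes older than 25 join the 25 group
```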

Annotators

siddharthabingi

Amaks

URL

jrnold.github.io/r4ds-exercise-solutions/relational-data.html
jrnold.github.io

12 Tidy Data | R for Data Science: Exercise Solutions

2
1. Amaks 23 Jul 2019
  
  in Public
  
  It looks like it is possible for certain variables to missing for (country, years).
  
  "It looks like it is possible for certain variables to missing for (country, years)." Edit suggestion: It looks like it is possible that certain variables are missing for (country, years).
2. Amaks 22 Jul 2019
  
  in Public
  
  is
  
  delete?
Visit annotations in context

Annotators

Amaks

URL

jrnold.github.io/r4ds-exercise-solutions/tidy-data.html
jrnold.github.io jrnold.github.io

10 Tibbles | R for Data Science: Exercise Solutions

2
1. Amaks 20 Jul 2019
  
  in Public
  
  run
  
  run:
2. Amaks 20 Jul 2019
  
  in Public
  
  Using $
  
  Using $,
Visit annotations in context

Annotators

Amaks

URL

jrnold.github.io/r4ds-exercise-solutions/tibbles.html
jrnold.github.io jrnold.github.io

7 Exploratory Data Analysis | R for Data Science: Exercise Solutions

4
1. Amaks 20 Jul 2019
  
  in Public
  
  ggplot
  
  Please, can you explain what I am getting wrong in the code below, and especially the error message, which I have tried googling without success?
  
  I tried to see if instead of count to show, in my practice exercise, the average price per carat group using the code below:
  
  group_by( diamonds, carat ) %>% summarise( avg_price = mean(price ) ) %>% ggplot( ) + mapping = aes( color = cut_width(carat, 5 ), x = avg_price ) + geom_freqpoly( )
  
  The code returns the following error:
  
  Error in group_by(diamonds, carat) %>% summarise(avg_price = mean(price)) %>% : could not find function "+<-"
  
  I've googled the error without success. So my confusion is in two parts:
  
  What is the right code to show the average price per carat type?
  
  What does the above error mean?
  
  Thanks in advance
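  
  The error most likely comes from `mapping = aes(...)` being written at the top level of the expression rather than inside a function call. There, `=` is parsed as assignment, so R reads the whole line as `(... + mapping) = aes(...) + geom_freqpoly()` and tries to call the replacement function `+<-`, which does not exist. A small base R sketch reproducing this, where `p`, `mapping`, and the value 3 are placeholders rather than ggplot2 objects:
  
  ```r
  # Parse (without evaluating) a line shaped like the one in the question
  e <- parse(text = "p + mapping = 3")[[1]]
  e[[1]]  # the top-level call is `=`, i.e. an assignment
  e[[2]]  # its target is the expression p + mapping
  
  # Evaluating it makes R look for the replacement function `+<-`
  p <- 1
  mapping <- 2
  msg <- tryCatch(eval(e), error = function(err) conditionMessage(err))
  msg  # an error message mentioning "+<-"
  ```
  
  In ggplot2 the mapping is passed as an argument instead, e.g. `ggplot(data, mapping = aes(x = carat, y = avg_price)) + geom_line()`. A line or point geom seems closer to "average price per carat" than `geom_freqpoly()`, which counts observations per bin.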
2. Amaks 20 Jul 2019
  
  in Public
  
  visualization
  
  Does anyone else notice that R4DS uses the British spelling "visualisation", but the solutions book uses the American spelling? It's a bit weird when one switches directly from the textbook to the solutions. I'm sure not many people notice the difference; I probably do because I generally write British English. By the way, I am not criticising, just expressing my thoughts. I am really grateful to the author for making this available.
3. Amaks 18 Jul 2019
  
  in Public
  
  there spikes in
  
  Is "are" missing? Perhaps you meant "there are"?
4. Amaks 18 Jul 2019
  
  in Public
  
  the these
  
  Typo. Use either "the" or "these"; "these" is preferable in my opinion.
Visit annotations in context

Annotators

Amaks

URL

jrnold.github.io/r4ds-exercise-solutions/exploratory-data-analysis.html
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

4
1. Amaks 16 Jul 2019
  
  in Public
  
  n
  
  Can anyone please explain the n in this code? Thanks in advance.
2. harmonica2nd 14 Jul 2019
  
  in Public
  
  sin()
  
  cos( )
3. harmonica2nd 14 Jul 2019
  
  in Public
  
  sin)
  
  sin( )
4. harmonica2nd 13 Jul 2019
  
  in Public
  
  head(arrange(fastest_flights, desc(mph)))
  
  why not just use: arrange(flights, desc(distance / air_time))
Visit annotations in context

Annotators

harmonica2nd

Amaks

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io jrnold.github.io

R for Data Science Solutions

1
1. harmonica2nd 11 Jul 2019
  
  in Public
  
  Year
  
  Class
Visit annotations in context

Annotators

harmonica2nd

URL

jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html
Jun 2019
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

1
1. n7leadfarmer 24 Jun 2019
  
  in Public
  
  In dep_time, midnight is represented by 2400, not 0.
  
  Recommendation: for new users of R, the solution should explain how one determines that midnight is represented as 2400. At this point in the book, a reader would not yet have been taught how to do the exploration needed to make this determination alone, so explaining how the author of the solutions book acquired that knowledge would be most beneficial.
Visit annotations in context

Annotators

n7leadfarmer

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io jrnold.github.io

24 Model building | R for Data Science: Exercise Solutions

2
1. nofacetou 21 Jun 2019
  
  in Public
  
  (date - 1L) %in% holidays_2013$date ~ "day before holiday", (date + 1L) %in% holidays_2013$date ~ "day after holiday",
  
  I think you mixed up these two.
  
  (date + 1L) %in% holidays_2013$date ~ "day before holiday", (date - 1L) %in% holidays_2013$date ~ "day after holiday",
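  
  A quick base R check of the direction of the offsets, using a hypothetical two-entry holiday list in place of `holidays_2013`: a date is the day before a holiday when `date + 1` is a holiday, and the day after when `date - 1` is.
  
  ```r
  # Hypothetical holiday list and a few dates around July 4, 2013
  holidays <- as.Date(c("2013-07-04", "2013-12-25"))
  date <- as.Date(c("2013-07-03", "2013-07-04", "2013-07-05", "2013-07-10"))
  
  # Label each date relative to the holiday list
  label <- ifelse(date %in% holidays, "holiday",
           ifelse((date + 1L) %in% holidays, "day before holiday",
           ifelse((date - 1L) %in% holidays, "day after holiday", "other")))
  label
  ```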
2. nofacetou 20 Jun 2019
  
  in Public
  
  (α+βx)
  
  should be $$\beta \log x$$
Visit annotations in context

Annotators

nofacetou

URL

jrnold.github.io/r4ds-exercise-solutions/model-building.html
jrnold.github.io jrnold.github.io

21 Iteration | R for Data Science: Exercise Solutions

1
1. nofacetou 06 Jun 2019
  
  in Public
  
  the more that pre-allocation will outperform appending.
  
  Based on the output, it seems to me that appending outperforms pre-allocation. I ran the code myself and got the same results. However, another R notebook gave the opposite results.
Visit annotations in context

Annotators

nofacetou

URL

jrnold.github.io/r4ds-exercise-solutions/iteration.html
jrnold.github.io jrnold.github.io

7 Exploratory Data Analysis | R for Data Science: Exercise Solutions

2
1. cfgauss 04 Jun 2019
  
  in Public
  
  >
  
  Omit.
2. cfgauss 04 Jun 2019
  
  in Public
  
  Exercise 7.4.1
  
  Many thanks for this lovely example. I hadn't understood geom_bar's NA behavior with factor variables until your explanation.
Visit annotations in context

Annotators

cfgauss

URL

jrnold.github.io/r4ds-exercise-solutions/exploratory-data-analysis.html
May 2019
jrnold.github.io jrnold.github.io

7 Exploratory Data Analysis | R for Data Science: Exercise Solutions

8
1. cfgauss 30 May 2019
  
  in Public
  
  Presumably there is a premium for a 1 carat diamond
  
  Is this "buyer's psychology?" The opposite of $4.99 being cheaper than $5.00?
2. cfgauss 30 May 2019
  
  in Public
  
  number of diamonds in each carat range
  
  Nice! Your intuition seems correct. But how do you account for the large number of 1.01 carat diamonds?
3. cfgauss 30 May 2019
  
  in Public
  
  print(n = 30)
  
  print(n = Inf) will print all rows of a tibble.
4. cfgauss 30 May 2019
  
  in Public
  
  seem
  
  see
5. cfgauss 30 May 2019
  
  in Public
  
  There are no diamonds with a price of $1,500
  
  More precisely: there's a $90 gap: the closed interval [1455,1545].
6. cfgauss 30 May 2019
  
  in Public
  
  Explore the distribution
  
  I wonder if this includes identifying outliers for possible data errors. E.g. x = 0 for 8 diamonds and z = 0 for 20. Also z = 31.8 for one. y = 0 for 7 diamonds and y = 31.8 and y = 58.9 for one each. I believe all are data errors.
7. cfgauss 29 May 2019
  
  in Public
  
  mutate(id = row_number())
  
  I believe the plots can be generated without id:
  
  diamonds %>% select(x, y, z) %>% gather(variable, value) %>% ggplot(aes(x = value)) + geom_density() + geom_rug() + facet_grid(vars(variable))
8. Eyayaw 05 May 2019
  
  in Public
  
  will
  
  Is putting "will" and "is" together deliberate, or an error?
Visit annotations in context

Annotators

cfgauss

Eyayaw

URL

jrnold.github.io/r4ds-exercise-solutions/exploratory-data-analysis.html
jrnold.github.io jrnold.github.io

21 Iteration | R for Data Science: Exercise Solutions

5
1. MeatloafMalady 26 May 2019
  
  in Public
  
  bottles <- function(i) { if (i > 2) { bottles <- str_c(i - 1, " bottles") } else if (i == 2) { bottles <- "1 bottle" } else { bottles <- "no more bottles" } bottles }
  
  I think this should read:
  
  bottles <- function(i) { if (i > 1) { bottles <- str_c(i , " bottles") } else if (i == 1) { bottles <- "1 bottle" } else { bottles <- "no more bottles" } bottles }
  
  Otherwise you get no bottles of beer twice in the final output
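  
  A sketch of the suggested fix, using base R `paste0()` in place of `stringr::str_c()` so it runs without extra packages. Counting down from 3 to 0 shows the labels the corrected function produces: each count is labelled by `i` itself rather than `i - 1`.
  
  ```r
  # Suggested fix: label the current count i rather than i - 1
  bottles <- function(i) {
    if (i > 1) {
      paste0(i, " bottles")
    } else if (i == 1) {
      "1 bottle"
    } else {
      "no more bottles"
    }
  }
  
  # Count down from 3 to 0
  labels <- vapply(3:0, bottles, character(1))
  labels
  ```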
2. Eyayaw 19 May 2019
  
  in Public
  
  map_lglg
  
  typo
3. Eyayaw 19 May 2019
  
  in Public
  
  is.factor(diamonds$color)
  
  repetition?
4. Eyayaw 19 May 2019
  
  in Public
  
  mean(X[i, ])
  
  I think you forgot to change mean(X[i, ]) to mean(X[, i]) when calculating the column means.
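  
  A minimal base R illustration of the fix on a small made-up matrix: indexing with `X[, i]` takes column `i`, and the loop then agrees with `colMeans()`. With `X[i, ]` the loop would instead read rows, and here would fail once `i` exceeds the number of rows.
  
  ```r
  X <- matrix(1:6, nrow = 2)  # 2 rows, 3 columns: (1, 2), (3, 4), (5, 6)
  
  col_means <- vector("double", ncol(X))
  for (i in seq_len(ncol(X))) {
    col_means[[i]] <- mean(X[, i])  # column i; X[i, ] would be row i
  }
  col_means
  #> [1] 1.5 3.5 5.5
  ```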
5. Eyayaw 19 May 2019
  
  in Public
  
  df[[i]] <- read_csv(files[[i]])
  
  I think it should be corrected this way: for (fname in ...) { df[[fname]] <- bind_rows()
Visit annotations in context

Annotators

MeatloafMalady

Eyayaw

URL

jrnold.github.io/r4ds-exercise-solutions/iteration.html
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

7
1. cfgauss 23 May 2019
  
  in Public
  
  For each plane, count the number of flights before the first delay of greater than 1 hour.
  
  I think this is a lovely use of logicals and cumsum. But I believe that it omits planes whose first flight is delayed by more than an hour. There are 234 of these:
  
  (zero <- flights %>% filter(!is.na(dep_delay)) %>% arrange(tailnum,month,day) %>% group_by(tailnum) %>% mutate(delay_gt1hr = dep_delay > 60) %>% filter(row_number() == 1,delay_gt1hr) %>% select(tailnum) %>% mutate(n = 0) )
  
  Then bind_rows could concatenate these to the tibble produced by the posted solution.
  
  This is a "two-part" solution. Is there a "one-part" solution?
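  
  One possible one-part approach, sketched in base R on toy data. The data frame and its values are made up, rows are assumed to be already sorted by time within each plane, and missing delays are assumed to be removed. Within each plane, `cumsum(dep_delay > 60) == 0` is `TRUE` only for flights before the first long delay, so summing it gives the count directly, and a plane whose first flight is delayed gets 0 instead of disappearing. In dplyr the same expression could go inside `summarise()` after `group_by(tailnum)`, e.g. `summarise(n = sum(cumsum(dep_delay > 60) == 0))`.
  
  ```r
  # Toy flights, already ordered by time within each plane
  delays <- data.frame(
    tailnum   = c("A", "A", "A", "B", "B"),
    dep_delay = c(5, 70, 10, 90, 0)
  )
  
  # Number of flights before the first delay > 60 minutes, per plane
  n_before <- tapply(delays$dep_delay, delays$tailnum,
                     function(d) sum(cumsum(d > 60) == 0))
  n_before  # plane "B" is kept with a count of 0
  ```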
2. cfgauss 22 May 2019
  
  in Public
  
  year
  
  Since the year column contains only 2013, you could arrange chronologically without this argument.
3. cfgauss 20 May 2019
  
  in Public
  
  We will calculate this ranking in two parts
  
  Or, the two parts could be combined:
  
  rank <- flights %>% group_by(dest) %>% mutate(n_carriers = n_distinct(carrier)) %>% filter(n_carriers > 1) %>% group_by(carrier) %>% summarize(n_dest = n_distinct(dest)) %>% arrange(desc(n_dest))
4. cfgauss 20 May 2019
  
  in Public
  
  width = 100
  
  Here width = Inf will print all columns.
5. cfgauss 14 May 2019
  
  in Public
  
  print(width = 120)
  
  I believe that print(width = Inf) will reliably print all selected columns.
6. cfgauss 14 May 2019
  
  in Public
  
  below than the
  
  below the
7. cfgauss 14 May 2019
  
  in Public
  
  observations each
  
  observations of each
Visit annotations in context

Annotators

cfgauss

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io jrnold.github.io

R for Data Science Solutions

4
1. jlopez11 23 May 2019
  
  in Public
  
  color
  
  colour
  
  change from "color" to "colour" to be consistent within the paragraph
  
  typo
2. jlopez11 18 May 2019
  
  in Public
  
  mpg
  
  drv
  
  typo
3. jlopez11 18 May 2019
  
  in Public
  
  mpg
  
  drv
  
  typo
4. jlopez11 18 May 2019
  
  in Public
  
  mpg
  
  drv
  
  typo
Visit annotations in context

Tags

typo

Annotators

jlopez11

URL

jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html
jrnold.github.io jrnold.github.io

15 Factors | R for Data Science: Exercise Solutions

1
1. Eyayaw 16 May 2019
  
  in Public
  
  five
  
  'five' is an error.
Visit annotations in context

Annotators

Eyayaw

URL

jrnold.github.io/r4ds-exercise-solutions/factors.html
jrnold.github.io jrnold.github.io

28 Graphics for communication | R for Data Science: Exercise Solutions

1
1. Eyayaw 16 May 2019
  
  in Public
  
  #> Warning: Computation failed in `stat_binhex()`:
  
  Why aren't these two plots displayed properly? Only an empty background is shown.
Visit annotations in context

Annotators

Eyayaw

URL

jrnold.github.io/r4ds-exercise-solutions/graphics-for-communication.html
Apr 2019
jrnold.github.io jrnold.github.io

7 Exploratory Data Analysis | R for Data Science: Exercise Solutions

2
1. Karachow 26 Apr 2019
  
  in Public
  
  Surprisingly, it appears that depth (z) is always smaller than length (x) or width (y). Length is less than width in more than half the observations, the opposite of expectations. I don’t know what’s going on. If this was not a widely used da
  
  Diamond Dimensions
  
  If you look at this picture, it seems that depth is always smaller than either width or length. Maybe it is easier to plug it into a ring or other jewelry...
  
  Also, you said "1. length is less than width, otherwise the width would be called the length". Actually, it is the other way round.
2. Karachow 26 Apr 2019
  
  in Public
  
  geom_histogram(binwidth = 1, center = 0) + geom_bar()
  
  I do not understand why geom_bar is used in addition to geom_histogram. It creates two layers which are essentially the same, right?
Visit annotations in context

Annotators

Karachow

URL

jrnold.github.io/r4ds-exercise-solutions/exploratory-data-analysis.html
jrnold.github.io jrnold.github.io

16 Dates and times | R for Data Science: Exercise Solutions

2
1. mc2mc 26 Apr 2019
  
  in Public
  
  # … with 1 more row
  
  You might want to check this last row of the tibble that's not displayed. Saturday actually has the shortest delays of any day of the week. You'll see it right away if you plot it:
  
  flights_dt %>% mutate(wday = wday(dep_time, label = TRUE)) %>% group_by(wday) %>% summarize(ave_dep_delay = mean(dep_delay, na.rm = TRUE)) %>% ggplot(aes(x = wday, y = ave_dep_delay)) + geom_bar(stat = "identity")
  
  flights_dt %>% mutate(wday = wday(dep_time, label = TRUE)) %>% group_by(wday) %>% summarize(ave_arr_delay = mean(arr_delay, na.rm = TRUE)) %>% ggplot(aes(x = wday, y = ave_arr_delay)) + geom_bar(stat = "identity")
2. liuminzhao 02 Apr 2019
  
  in Public
  
  %%
  
  Shouldn't this be %/%?
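  
  For reference, the two operators differ as follows on a clock time stored as HHMM. The value 1345 is just an example, since the surrounding code is not shown here: `%/%` is integer division and gives the hours, while `%%` is the remainder and gives the minutes.
  
  ```r
  dep_time <- 1345            # 13:45 stored as HHMM
  hour   <- dep_time %/% 100  # integer division: 13
  minute <- dep_time %% 100   # remainder: 45
  c(hour = hour, minute = minute)
  ```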
Visit annotations in context

Annotators

mc2mc

liuminzhao

URL

jrnold.github.io/r4ds-exercise-solutions/dates-and-times.html
jrnold.github.io jrnold.github.io

19 Functions | R for Data Science: Exercise Solutions

1
1. cfairfield 25 Apr 2019
  
  in Public
  
  skewness <- function(x, na.rm = FALSE) { n <- length(x) m <- mean(x, na.rm = na.rm) v <- var(x, na.rm = na.rm) (sum(x - m)^3 / (n - 2)) / v^(3 / 2) }
  
  This function always returns 0 (up to floating-point error), because sum(x - m) is computed before it is raised to the power 3, and the deviations from the mean always sum to 0. It is resolved with additional parentheses:
  
  skewness <- function(x, na.rm = FALSE) { n <- length(x) m <- mean(x, na.rm = na.rm) v <- var(x, na.rm = na.rm) (sum((x - m)^3) / (n - 2)) / v^(3 / 2) }
  
  I appreciate there are several possible formulas for skewness which may not match this one.
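  
  A quick base R check on a small made-up vector: the deviations from the mean sum to zero, so the original expression returns 0, while cubing each deviation before summing gives a positive value for this right-skewed data. The `n - 2` denominator is kept from the solution as-is.
  
  ```r
  # Original version: sum() is applied before cubing
  skewness_buggy <- function(x, na.rm = FALSE) {
    n <- length(x)
    m <- mean(x, na.rm = na.rm)
    v <- var(x, na.rm = na.rm)
    (sum(x - m)^3 / (n - 2)) / v^(3 / 2)
  }
  
  # Corrected version: cube each deviation first, then sum
  skewness <- function(x, na.rm = FALSE) {
    n <- length(x)
    m <- mean(x, na.rm = na.rm)
    v <- var(x, na.rm = na.rm)
    (sum((x - m)^3) / (n - 2)) / v^(3 / 2)
  }
  
  x <- c(1, 2, 3, 10)  # toy right-skewed vector
  skewness_buggy(x)    # 0
  skewness(x)          # positive
  ```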
Visit annotations in context

Annotators

cfairfield

URL

jrnold.github.io/r4ds-exercise-solutions/functions.html
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

4
1. cfgauss 15 Apr 2019
  
  in Public
  
  I
  
  Omit.
2. cfgauss 15 Apr 2019
  
  in Public
  
  select(tailnum, on_time, arr_time, arr_delay) %>%
  
  If you wish, this could be eliminated since the only columns that survive are those in the summarise.
3. sfl28 02 Apr 2019
  
  in Public
  
  flights
  
  flights at departure
4. sfl28 02 Apr 2019
  
  in Public
  
  the
  
  omit
Visit annotations in context

Annotators

sfl28

cfgauss

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io jrnold.github.io

10 Tibbles | R for Data Science: Exercise Solutions

1
1. excuthoka 04 Apr 2019
  
  in Public
  
  ?print.tbl_df
  
  ?print.tbl instead of ?print.tbl_df (archived)
Visit annotations in context

Annotators

excuthoka

URL

jrnold.github.io/r4ds-exercise-solutions/tibbles.html
jrnold.github.io jrnold.github.io

21 Iteration | R for Data Science: Exercise Solutions

1
1. liuminzhao 04 Apr 2019
  
  in Public
  
  .
  
  or
  
  mtcars %>% map_dbl(mean)
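  
  The same column means can also be computed without purrr, using base R's `vapply()` on the built-in `mtcars` data frame; `colMeans()` gives a convenient cross-check.
  
  ```r
  # Mean of every column of mtcars; numeric(1) declares one double per column
  col_means <- vapply(mtcars, mean, numeric(1))
  all.equal(col_means, colMeans(mtcars))
  #> [1] TRUE
  ```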
Visit annotations in context

Annotators

liuminzhao

URL

jrnold.github.io/r4ds-exercise-solutions/iteration.html
jrnold.github.io jrnold.github.io

R for Data Science Solutions

1
1. Carlito_Fluito 02 Apr 2019
  
  in Public
  
  you
  
  You would do You would use
Visit annotations in context

Annotators

Carlito_Fluito

URL

jrnold.github.io/r4ds-exercise-solutions/model-basics.html
Mar 2019
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

10
1. excuthoka 28 Mar 2019
  
  in Public
  
  arrange(flights, distance / air_time * 60)
  
  Do we need a descending order to put the fastest flights first? arrange(flights, desc(distance / air_time * 60)) See also: flights %>% mutate(speed = distance / air_time * 60) %>% select(tailnum, distance, air_time, speed, dep_time) %>% arrange(desc(speed))
2. Karachow 12 Mar 2019
  
  in Public
  
  Look at the number of cancelled flights per day. Is there a pattern?
  
  This part is missing
3. Karachow 12 Mar 2019
  
  in Public
  
  benificial
  
  beneficial
  
  typo
4. cfgauss 12 Mar 2019
  
  in Public
  
  were
  
  omit
5. cfgauss 11 Mar 2019
  
  in Public
  
  ungroup() %>%
  
  I believe you don't need ungroup().
6. cfgauss 10 Mar 2019
  
  in Public
  
  an standard deviation. That
  
  "and standard deviation. The following"
7. Karachow 05 Mar 2019
  
  in Public
  
  sinpi()
  
  Typo. I suspect you mean cospi(), because "sinpi(x)" had just been introduced
8. cfgauss 03 Mar 2019
  
  in Public
  
  Unusually fast flights are those flights with the smallest standardized values.
  
  I believe the problem asks for "fast flights per destination". If so, this code produces that list, including ties. Sadly, it uses slice.
  
  fast <- flights %>% filter(!is.na(air_time)) %>% group_by(dest) %>% mutate(z = (air_time - mean(air_time)) / sd(air_time)) %>% slice(which(z == min(z))) %>% select(origin,dest,month,day,carrier,flight,air_time,z) %>% arrange(z)
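  
  The same idea in base R, on a made-up two-destination data frame: `ave()` standardizes `air_time` within each `dest`, and keeping rows whose `z` equals the group minimum retains ties, as the `slice(which(z == min(z)))` step does. In dplyr, `filter(z == min(z))` on the grouped data would be an alternative to `slice()`.
  
  ```r
  flights_toy <- data.frame(
    dest     = c("A", "A", "A", "B", "B", "B"),
    air_time = c(100, 110, 90, 50, 60, 40)
  )
  
  # Standardize air_time within each destination
  flights_toy$z <- ave(flights_toy$air_time, flights_toy$dest,
                       FUN = function(v) (v - mean(v)) / sd(v))
  
  # Keep the fastest (smallest z) flight(s) per destination, ties included
  group_min <- ave(flights_toy$z, flights_toy$dest, FUN = min)
  fastest <- flights_toy[flights_toy$z == group_min, ]
  fastest
  ```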
9. cfgauss 03 Mar 2019
  
  in Public
  
  ggplot(standardized_flights, aes(x = air_time_standard)) + geom_density()
  
  This is a nice plot, but the image below is not it. It appears to have been cut-and-pasted from a previous plot.
10. cfgauss 03 Mar 2019
  
  in Public
  
  standardized_flights
  
  Somewhat more succinctly:
  
  standardized_flights <- flights %>% filter(!is.na(air_time)) %>% group_by(dest,origin) %>% mutate(air_time_standard = (air_time - mean(air_time)) / sd(air_time)) %>% select(origin,dest,month,day,carrier,flight,air_time,air_time_standard)
Visit annotations in context

Tags

typo

Annotators

Karachow

cfgauss

excuthoka

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io jrnold.github.io

R for Data Science Solutions

2
1. excuthoka 21 Mar 2019
  
  in Public
  
  geom_hist
  
  geom_histogram
2. jeffboichuk 14 Mar 2019
  
  in Public
  
  preprocess input
  
  preprocesses input...
Visit annotations in context

Annotators

jeffboichuk

excuthoka

URL

jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html
jrnold.github.io jrnold.github.io

7 Exploratory Data Analysis | R for Data Science: Exercise Solutions

1
1. jeffboichuk 14 Mar 2019
  
  in Public
  
  scatterplot visualize
  
  "to" is missing
Visit annotations in context

Annotators

jeffboichuk

URL

jrnold.github.io/r4ds-exercise-solutions/exploratory-data-analysis.html
jrnold.github.io jrnold.github.io

24 Model building | R for Data Science: Exercise Solutions

1
1. mschwindi 11 Mar 2019
  
  in Public
  
  mod4
  
  mode3
Visit annotations in context

Annotators

mschwindi

URL

jrnold.github.io/r4ds-exercise-solutions/model-building.html
jrnold.github.io jrnold.github.io

13 Relational data | R for Data Science Solutions

1
1. anniemorgan 09 Mar 2019
  
  in Public
  
  filter(n > 100)
  
  This should be filter(n >= 100): "at least 100 flights" implies that 100 should be included as the least possible value of number of flights.
Visit annotations in context

Annotators

anniemorgan

URL

jrnold.github.io/r4ds-exercise-solutions/relational-data.html
jrnold.github.io jrnold.github.io

27 R Markdown | R for Data Science: Exercise Solutions

1
1. shmuhammad 06 Mar 2019
  
  in Public
  
  Both R markdown files and can be knit R markdown documents can be knit.
  
  This part does not make sense and needs to be reviewed.
Visit annotations in context

Annotators

shmuhammad

URL

jrnold.github.io/r4ds-exercise-solutions/r-markdown.html
Feb 2019
jrnold.github.io jrnold.github.io

5 Data transformation | R for Data Science: Exercise Solutions

27
1. cfgauss 25 Feb 2019
  
  in Public
  
  in 10 flights
  
  Omit, since you've already filtered.
2. cfgauss 25 Feb 2019
  
  in Public
  
  min_rank() and dense_rank()
  
  English is inherently ambiguous, while math (vectors, subscripts) and algorithms are not. But then you'd lose your reader.
  
  Here's a slight rephrasing where rank refers to a component of, say, min_rank(x) and value a component of x.
  
  For each set of tied values the min_rank() function assigns a rank equal to the number of values less than that tied value plus one. In contrast, the dense_rank() function assigns a rank equal to the number of distinct values less than that tied value plus one.
3. cfgauss 25 Feb 2019
  
  in Public
  
  min_rank() and dense_rank()
  
  English is inherently ambiguous, while math (vectors, subscripts) and algorithms are not. But then you'd lose your reader.
  
  Here's a slight rephrasing where rank refers to a component of, say, min_rank(x) and value a component of x.
  
  For each set of tied values the min_rank() function assigns a rank equal to the number of values less than that tied value plus one. The dense_rank() function assigns a rank equal to the number of distinct values less than that tied value plus one.
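  
  The two definitions in this rephrasing can be checked directly in base R on a small vector with a tie. This is a sketch: `min_rank_def` and `dense_rank_def` are illustrative names, not dplyr functions. For data without missing values, base R's `rank(x, ties.method = "min")` gives the same result as the first definition.
  
  ```r
  x <- c(10, 20, 20, 30)
  
  # min rank: number of values strictly less than v, plus one
  min_rank_def <- vapply(x, function(v) sum(x < v) + 1, numeric(1))
  
  # dense rank: number of distinct values strictly less than v, plus one
  dense_rank_def <- vapply(x, function(v) length(unique(x[x < v])) + 1, numeric(1))
  
  min_rank_def                  # 1 2 2 4
  dense_rank_def                # 1 2 2 3
  rank(x, ties.method = "min")  # matches min_rank_def
  ```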
4. cfgauss 25 Feb 2019
  
  in Public
  
  ranking handles
  
  the ranking functions handle
5. cfgauss 25 Feb 2019
  
  in Public
  
  function
  
  functions
6. cfgauss 25 Feb 2019
  
  in Public
  
  roughly
  
  Is it "rough" or "exact"?
7. cfgauss 25 Feb 2019
  
  in Public
  
  an element
  
  a vector
8. cfgauss 25 Feb 2019
  
  in Public
  
  The row_number() function is confusingly named. It can create ranks for any column.
  
  I might omit this, since you give an accurate explanation of row_number below. Also, tibble adds its own row numbers, and row_number(x) appears to be "the row number after x is sorted", which I don't find confusing.
9. cfgauss 25 Feb 2019
  
  in Public
  
  missing values
  
  ties
10. cfgauss 24 Feb 2019
  
  in Public
  
  na.rm = TRUE
  
  This probably is intended to be an argument of mean. It's not necessary, however, since neither dep_delay nor dep_delay_lag have any NAs. Also, could you provide an interpretation of delay_diff? For example, what does it mean that JFK's is negative?
11. cfgauss 23 Feb 2019
  
  in Public
  
  year,
  
  Could be eliminated since the dataset is for one specific year.
12. cfgauss 23 Feb 2019
  
  in Public
  
  flight
  
  Another definition of flight could be the quad (carrier,flight,origin,dest) which, say, might fly daily. Here's a solution with this definition.
  
  dest_delay <- flights %>% group_by(dest) %>% filter(arr_delay > 0) %>% summarize(dest_ad = sum(arr_delay,na.rm=T),dest_ct = n()) flight_delay <- flights %>% group_by(carrier,flight,origin,dest) %>% filter(arr_delay > 0) %>% summarize(flight_ad = sum(arr_delay,na.rm=T),flight_ct = n()) prop <- as_tibble(merge(dest_delay,flight_delay,by="dest")) %>% mutate(prop = flight_ad / dest_ad) %>% select(carrier,flight,origin,dest,dest_ct,flight_ct,prop)
13. cfgauss 23 Feb 2019
  
  in Public
  
  Exercise 5.7.4
  
  In general selecting just a few columns (often the mutate variables) makes the result clearer since "irrelevant" cols are eliminated.
14. cfgauss 23 Feb 2019
  
  in Public
  
  !is.na(arr_delay),
  
  This could be omitted but keeping it in redundantly may add clarity.
15. cfgauss 23 Feb 2019
  
  in Public
  
  only
  
  Omit
16. cfgauss 23 Feb 2019
  
  in Public
  
  However, there are many planes that have never flown an on-time flight.
  
  Since this minimum rank group has an on_time of 0.0, I believe that this tibble is all planes which have never flown an on-time flight.
17. cfgauss 23 Feb 2019
  
  in Public
  
  <=
  
  ==
  
  I believe min_rank cannot be 0.
18. cfgauss 23 Feb 2019
  
  in Public
  
  arr_delay > 0
  
  (arr_delay > 0)
  
  Of course precedence "does the right thing" but since R will use coercion to give lgl & dbl a value, putting in the redundant parentheses prevents the not-careful reader (me, for example) from "thinking the wrong thing."
19. cfgauss 23 Feb 2019
  
  in Public
  
  cancelled
  
  In this dataset the presence of arr_delay implies the presence of arr_time so the boolean cancelled could be eliminated. Of course, the reader hasn't verified this relation so keeping it in makes sense. The definition of cancelled as "having an arrival delay" and "not having an arrival time" seems a bit odd.
20. cfgauss 23 Feb 2019
  
  in Public
  
  They operate within each group rather than over the entire data frame
  
  Do all functions on grouped tibbles operate only per group and not the entire tibble? E.g.
  
  flights %>% group_by(month,day) %>% arrange(desc(dep_delay))
  
  Will the arrange function order both by group and by the entire tibble?
21. cfgauss 22 Feb 2019
  
  in Public
  
  There are more sophisticated ways to do this analysis
  
  Perhaps, but this is a lovely example of the power of R.
22. cfgauss 21 Feb 2019
  
  in Public
  
  , which calculates
  
  ". atan(y,x) returns"
  
  My worry is that the reader would assume that atan(x,y) returns that angle.
23. cfgauss 20 Feb 2019
  
  in Public
  
  not_cancelled
  
  Hadley defines this as
  
  not_cancelled <- flights %>% filter(!is.na(dep_delay), !is.na(arr_delay))
  
  But in
  
  flights %>% filter(is.na(dep_time) | is.na(arr_time) | is.na(dep_delay) | is.na(arr_delay)) %>% select(sched_arr_time,arr_time,arr_delay,sched_dep_time,dep_time,dep_delay)
  
  the first 6 flights have an NA only in arr_delay. Aren't these likely not to have been cancelled and simply have a missing data entry? Why isn't
  
  not_cancelled <- flights %>% filter(!is.na(dep_time), !is.na(arr_time))
  
  a "better" definition?
24. cfgauss 19 Feb 2019
  
  in Public
  
  and the passenger arrived at the same time
  
  Omit.
25. cfgauss 19 Feb 2019
  
  in Public
  
  Exercise 5.5.4
  
  Another solution. Now that we know we can "trust" dep_delay we could simply
  
  md <- flights %>% select(carrier,flight,origin,dest,sched_dep_time,dep_time,dep_delay) %>% arrange(min_rank(desc(dep_delay)))
  
  There are no ties in the top 10.
26. cfgauss 19 Feb 2019
  
  in Public
  
  Daylight
  
  If you wish, you could change this to "Eastern Daylight" to agree with its previous reference in that sentence.
27. cfgauss 19 Feb 2019
  
  in Public
  
  Except for flights daylight savings started (March 10) or ended (November 3)
  
  I understand the point of the paragraph but this sentence is a bit unclear.
Visit annotations in context

Annotators

cfgauss

URL

jrnold.github.io/r4ds-exercise-solutions/transform.html
jrnold.github.io jrnold.github.io

12 Tidy Data | R for Data Science: Exercise Solutions

1
1. JohnOLeary 19 Feb 2019
  
  in Public
  
  sum(cases)
  
  na.rm = TRUE must be added inside sum() for the graph below to appear; otherwise it will just be a blank graph.
  
  By the way thanks so much for these solutions, they are extremely helpful and teach things which are not taught in the book!
Visit annotations in context

Annotators

JohnOLeary

URL

jrnold.github.io/r4ds-exercise-solutions/tidy-data.html
