41 Matching Annotations
  1. May 2021
    1. we could a FizzBuzz function

      omission of write?

      Suggested edit: we could write a FizzBuzz function

    1. Finding all plurals cannot be correctly accomplished with regular expressions alone. Finding plural words would at least require morphological information about words in the language. See WordNet for a resource that would do that. However, identifying words that end in an “s” and with more than three characters, in order to remove “as”, “is”, “gas”, etc., is a reasonable heuristic.

      I agree with the statement and used that as a basis for my answer.

      sent_with_words_end_s <- str_subset(sentences, "\\b[A-Za-z]{3,}s\\b") # Keep only the sentences that meet the specified criteria

      words_end_s <- str_extract(sent_with_words_end_s, "\\b[A-Za-z]{3,}s\\b") # First word ending in s in each sentence (includes both plural words like "planks" and non-plural words like "Sickness")

      str_view(words_end_s, "\\b[A-Za-z]+[^s]s$") # Highlights only words that end in s but not in ss
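      The heuristic can also be tried without stringr. The following is a minimal base R sketch on a made-up sentence (the sentence and variable names are assumptions for illustration, not the `sentences` data). Note the doubled backslash: inside an R string, `"\\b"` is the regex word boundary, whereas `"\b"` is a backspace character.

```r
# Hypothetical input sentence, not taken from stringr::sentences
sentence <- "The gas planks and his sickness was less than the cats"

# Words of more than three characters that end in "s"
tokens <- regmatches(
  sentence,
  gregexpr("\\b[A-Za-z]{3,}s\\b", sentence, perl = TRUE)
)[[1]]

# Drop the words that end in "ss", such as "sickness" and "less"
likely_plurals <- tokens[!grepl("ss$", tokens)]
likely_plurals
```

      Short words such as "gas", "his" and "was" never match because the pattern needs at least three letters before the final "s".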

    1. Since is always good practice to have clear

      Typo: it omitted.

      Edit suggestion: Since (it) is always good practice to have clear

  2. Apr 2021
    1. mutate(cut = if_else(runif(n()) < 0.1, NA_character_, as.character(cut)))

      Can you explain this code to me? I've looked up the if_else function but I do not understand this code.
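      A possible reading of this line, sketched in base R: `runif(n())` draws one uniform random number per row, so `runif(n()) < 0.1` is TRUE for roughly 10% of rows, and those rows get a missing value while the rest keep their `cut` value as text. The sketch below uses base `ifelse()` as a stand-in for dplyr's `if_else()` and a small made-up vector called `cut_col` in place of the real column; both are assumptions for illustration.

```r
set.seed(1)  # make the random draw reproducible

# Hypothetical stand-in for diamonds$cut
cut_col <- factor(c("Fair", "Good", "Ideal", "Premium", "Good",
                    "Ideal", "Fair", "Ideal", "Good", "Premium"))

# One uniform random number between 0 and 1 per element
u <- runif(length(cut_col))

# Where the draw is below 0.1 use NA, otherwise keep the value as text
cut_with_na <- ifelse(u < 0.1, NA_character_, as.character(cut_col))

# Elements whose draw was 0.1 or more are left unchanged
keep <- u >= 0.1
cut_with_na
```

      `NA_character_` is used rather than plain `NA` because `if_else()` is strict about both branches having the same type, which is also why `cut` is converted with `as.character()`.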

    1. (arr_delay <= 0))

      Why did you filter on arr_delay <= 0 rather than arr_delay > 0 when we are looking for the plane with the worst on-time record? This seems counterintuitive to me. What am I misunderstanding? Thank you.

    2. this delay will not have those affects plans

      For better clarity, change "this delay will not have those affects plans nor does it affect the total time spent traveling." to "this delay will not affect those plans nor would it affect the total time spent traveling."

    1. geom_bar(width = 1)
      1. Please can you explain to me why you included the argument width = 1 for geom_bar? Without it, the pie doesn't look different. I believe you must have specified it for a reason?

      2. This was my attempt to answer the question. I'm not totally sure if the resulting plot makes much sense. Please take a look and let me know what you think. Thank you.

      ggplot(diamonds, mapping = aes(x = cut, fill = color)) + geom_bar() + coord_polar()

    2. such

      Typo. I think it should read "such as", not "such".

    3. ..count.. / sum(..count..

      Please, can you explain why you have dots in the code? Thanks.

  3. Sep 2019
    1. we could use map() followed by flatten_dbl(),

      The way this is written could be somewhat confusing to a reader, in my opinion, although the code makes the order of the functions clearer.

      Suggestion:

      If we wanted a numeric vector, we could combine map() with flatten_dbl(),

    2. like so,

      Edit suggestion.

      as shown:

  4. Aug 2019
    1. at

      Minor mistake: a not at

    2. not

      "Not" repeated. Delete?

    3. Neither NaN nor Inf are not numbers, and so they aren’t even numbers

      Edit suggestion,

      Neither NaN nor Inf is a number

      Or

      Both NaN and Inf are not even numbers.

      If accepted, then "and so they aren’t even numbers" may be deleted as it becomes superfluous.

    4. This is not the same as what you See the value of looking at the value of

      Could you please check the sentence? There seems to be some ambiguity.

    1. is

      redundant. delete?

    2. (!(x %% 3) && !(x %% 5))

      Hi,

      Please can you explain to me how to read this line of code and what it means? I'm having difficulty understanding it. Individually I do understand what each symbol means, but put together as it is, I'm unable to. Moreover, it looks more efficient than my effort, as shown below.

      Thank you.

      fizzbuzz <- function(n) {
        x <- n
        if ((x %% 3 == 0) && (x %% 5 == 0)) {
          print("fizzbuzz")
        } else {
          if (x %% 3 == 0) {
            print("fizz")
          } else {
            if (x %% 5 == 0) {
              print("buzz")
            } else {
              print(x)
            }
          }
        }
      }
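      One way to read `(!(x %% 3) && !(x %% 5))`, sketched in base R: `x %% 3` is the remainder after dividing by 3, and when a number is used where a logical is expected, R treats 0 as FALSE and any other number as TRUE. So `!(x %% 3)` is TRUE exactly when the remainder is 0, which makes it a shorter spelling of `x %% 3 == 0`. The helper name below is hypothetical.

```r
x <- 15
x %% 3       # remainder after dividing by 3
!(x %% 3)    # !0 is TRUE, so TRUE here means "divisible by 3"

# Hypothetical helper returning both spellings of the same test
divisible_by_3_and_5 <- function(x) {
  short <- !(x %% 3) && !(x %% 5)
  long  <- (x %% 3 == 0) && (x %% 5 == 0)
  c(short = short, long = long)
}

divisible_by_3_and_5(15)
divisible_by_3_and_5(9)
```

      The two forms give the same answer for whole numbers; the explicit `== 0` version is arguably easier to read.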

    1. What does it mean for a flight to have a missing tailnum

      This seems to be a tad long so I apologise in advance. This is a whole new field for me and I would really like to understand.

      Could it be AA and MQ use different values to represent tailnum?

      filter( planes, tailnum == 0 )

      A tibble: 0 x 9

      length(is.na( planes$tailnum ))

      [1] 3322

      nrow(planes)

      [1] 3322

      filter( flights, tailnum == 0 )

      A tibble: 0 x 19

      length(is.na( flights$tailnum ) )

      [1] 336776

      nrow( flights )

      [1] 336776

      Yet the anti_join() shown in your code clearly shows that there are some tailnum values in flights that are not represented in the planes dataset. How could that be? The one explanation I could come up with is that the two datasets used different tailnum values, so I tried to investigate for AA and MQ.

      tailnum_flights <- flights %>% filter( carrier == 'AA'| carrier == 'MQ' ) %>% select ( carrier, tailnum )

      tailnum_planes <- planes %>% select( tailnum )

      tailnum_planes %in% tailnum_flights

      [1] FALSE

      So, it looks like the tailnum values are not missing for the ten airlines but are represented with different values in the two datasets (flights and planes).

      What are your thoughts? Thank you.
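      Two of the checks above may not be measuring what was intended. `length(is.na(x))` always equals the length of `x`, because `is.na()` returns one TRUE/FALSE per element; `sum(is.na(x))` is what counts missing values. Likewise, `%in%` compares the elements of two vectors, so applied to a one-column table it appears to return one value per column rather than one per tail number. A minimal base R sketch with made-up tail numbers (the values and variable names are assumptions, not the nycflights13 data):

```r
# Hypothetical tail numbers standing in for the two tailnum columns
flights_tailnum <- c("N101AA", "N202MQ", NA, "N303UA", "N101AA")
planes_tailnum  <- c("N101AA", "N303UA")

# Always the length of the input, whether or not anything is missing
length(is.na(flights_tailnum))

# Summing the logical vector counts the missing values
sum(is.na(flights_tailnum))

# Element-wise membership test on the columns themselves
matched <- flights_tailnum %in% planes_tailnum

# Non-missing tail numbers in flights with no match in planes
unmatched <- setdiff(flights_tailnum[!is.na(flights_tailnum)], planes_tailnum)
unmatched
```

      On the real data, the analogous calls would use `flights$tailnum` and `planes$tailnum` directly.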

  5. Jul 2019
    1. In the full English language, no

      Erm, full stop omitted after no.

      And thanks for the link. It is an interesting read.

    2. Words that end with “-ed” but not ending in “-eed”

      This worked for me as well:

      str_view(stringr::words, ".*[^e]ed$", match = TRUE)
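      A quick way to check the pattern is to run it over a few hand-picked words. The sketch below uses base R's `grepl()` and a made-up word list (an assumption for illustration, not `stringr::words`). One thing it shows is that `.*[^e]ed$` needs at least one character before "ed", so the bare word "ed" is not matched.

```r
# Hypothetical test words
test_words <- c("bed", "feed", "need", "hundred", "red", "ed", "speed", "wanted")

# TRUE where the word ends in "ed" preceded by any character other than "e"
ends_ed_not_eed <- grepl(".*[^e]ed$", test_words)

matched_words <- test_words[ends_ed_not_eed]
matched_words
```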

    3. "ab$^$sfas"

      Please, could you explain why you included this in the code? I replicated the answer without it. Thanks.

    1. avg_dest_delays <- flights %>%
         group_by(dest) %>%
         # arrival delay NA's are cancelled flights
         summarise(delay = mean(arr_delay, na.rm = TRUE)) %>%
         inner_join(airports, by = c(dest = "faa"))

      Please, could you explain to me why the colour aesthetic is not applied if I pipe this directly to ggplot? See the code below; it's basically a replication of yours, but with avg_dest_delays piped directly into ggplot:

      avg_dest_delays <- flights %>% group_by( dest ) %>% summarise( delay = mean( arr_delay, na.rm = TRUE )) %>% inner_join(airports, by = c( dest = "faa" ) ) %>% ggplot( aes(lon, lat, colour = delay ) ) + borders("state" ) + geom_point( ) + coord_quickmap( )

      Thanks

    2. any any

      repetition?

    3. join

      This also seems to be the case when the by = argument is not used in the code. In that case, it seems, semi_join() will return only the rows where both datasets match, for example:

      fueleconomy::vehicles %>% semi_join(fueleconomy::common)

      produces the same output as:

      fueleconomy::vehicles %>% semi_join(fueleconomy::common, by = c("make", "model"))

      Or is that a coincidence?

      But, fueleconomy::vehicles %>% semi_join(fueleconomy::common, by = "make")

      will produce a different output, as R will match only by "make" in this example.
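      This is probably not a coincidence: according to the dplyr documentation, when `by` is not supplied the join is a natural join on all columns the two tables have in common. The base R sketch below imitates that on two made-up tables whose column names loosely mirror `vehicles` and `common`; the data and the `semi_join_base()` helper are hypothetical, not the fueleconomy data or dplyr's implementation.

```r
# Hypothetical miniature versions of the two tables
vehicles <- data.frame(
  id    = 1:4,
  make  = c("Acura", "Acura", "Audi", "BMW"),
  model = c("Integra", "Legend", "A4", "M3"),
  stringsAsFactors = FALSE
)
common <- data.frame(
  make  = c("Acura", "Audi"),
  model = c("Integra", "A4"),
  n     = c(42, 49),
  stringsAsFactors = FALSE
)

# Columns present in both tables: what a natural join would match on
shared <- intersect(names(vehicles), names(common))

# Hypothetical helper: keep rows of x whose key columns appear in y
semi_join_base <- function(x, y, by) {
  key_x <- do.call(paste, c(x[by], sep = "\r"))
  key_y <- do.call(paste, c(y[by], sep = "\r"))
  x[key_x %in% key_y, , drop = FALSE]
}

by_shared <- semi_join_base(vehicles, common, by = shared)
by_make   <- semi_join_base(vehicles, common, by = "make")
```

      Here `shared` is `c("make", "model")`, so matching on the shared columns keeps only exact make and model pairs, while matching on "make" alone also keeps the Acura Legend.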

    4. arr_delay

      The code doesn't affect the output but I thought you might mean, sum( !is.na( dep_delay ))

    5. There are few planes older than 30 years, so I combine them into a single category.

      The code in the solution book totally dropped the data for planes with age > 25. How might we combine them into a single row? I didn't think it could be done without first defining a second tibble that contains all the planes with age > 25, then merging it with the first tibble that contains the data for planes with age <= 25, before grouping by age and carrying out the summarise actions on them.

      older_plane_cohorts <- inner_join( flights, select( planes, tailnum, plane_year = year ), by = "tailnum" ) %>% mutate(age = year - plane_year) %>% filter(!is.na(age)) %>% mutate(age = pmin(46, age) - pmin( 25, age ) ) %>% filter( age != 0 )

      Then I got stuck. I can't figure out how to proceed after that. And frankly, I can't say with any certainty whether my argument is sensible. I'm not even sure it's possible to combine the 17 rows into a single row. Your help will be appreciated.

      Thank you in advance and for your help so far. Truly appreciated.

    6. mutate(age = pmin(25, age))

      I think you used this line of code to limit the selection to planes not older than 25 years. But in the text above, you stated, "There are few planes older than 30 years, so we combine them into a single category." So, I was expecting the selection to be age <= 30, or using your notation pmin(30, age) and not pmin(25, age). Perhaps an edit of the text may be required unless I'm wrong in my supposition?
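      It may help to see what `pmin(25, age)` does on its own. As far as I can tell it does not drop any rows: it takes the element-wise minimum of 25 and each age, so every age above 25 is replaced by 25 and all of those planes end up in one "25" group. A minimal base R sketch with made-up ages (the vector is an assumption for illustration):

```r
# Hypothetical plane ages
age <- c(3, 12, 25, 31, 46)

# Element-wise minimum: values above 25 are capped at 25, none are removed
capped <- pmin(25, age)
capped

# Filtering, by contrast, would remove the older planes
filtered <- age[age <= 25]

# After capping, the 25 group holds every plane aged 25 or more
table(capped)
```

      Grouping by the capped value and summarising would then give a single row for all planes aged 25 and over.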

    7. This however, this default

      Edit suggestion: However, this default...

    8. If we needed a unique identifier for our analysis, could add a surrogate key.

      Hi, I hope this doesn't come across as nitpicking:

      If we needed a unique identifier for our analysis, we could add a surrogate key, perhaps?

    1. It looks like it is possible for certain variables to missing for (country, years).

      "It looks like it is possible for certain variables to missing for (country, years)." Edit suggestion: It looks like it is possible that certain variables are missing for (country, years).

    2. is

      delete?

    1. ggplot

      Please, can you explain to me what I am getting wrong in the code below, and especially the error message, which I have tried googling without success?

      As a practice exercise, I tried to show the average price per carat group instead of the count, using the code below:

      group_by( diamonds, carat ) %>% summarise( avg_price = mean(price ) ) %>% ggplot( ) + mapping = aes( color = cut_width(carat, 5 ), x = avg_price ) + geom_freqpoly( )

      The code returns the following error:

      Error in group_by(diamonds, carat) %>% summarise(avg_price = mean(price)) %>% : could not find function "+<-"

      I've googled the error without success. So my confusion is in two parts:

      1. What is the right code to show the average price per carat type?
      2. What does the above error mean?

      Thanks in advance
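      On the second question, the error seems to come from how R parses `=`. In `... %>% ggplot( ) + mapping = aes(...) + ...`, the `=` is read as an assignment whose left-hand side is the whole expression `... + mapping`. R then looks for a replacement function named `+<-`, which does not exist. The base R sketch below reproduces the same message with two plain numbers (the variables are made up for illustration); it suggests that `mapping = aes(...)` needs to sit inside the parentheses of `ggplot()` or a geom rather than after a `+`.

```r
x <- 1
y <- 2

# "x + y = 3" is parsed as an assignment to the expression x + y,
# so R looks for a replacement function called `+<-`
msg <- tryCatch(
  eval(parse(text = "x + y = 3")),
  error = function(e) conditionMessage(e)
)
msg
```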

    2. visualization

      Does anyone else notice that R4DS uses the British spelling, visualisation, but the solutions book uses the American spelling? It's a bit weird when one switches directly from the textbook to the solutions. I'm sure not many people notice the difference. I probably do because I generally write British English. By the way, I am not censuring, just expressing my thought. I am really grateful to the author for making this available.

    3. there spikes in

      Omission of are? Perhaps, you meant there are?

    4. the these

      Typo. Either the or these, preferably these in my opinion.