430 Matching Annotations
  1. Feb 2019
    1. Exercise 5.5.4

      Another solution: now that we know we can "trust" dep_delay, we could simply sort on it directly:

      library(dplyr)
      library(nycflights13)

      # Most delayed departures first
      md <- flights %>%
        select(carrier, flight, origin, dest, sched_dep_time, dep_time, dep_delay) %>%
        arrange(min_rank(desc(dep_delay)))
      

      There are no ties in the top 10.
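
      One way to check the "no ties" claim, sketched in base R on made-up delays rather than the real flights data. The vector and the names delay_rank, top10, and no_ties_in_top10 are illustrative assumptions; rank(-x, ties.method = "min") is used here as a base-R stand-in for min_rank(desc(x)).

      ```r
      # Hypothetical departure delays in minutes (not taken from nycflights13).
      dep_delay <- c(1301, 1137, 1126, 1014, 1005, 960, 911, 899, 898, 896, 878, 878)

      # Largest delay gets rank 1; tied values share the smallest rank of their group.
      delay_rank <- rank(-dep_delay, ties.method = "min")

      # Keep every rank that falls inside the top 10, including any ties at the boundary.
      top10 <- delay_rank[delay_rank <= 10]

      # TRUE when all ranks in the top 10 are distinct, i.e. there are no ties.
      no_ties_in_top10 <- !any(duplicated(top10))
      print(no_ties_in_top10)
      ```

      In this made-up vector the only tie (the two 878s) sits below the top 10, so the check passes.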

    2. Except for flights daylight savings started (March 10) or ended (November 3)

      I understand the point of the paragraph, but this sentence is a bit unclear.

    1. sum(cases)

      na.rm = TRUE must be passed inside the sum() call for the graph below to appear; otherwise it will just be a blank graph.

      By the way, thanks so much for these solutions. They are extremely helpful and teach things that are not taught in the book!
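
      A small base-R illustration of the na.rm point above. The cases vector is made up for this sketch. Presumably the plot comes out blank because every summed value is NA and there is nothing left to draw.

      ```r
      # Hypothetical case counts with one missing value.
      cases <- c(745, NA, 37737, 80488)

      # Without na.rm, a single NA makes the whole sum NA.
      total_default <- sum(cases)

      # With na.rm = TRUE, missing values are dropped before summing.
      total_na_rm <- sum(cases, na.rm = TRUE)

      print(total_default)  # NA
      print(total_na_rm)    # 118970
      ```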

  2. Jan 2019
    1. sum_to_one <- function(x, na.rm = FALSE) { x / sum(x, na.rm = na.rm) }

      Since the sum of x is the same across the input, couldn't you make the code less repetitive by assigning it to an intermediate variable?

      sum_to_one <- function(x, na.rm = FALSE) { y <- sum(x, na.rm = na.rm); x / y }
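
      A quick check of the intermediate-variable version, written out over several lines. The name total and the input vector are assumptions made for this sketch.

      ```r
      # Rescale x so that its non-missing elements sum to one.
      sum_to_one <- function(x, na.rm = FALSE) {
        total <- sum(x, na.rm = na.rm)  # computed once, then reused
        x / total
      }

      # Hypothetical input with one missing value.
      result <- sum_to_one(c(1, 2, NA, 5), na.rm = TRUE)
      print(result)  # 0.125 0.250 NA 0.625
      ```

      Note that sum() is only called once in the original one-liner too, so the intermediate variable is mostly a readability choice.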

    1. There is one remaining issue. Midnight is represented by 2400, which would correspond to 1440 minutes since midnight, but it should correspond to 0. After converting all the times to minutes after midnight, x %% 1440 will convert 1440 to zero while keeping all the other times the same.

       Now we will put it all together. The following code creates a new data frame flights_times with columns dep_time_mins and sched_dep_time_mins. These columns convert dep_time and sched_dep_time, respectively, to minutes since midnight.

       flights_times <- mutate(flights,
         dep_time_mins = (dep_time %/% 100 * 60 + dep_time %% 100) %% 1440,
         sched_dep_time_mins = (sched_dep_time %/% 100 * 60 + sched_dep_time %% 100) %% 1440
       )

      This little trick for computing the variable is very neat and worth studying closely.
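
      To see the trick at work, here is the same arithmetic wrapped in a helper and applied to a few clock times. The helper name to_mins and the sample times are assumptions made for this sketch.

      ```r
      # Convert HHMM clock times to minutes since midnight.
      to_mins <- function(hhmm) {
        hours <- hhmm %/% 100           # integer division gives the hour part
        minutes <- hhmm %% 100          # remainder gives the minute part
        (hours * 60 + minutes) %% 1440  # wrap 2400 (1440 minutes) back to 0
      }

      mins <- to_mins(c(1, 517, 2359, 2400))
      print(mins)  # 1 317 1439 0
      ```

      Only 2400 is changed by the final %% 1440, since every other valid time is already below 1440 minutes.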

    2. desc(is.na(dep_time)), dep_time)

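      This sorts by two variables. The first, is.na(dep_time), produces a logical variable (TRUE/FALSE). Missing values are TRUE, and desc() puts TRUE first, so the missing values are sorted to the front; the rows are then sorted by the second variable, dep_time.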
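
      The same two-key idea can be sketched in base R with order() instead of arrange(). This is an analogue on made-up times, not the dplyr code from the quote: -is.na(x) plays the role of desc(is.na(x)).

      ```r
      # Hypothetical departure times with missing values.
      dep_time <- c(517, NA, 533, NA, 1)

      # First key: -is.na() is -1 for missing and 0 otherwise, so NAs come first.
      # Second key: the time itself, ascending, for the remaining rows.
      idx <- order(-is.na(dep_time), dep_time)

      sorted <- dep_time[idx]
      print(sorted)  # NA NA 1 517 533
      ```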

  3. Dec 2018
  4. Nov 2018