date %in% (holidays_2013$date - 1L) ~ "day before holiday", date %in% (holidays_2013$date + 1L) ~ "day after holiday",
The exercise asked you to focus on the holidays themselves, not the days before or after them! You're always overthinking!
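Here is a minimal base-R sketch of the simpler approach, which flags only the holidays themselves. The holidays_2013 table is a hypothetical three-row stand-in, and the dates being classified are made up for illustration. A dplyr::case_when() with a single holiday condition would do the same job.

```r
# Hypothetical stand-in for holidays_2013: a few US federal holidays in 2013
holidays_2013 <- data.frame(
  holiday = c("New Year's Day", "Independence Day", "Christmas Day"),
  date    = as.Date(c("2013-01-01", "2013-07-04", "2013-12-25"))
)

# Hypothetical dates to classify, including days next to holidays
dates <- as.Date(c("2012-12-31", "2013-01-01", "2013-07-04", "2013-07-05"))

# Flag only the holidays; neighbouring days count as ordinary days
day_type <- ifelse(dates %in% holidays_2013$date, "holiday", "not holiday")
day_type
```

The days before and after a holiday simply fall into "not holiday", with no extra conditions needed.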
I used a list, not a character vector, since the class of an object can have multiple values. For example, the class of the time_hour column is POSIXct, POSIXt.
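The following small sketch shows why a list is needed. It uses a hypothetical two-column data frame whose values are made up. lapply() returns a list, so each column keeps its full class vector, and time_hour keeps both of its classes.

```r
# Hypothetical data frame with an integer column and a date-time column
df <- data.frame(
  dep_delay = c(2L, -1L),
  time_hour = as.POSIXct(c("2013-01-01 05:00:00", "2013-01-01 06:00:00"), tz = "UTC")
)

# One list element per column; elements may have different lengths
col_classes <- lapply(df, class)
col_classes$dep_delay  # "integer"
col_classes$time_hour  # "POSIXct" "POSIXt"
```

By contrast, forcing one string per column (for example with vapply(df, class, character(1))) fails, because time_hour has two classes.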
You think of everything, man!
Whether the mean is the best summary depends on what you are using it for, i.e. your objective :-)
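library(tidyverse)  # provides gss_cat (via forcats) and the %>% pipe
gss_cat$tvhours %>% head(5000) %>% shapiro.test()  # shapiro.test() accepts at most 5000 values and drops NAs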
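Result: p-value < 0.01, so we reject the hypothesis that tvhours is normally distributed. Non-normality alone does not rule out the mean. What matters is that the distribution is right-skewed, with a few very large values that pull the mean upward. For this variable, the median is the better summary.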
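This toy illustration uses a made-up right-skewed vector rather than the real tvhours data. It shows how a single large value pulls the mean away from the median.

```r
# Hypothetical hours of TV per day, with one very large value
tv <- c(1, 2, 2, 3, 3, 4, 24)

mean(tv)    # about 5.57, pulled upward by the single 24
median(tv)  # 3, unaffected by the extreme value
```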
The question does not define a way to measure on-time record, so I will consider two metrics: the proportion of flights that were neither delayed nor cancelled, and the mean arrival delay.
This is just making a simple problem complicated.
The plane with the worst on-time record is simply the one with the largest mean arrival delay.
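This base-R sketch computes both metrics on a hypothetical six-flight table. The table, flights_small, uses made-up tail numbers and delays. The sketch assumes that a missing arr_delay means the flight was cancelled and that an arrival delay of zero or less counts as on time.

```r
# Hypothetical flights: NA arr_delay is treated as a cancelled flight
flights_small <- data.frame(
  tailnum   = c("N1", "N1", "N1", "N2", "N2", "N3"),
  arr_delay = c(-5, 10, NA, 30, 45, 0)
)

# Metric 1: mean arrival delay per plane (cancelled flights dropped)
mean_delay <- tapply(flights_small$arr_delay, flights_small$tailnum, mean, na.rm = TRUE)

# Metric 2: share of each plane's flights that arrived on time
# (not cancelled and arr_delay <= 0; cancelled flights count as not on time)
on_time <- !is.na(flights_small$arr_delay) & flights_small$arr_delay <= 0
prop_on_time <- tapply(on_time, flights_small$tailnum, mean)

names(which.max(mean_delay))    # worst plane by mean delay: "N2"
names(which.min(prop_on_time))  # worst plane by on-time share: "N2"
```

On this toy data the two metrics agree. The simple mean-delay ranking is often enough, which is the point of the comment above.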