This seems to be a tad long so I apologise in advance. This is a whole new field for me and I would really like to understand.
Could it be that AA and MQ use different values to represent tailnum?
filter( planes, tailnum == 0 )
A tibble: 0 x 9
length(is.na( planes$tailnum ))
[1] 3322
nrow(planes)
[1] 3322
filter( flights, tailnum == 0 )
A tibble: 0 x 19
length(is.na( flights$tailnum ) )
[1] 336776
nrow( flights )
[1] 336776
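One caveat about the checks above that I am not sure of: as far as I understand, is.na() returns one TRUE/FALSE per element, so length(is.na(x)) is always just the length of x, while sum(is.na(x)) counts the missing values. Also, tailnum is a character column, so tailnum == 0 would only match a tail number that is literally "0". A minimal base R sketch on a made-up vector (these tail numbers are hypothetical, not taken from either dataset):

```r
# Hypothetical tail numbers for illustration only; two of the five are missing.
tailnum <- c("N101AA", NA, "N202MQ", NA, "N303AA")

# is.na() gives one logical per element, so its length equals the vector length.
n_elements <- length(is.na(tailnum))

# Summing the logicals counts the TRUE values, i.e. the missing entries.
n_missing <- sum(is.na(tailnum))

# Comparing a character vector with 0 coerces 0 to "0", so nothing matches here.
n_zero <- sum(tailnum == 0, na.rm = TRUE)

print(n_elements)
print(n_missing)
print(n_zero)
```

If that is right, the matching outputs of length() and nrow() above say nothing about whether any tailnum is missing.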
Yet the anti_join() in your code clearly shows that some tailnum values in flights are not represented in the planes dataset. How could that be?
The one explanation I could come up with is that the two datasets use different tailnum values, so I tried to investigate for AA and MQ.
tailnum_flights <- flights %>% filter( carrier == 'AA' | carrier == 'MQ' ) %>% select( carrier, tailnum )
tailnum_planes <- planes %>% select( tailnum )
tailnum_planes %in% tailnum_flights
[1] FALSE
So it looks like the tailnum values are not missing for the ten airlines but are represented with different values in the two datasets (flights and planes).
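That said, I am not certain the %in% check tests what I intended. It compares two whole tibbles, and I suspect %in% then treats each column as a single element, which would explain the single FALSE. Comparing the tailnum vectors directly may be closer to what I wanted. A minimal base R sketch, assuming two made-up tables (flights_toy and planes_toy are hypothetical stand-ins, not the real data):

```r
# Hypothetical stand-ins for flights and planes; values are made up.
flights_toy <- data.frame(
  carrier = c("AA", "AA", "MQ", "MQ"),
  tailnum = c("N101AA", "N999AA", "N202MQ", NA),
  stringsAsFactors = FALSE
)
planes_toy <- data.frame(
  tailnum = c("N101AA", "N202MQ", "N303UA"),
  stringsAsFactors = FALSE
)

# Data frame against data frame: one result per column of the left-hand side,
# not one result per row.
whole_table_check <- planes_toy %in% flights_toy

# Vector against vector: one result per flight, TRUE when its tail number
# also appears in planes_toy.
in_planes <- flights_toy$tailnum %in% planes_toy$tailnum

# Distinct tail numbers in the flights table that have no match in planes_toy.
unmatched <- unique(flights_toy$tailnum[!in_planes])

print(length(whole_table_check))
print(in_planes)
print(unmatched)
```

In this toy data some tail numbers match and some do not, which is a different picture from the single FALSE I got above.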
What are your thoughts?
Thank you.