2 Matching Annotations
  1. Last 7 days
    1. 80% of data analysis is spent on the process of cleaning and preparing the data

      Imagine having unnecessary and wrong data in your document: you would most likely experience what I call time demarcation, the reluctance to go through every single row and column to eliminate this "garbage data". Clearly, owning all kinds of data without organizing it feels like stuffing your closet with clothes that you should have donated 5 years ago. It is a time-consuming and soul-destroying process for us. Luckily, R has the "tidyverse" collection of packages, which I believe the author talks about in the next paragraph, to make life easier for everyone. I personally use dplyr and ggplot2 when I clean and explore data, and they are extremely helpful. Without these packages, I have no idea when I would be able to reach the final step of data visualization.

  2. Oct 2017
    1. We deleted non-directed tweets, but it should be acknowledged that non-directed tweets may also bear implications for knowledge sharing and can be examined in future studies. The data cleaning procedure also excluded retweets and tweets that serve as quoting.

      This is a detailed description of their data cleaning methodology. Knowing it helps in understanding the results of this research study, and it also helps me understand how to clean such large data sets.
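
To make the cleaning steps in the quoted passage concrete for myself, here is a minimal sketch in Python. It is not the authors' code. It assumes that a "directed" tweet is one that contains an @mention, and the record fields (`id`, `text`, `is_retweet`, `is_quote`) are hypothetical names chosen for illustration, not the study's actual schema.

```python
import re

# Assumption: a tweet counts as "directed" if it mentions another user.
# The quoted passage does not define the term, so this is only a guess.
MENTION = re.compile(r"@\w+")


def is_directed(text):
    """Return True if the tweet text contains at least one @mention."""
    return MENTION.search(text) is not None


def clean_tweets(tweets):
    """Drop retweets, quote tweets, and non-directed tweets.

    Each tweet is a dict with the hypothetical keys
    'id', 'text', 'is_retweet', and 'is_quote'.
    """
    kept = []
    for tweet in tweets:
        # Exclude retweets and tweets that serve as quoting.
        if tweet.get("is_retweet") or tweet.get("is_quote"):
            continue
        # Exclude non-directed tweets.
        if not is_directed(tweet["text"]):
            continue
        kept.append(tweet)
    return kept


# Made-up sample records, purely for illustration.
sample = [
    {"id": 1, "text": "@alice have you tried dplyr?", "is_retweet": False, "is_quote": False},
    {"id": 2, "text": "Cleaning data all day", "is_retweet": False, "is_quote": False},
    {"id": 3, "text": "RT @bob: great thread", "is_retweet": True, "is_quote": False},
    {"id": 4, "text": "@carol agreed!", "is_retweet": False, "is_quote": True},
    {"id": 5, "text": "Thanks @dave for the tip", "is_retweet": False, "is_quote": False},
]

cleaned = clean_tweets(sample)
print([tweet["id"] for tweet in cleaned])
```

Checking the retweet and quote flags before the mention test matters here, because a retweet's text usually contains an @mention and would otherwise pass as directed.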