13 Matching Annotations
  1. Last 7 days
  2. Oct 2019
    1. Indicate number of NA values placed in non-numeric columns.

      This is only true when using the Python parsing engine.

      Filled 3 NA values in column name
      

      If using the C parsing engine, you get something like the following output instead:

      Tokenization took: 0.01 ms
      Type conversion took: 0.70 ms
      Parser memory cleanup took: 0.01 ms
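
The NA count that the Python engine reports can also be computed directly, independent of parser engine. The following is a minimal sketch, assuming a small made-up CSV (`csv_text` and its columns are hypothetical); it counts missing values per non-numeric column after loading and is not a reproduction of the parser's own logging.

```python
import io

import pandas as pd

# Hypothetical CSV: the "name" column has three empty fields.
csv_text = "name,score\nalice,1\n,2\n,3\nbob,4\n,5\n"

df = pd.read_csv(io.StringIO(csv_text))

# Keep only the non-numeric columns, then count missing values in each.
non_numeric = df.select_dtypes(exclude="number")
na_counts = non_numeric.isna().sum()

for column, count in na_counts.items():
    print(f"Filled {count} NA values in column {column}")
```

Only the non-numeric columns are inspected here, mirroring the scope of the message quoted above.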
      
    1. It’s just that it often makes sense to write code in the order JOIN / WHERE / GROUP BY / HAVING. (I’ll often put a WHERE first to improve performance though, and I think most database engines will also do a WHERE first in practice)

      Pandas code is usually written in this order:

      1. JOIN
      2. WHERE
      3. GROUP BY
      4. HAVING

      Example:

      df = thing1.join(thing2) # like a JOIN
      df = df[df.created_at > 1000] # like a WHERE
      df = df.groupby('something').agg(num_yes=('yes', 'sum')).reset_index() # like a GROUP BY
      df = df[df.num_yes > 2] # like a HAVING, filtering on the result of a GROUP BY
      df = df[['num_yes', 'something']] # pick the columns I want to display, like a SELECT
      df = df.sort_values('something', ascending=True) # like an ORDER BY
      df[:30] # like a LIMIT
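
To try the JOIN / WHERE / GROUP BY / HAVING ordering end to end, here is a small self-contained sketch. The two tables and their values are invented for illustration, and the join is on the index, which is what `DataFrame.join` does by default.

```python
import pandas as pd

# Hypothetical sample tables sharing the same index.
thing1 = pd.DataFrame(
    {"created_at": [500, 1500, 2000, 2500, 3000], "yes": [1, 1, 2, 3, 4]},
    index=["a", "b", "c", "d", "e"],
)
thing2 = pd.DataFrame(
    {"something": ["x", "x", "y", "y", "z"]},
    index=["a", "b", "c", "d", "e"],
)

df = thing1.join(thing2)                       # JOIN (on the index)
df = df[df.created_at > 1000]                  # WHERE
df = df.groupby("something", as_index=False).agg(
    num_yes=("yes", "sum")                     # GROUP BY with a named aggregation
)
df = df[df.num_yes > 2]                        # HAVING
df = df[["something", "num_yes"]]              # SELECT
result = df.sort_values("num_yes", ascending=False)[:30]  # ORDER BY and LIMIT

print(result)
```

With this data, group `x` is filtered out by the HAVING step, and the remaining groups come back sorted by `num_yes`.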
  3. Feb 2019
  4. Jan 2019
    1. Measurements are variables that can be quantified. All data in the output above are measurements. Some of these measurements, such as state_percentile_16, avg_score_16 and school_rating, are outcomes; these outcomes cannot be used to explain one another. For example, explaining school_rating as a result of state_percentile_16 (test scores) is circular logic. Therefore we need a second class of variables.
  5. Jun 2018
  6. May 2018
  7. Apr 2018
  8. Mar 2018
    1. I'll skip the inefficient method I used before with the custom groupby aggregation, and go for a neat trick using the mighty transform method.

      A more constrained, and thus more efficient, way to do transformations on groupbys than the apply method. You can do very cool stuff with it. For those of you who know Splunk - this has the neat "streamstats" and "eventstats" capabilities
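
As a rough illustration of the Splunk comparison, the sketch below uses `transform` on a groupby in two ways: broadcasting a per-group statistic back onto every row (loosely like "eventstats") and computing a running per-group value (loosely like "streamstats"). The `events` table and its column names are hypothetical, and the mapping to Splunk commands is only an analogy.

```python
import pandas as pd

# Hypothetical event log: one row per event, grouped by host.
events = pd.DataFrame(
    {"host": ["a", "a", "b", "a", "b"], "bytes": [10, 20, 5, 30, 15]}
)

grouped = events.groupby("host")["bytes"]

# "eventstats"-like: every row gets its group's mean; the row count is unchanged.
events["host_mean"] = grouped.transform("mean")

# "streamstats"-like: a running total within each group, in row order.
events["running_total"] = grouped.transform("cumsum")

print(events)
```

Unlike a plain aggregation, `transform` returns a result aligned with the original rows, so it can be assigned straight back as a new column.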

  9. Dec 2017