21 Matching Annotations
  1. Nov 2021
    1. If you don't have that information, you can determine which frequencies are important by extracting features with Fast Fourier Transform. To check the assumptions, here is the tf.signal.rfft of the temperature over time. Note the obvious peaks at frequencies near 1/year and 1/day:

      Do an FFT with TensorFlow:

      import numpy as np
      import tensorflow as tf
      import matplotlib.pyplot as plt
      
      # Real-valued FFT of the hourly temperature series.
      fft = tf.signal.rfft(df['T (degC)'])
      f_per_dataset = np.arange(0, len(fft))
      
      # Convert the frequency axis from cycles/dataset to cycles/year.
      n_samples_h = len(df['T (degC)'])
      hours_per_year = 24*365.2524
      years_per_dataset = n_samples_h/(hours_per_year)
      
      f_per_year = f_per_dataset/years_per_dataset
      plt.step(f_per_year, np.abs(fft))
      plt.xscale('log')
      plt.ylim(0, 400000)
      plt.xlim([0.1, max(plt.xlim())])
      plt.xticks([1, 365.2524], labels=['1/Year', '1/Day'])
      _ = plt.xlabel('Frequency (log scale)')
      
    2. Now, peek at the distribution of the features. Some features do have long tails, but there are no obvious errors like the -9999 wind velocity value.

      Indeed, peek: this plot uses the whole DataFrame, so we are looking at the test data too. (The train-only split and statistics are sketched after the code.)

      import seaborn as sns
      import matplotlib.pyplot as plt
      
      # Normalize with training-set statistics, then melt to long form so
      # seaborn draws one violin per feature.
      df_std = (df - train_mean) / train_std
      df_std = df_std.melt(var_name='Column', value_name='Normalized')
      plt.figure(figsize=(12, 6))
      ax = sns.violinplot(x='Column', y='Normalized', data=df_std)
      _ = ax.set_xticklabels(df.keys(), rotation=90)
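
      Where do train_mean and train_std come from? A minimal sketch of a chronological split with train-only statistics; the 70/20/10 fractions here are an assumption for illustration:

      import pandas as pd
      
      # Chronological split: no shuffling, so validation and test data sit
      # strictly after the training data. The 70/20/10 fractions are an
      # assumption for this sketch.
      n = len(df)
      train_df = df[0:int(n * 0.7)]
      val_df = df[int(n * 0.7):int(n * 0.9)]
      test_df = df[int(n * 0.9):]
      
      # Statistics from the training split only, so validation and test
      # values never leak into the scaling.
      train_mean = train_df.mean()
      train_std = train_df.std()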
      
    3. It is important to scale features before training a neural network. Normalization is a common way of doing this scaling: subtract the mean and divide by the standard deviation of each feature. The mean and standard deviation should only be computed using the training data so that the models have no access to the values in the validation and test sets. It's also arguable that the model shouldn't have access to future values in the training set when training, and that this normalization should be done using moving averages.

      Moving average to avoid the data leak; see the sketch below.
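
      A minimal sketch of that idea: normalize each row with trailing-window statistics so only past observations influence the scaling. The 30-day window and the use of pandas rolling are my assumptions, not the tutorial's code:

      import pandas as pd
      
      # Trailing window of 30 days of hourly samples (an illustrative choice).
      window = 24 * 30
      rolling = df.rolling(window, min_periods=1)
      # shift(1) drops the current row from its own statistics, keeping the
      # transform strictly causal; the first rows stay NaN until there is
      # enough history.
      rolling_mean = rolling.mean().shift(1)
      rolling_std = rolling.std().shift(1)
      df_causal = (df - rolling_mean) / rolling_std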

    4. Similarly, the Date Time column is very useful, but not in this string form. Start by converting it to seconds:
      # date_time is the parsed 'Date Time' column; convert to epoch seconds.
      timestamp_s = date_time.map(pd.Timestamp.timestamp)
      

      and then create "Time of day" and "Time of year" signals:

      import numpy as np
      
      day = 24*60*60          # seconds per day
      year = (365.2425)*day   # seconds per calendar year
      
      # Sin/cos pairs keep the cyclic features continuous across midnight
      # and across New Year.
      df['Day sin'] = np.sin(timestamp_s * (2 * np.pi / day))
      df['Day cos'] = np.cos(timestamp_s * (2 * np.pi / day))
      df['Year sin'] = np.sin(timestamp_s * (2 * np.pi / year))
      df['Year cos'] = np.cos(timestamp_s * (2 * np.pi / year))
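
      To sanity-check the encoding, plotting the first 25 hours of the 'Day' signals should show one full cycle (a small usage sketch in the same style):

      import numpy as np
      import matplotlib.pyplot as plt
      
      plt.plot(np.array(df['Day sin'])[:25])
      plt.plot(np.array(df['Day cos'])[:25])
      plt.xlabel('Time [h]')
      plt.title('Time of day signal')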
      
  2. Jun 2019
    1. I ended up turning Desktop & Documents sync off. I got frustrated with it because my data was constantly being uploaded and downloaded, wasting my bandwidth. But recently I found a tool on GitHub called iCloud Control. It adds a menu button to Finder that lets you remove local items, download items, and publish a public link to share your files.