16 Matching Annotations
  1. Jul 2020
    1. effects of crossover and dropout

      I understand dropout, but how does crossover affect the analysis?

    2. Randomized clinical trials analyzed by the intention-to-treat (ITT) approach provide unbiased comparisons among the treatment groups

      What is the proof for this? Is there a statistical proof?
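
Not a proof, but the claim can be illustrated with a small simulation: randomization makes the assigned groups exchangeable, so comparing outcomes by assignment (the ITT contrast) is unbiased for the effect of assignment even when crossover is related to prognosis, while an "as treated" comparison is confounded. All numbers and names below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical potential outcomes: y0 if untreated, y1 if treated.
y0 = rng.normal(0.0, 1.0, n)
y1 = y0 + 2.0                      # assumed effect of *receiving* treatment

assigned = rng.integers(0, 2, n)   # randomization

# Non-random crossover: worse-off patients (low y0) in the treatment
# arm cross over to control, so "received" is confounded with prognosis.
received = assigned * (y0 > -0.8).astype(int)

observed = np.where(received == 1, y1, y0)

# ITT: compare groups *as randomized*, ignoring crossover.
itt = observed[assigned == 1].mean() - observed[assigned == 0].mean()

# "As treated": compare groups by what they actually received.
as_treated = observed[received == 1].mean() - observed[received == 0].mean()

# itt lands near 2 * P(comply), the effect of *assignment*;
# as_treated is inflated by the prognosis-dependent crossover.
print(round(itt, 2), round(as_treated, 2))
```

The simulation only shows the point, not a general proof; the formal argument is that assignment is independent of potential outcomes by construction, so group means by assignment are unbiased for the assignment effect.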

  2. Jan 2020
    1. Additionally, one of my all time favorite papers (Fröhlich & McCormick, 2010) showed that an applied external field can entrain spiking in ferret cortical slice, parametrically to the oscillatory field frequency.

      What are the sources of the LFP? If it is only the current induced by EPSPs and IPSPs, then it is not clear that entrainment through an external field makes a strong argument. If LFPs can be modified by other factors, as is pointed out (Ca++ currents and other glial cell currents), then it is possible that these are not epiphenomenal. But we still need to show that they actually play a causal role in the observed system output.

    2. But spikes do not compute! The cells “compute”, dendrites “compute”, the axon hillock “computes”. In that sense, spikes are epiphenomenal: they are the secondary consequences of dendritic computation, of which you can fully infer by knowing the incoming synaptic inputs and biophysical properties of the neuron.

      This is correct. But in that case, LFPs and any other oscillations that we are recording are also epiphenomenal.

  3. Dec 2019
    1. estimators is the prior covariance Σφφ

      How do we know this covariance?

    2. If the sensor noise has an independent identical distribution (IID) across channels, the covariance of the sensor noise in the referenced data will be Σεrεr = σ²TrTrᵀ

      I do not understand this.

  4. Mar 2019
    1. When the number of references drops to zero, the object deletes itself. The last part is worth repeating: The object deletes itself

      How does that work? How does an object delete itself?
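
In C++-style reference counting (e.g. COM's Release()), the final release call runs inside one of the object's own methods, which then executes `delete this` as its last action; this is safe as long as no member is touched afterwards. A toy Python sketch of the mechanism, with all names hypothetical:

```python
class RefCounted:
    """Toy sketch of COM-style reference counting (illustration only)."""

    live_objects = set()  # stand-in for "allocated memory"

    def __init__(self):
        self.ref_count = 1  # the creator holds the first reference
        RefCounted.live_objects.add(id(self))

    def add_ref(self):
        self.ref_count += 1
        return self.ref_count

    def release(self):
        # The caller gives up its reference. When the count hits zero,
        # this method, running *inside* the object, tears the object
        # down: the C++ equivalent ends Release() with `delete this;`.
        self.ref_count -= 1
        if self.ref_count == 0:
            RefCounted.live_objects.discard(id(self))  # "free the memory"
        return self.ref_count

obj = RefCounted()
obj.add_ref()              # a second owner appears
assert obj.release() == 1  # first owner done; object survives
assert obj.release() == 0  # last owner done; object reclaims itself
assert id(obj) not in RefCounted.live_objects
```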

  5. Feb 2019
    1. Calculus on Computational Graphs: Backpropagation

      Good article on computational graphs and their role in backpropagation.

    1. One benefit of SGD is that it's computationally a whole lot faster. Large datasets often can't be held in RAM, which makes vectorization much less efficient. Rather, each sample or batch of samples must be loaded, worked with, the results stored, and so on. Minibatch SGD, on the other hand, is usually intentionally made small enough to be computationally tractable. Usually, this computational advantage is leveraged by performing many more iterations of SGD, making many more steps than conventional batch gradient descent. This usually results in a model that is very close to that which would be found via batch gradient descent, or better.

      Good explanation of why SGD is computationally better. I was confused about the benefit of repeatedly performing mini-batch GD, and why it might be better than batch GD. But I guess the advantage comes from being able to get better performance while still vectorizing the computation.
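
The trade-off described in the quote can be sketched on synthetic data: each mini-batch update touches only `batch` rows, so the full dataset never has to sit in memory, yet each update is still one vectorized matrix operation. Everything below (sizes, learning rate, batch size) is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = X @ w_true + noise
n, d = 10_000, 5
X = rng.normal(size=(n, d))
w_true = np.arange(1.0, d + 1)
y = X @ w_true + 0.1 * rng.normal(size=n)

def grad(w, Xb, yb):
    # Gradient of mean squared error on the (mini)batch
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

# Mini-batch SGD: many cheap, vectorized steps on small slices of data.
w = np.zeros(d)
batch, lr = 64, 0.05
for step in range(2_000):
    idx = rng.integers(0, n, batch)   # sample a mini-batch
    w -= lr * grad(w, X[idx], y[idx])

print(np.round(w, 2))  # should land near w_true = [1, 2, 3, 4, 5]
```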

    1. And so it makes most sense to regard epoch 280 as the point beyond which overfitting is dominating learning in our neural network.

      I do not get this. Doesn't epoch 15 already indicate that we are over-fitting to the training data set? Assuming both training and test sets come from the same population that we are trying to learn from.

    2. If we see that the accuracy on the test data is no longer improving, then we should stop training

      This contradicts the earlier statement about epoch 280 being the point where over-fitting starts to dominate.
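
One reading that reconciles the two statements: "stop when test accuracy is no longer improving" is usually implemented with a patience window, so training runs a few epochs past the best point to confirm the plateau, and the weights from the best epoch are kept. A minimal sketch with hypothetical train_step / eval_accuracy callbacks:

```python
def train_with_early_stopping(train_step, eval_accuracy, patience=5, max_epochs=1000):
    """Stop once eval accuracy hasn't improved for `patience` epochs."""
    best_acc, best_epoch = float("-inf"), 0
    for epoch in range(max_epochs):
        train_step(epoch)
        acc = eval_accuracy()
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            break  # plateau confirmed: report the best epoch, not the last one
    return best_epoch, best_acc

# Toy accuracy curve that improves until epoch 20, then plateaus.
curve = [min(0.9, 0.5 + 0.02 * e) for e in range(100)]
state = {"epoch": 0}

def step(epoch):
    state["epoch"] = epoch

def accuracy():
    return curve[state["epoch"]]

best_epoch, best_acc = train_with_early_stopping(step, accuracy)
print(best_epoch, best_acc)  # stops a few epochs after the plateau begins
```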

    3. It might be that accuracy on the test data and the training data both stop improving at the same time

      Can this happen? Can the accuracy on the training data set ever stop increasing with the training epoch?

    4. test data

      Shouldn't this be "training data"?

    5. What is the limiting value for the output activations aLj

      When c is large, small differences in z_j^L are magnified and each activation saturates at 0 or 1, depending on the sign of the differences. On the other hand, when c is very small, all activation values will be close to 1/N, where N is the number of neurons in layer L.
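
This limiting behaviour is easy to check numerically with the sharpened softmax a_j^L = e^{c z_j^L} / Σ_k e^{c z_k^L} (the constants below are arbitrary):

```python
import numpy as np

def softmax_c(z, c):
    e = np.exp(c * (z - z.max()))  # subtract max for numerical stability
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0, 2.5])

hot = softmax_c(z, c=1000.0)   # large c: winner-take-all on the largest z
flat = softmax_c(z, c=1e-6)    # small c: nearly uniform, about 1/N each

print(np.round(hot, 3))   # -> [0. 0. 1. 0.]
print(np.round(flat, 3))  # -> [0.25 0.25 0.25 0.25]
```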

    6. zLj=lnaLj+C

      How can the constant C be independent of j? It will have an e^{z_j^L} term in it. Or does the sum over all k make it the same for every j?
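
For what it's worth, inverting a_j^L = e^{z_j^L} / Σ_k e^{z_k^L} gives z_j^L = ln a_j^L + C with C = ln Σ_k e^{z_k^L}; the sum runs over every k, so the same C works for all j. A quick numeric check with arbitrary values:

```python
import numpy as np

z = np.array([0.3, -1.2, 2.0, 0.7])
a = np.exp(z) / np.exp(z).sum()   # softmax activations

C = z - np.log(a)                 # candidate constants, one per j
print(np.round(C, 6))             # all four entries are equal
assert np.allclose(C, np.log(np.exp(z).sum()))
```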