 Jul 2020

en.wikipedia.org

effects of crossover and dropout
I understand dropout, but how does crossover have an effect?

Randomized clinical trials analyzed by the intention-to-treat (ITT) approach provide unbiased comparisons among the treatment groups
What is the proof for this? Is there a statistical proof?
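A sketch of the standard potential-outcomes argument (my notation, not from the article): let Z be the randomized assignment and Y the observed outcome. ITT compares groups by Z, and randomization makes Z independent of everything about the patient, so the assigned groups are exchangeable:

```latex
% Unbiasedness of the ITT comparison under randomization (sketch).
% Randomization => Z is independent of patient characteristics, so
E[\bar Y_{Z=1}] - E[\bar Y_{Z=0}]
  = E[Y \mid Z = 1] - E[Y \mid Z = 0]
% is an unbiased estimate of the causal effect of *assignment* to treatment.
% Crossover and dropout change what that assignment effect means
% (the effect of being offered the treatment, not of receiving it),
% but they do not break the unbiasedness, because patients are still
% compared by the group they were randomized to.
```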

 Jan 2020

rdgao.github.io

Additionally, one of my all time favorite papers (Fröhlich & McCormick, 2010) showed that an applied external field can entrain spiking in ferret cortical slice, parametrically to the oscillatory field frequency.
What are the sources of LFP? If it is only the currents induced by EPSPs and IPSPs, then it is not clear that entrainment through an external field makes a strong argument. If LFPs can be modified by other factors, as pointed out (Ca++ currents and other glial cell currents), then it is possible that they are not epiphenomenal. But we still need to show that they actually play a causal role in the observed system output.

But spikes do not compute! The cells “compute”, dendrites “compute”, the axon hillock “computes”. In that sense, spikes are epiphenomenal: they are the secondary consequences of dendritic computation, of which you can fully infer by knowing the incoming synaptic inputs and biophysical properties of the neuron.
This is correct. But in that case, LFPs and any other oscillations that we are recording are also epiphenomenal.

 Dec 2019

link.springer.com

estimators is the prior covariance Σ_φφ
How do we know this covariance?

If the sensor noise has an independent identical distribution (IID) across channels, the covariance of the sensor noise in the referenced data will be Σ_{ε_r ε_r} = σ^2 T_r T_r^T
I do not understand this.

 Mar 2019

docs.microsoft.com

When the number of references drops to zero, the object deletes itself. The last part is worth repeating: The object deletes itself
How does that work? How does an object delete itself?


docs.microsoft.com

CALLBACK is the calling convention for the function
What is a calling convention?

 Feb 2019


Calculus on Computational Graphs: Backpropagation
Good article on computational graphs and their role in back propagation.


stats.stackexchange.com

One benefit of SGD is that it's computationally a whole lot faster. Large datasets often can't be held in RAM, which makes vectorization much less efficient. Rather, each sample or batch of samples must be loaded, worked with, the results stored, and so on. Minibatch SGD, on the other hand, is usually intentionally made small enough to be computationally tractable. Usually, this computational advantage is leveraged by performing many more iterations of SGD, making many more steps than conventional batch gradient descent. This usually results in a model that is very close to that which would be found via batch gradient descent, or better.
Good explanation for why SGD is computationally better. I was confused about the benefits of repeatedly performing minibatch GD, and why it might be better than batch GD. But I guess the advantage comes from being able to get better performance by vectorizing computation.


neuralnetworksanddeeplearning.com

And so it makes most sense to regard epoch 280 as the point beyond which overfitting is dominating learning in our neural network.
I do not get this. Doesn't epoch 15 already indicate that we are overfitting to the training data set? Assuming both training and test sets come from the same population that we are trying to learn from.

If we see that the accuracy on the test data is no longer improving, then we should stop training
This contradicts the earlier statement about epoch 280 being the point where overfitting sets in.

It might be that accuracy on the test data and the training data both stop improving at the same time
Can this happen? Can the accuracy on the training data set ever stop increasing with the training epochs?

test data
Shouldn't this be "training data"?

What is the limiting value for the output activations a^L_j
When c is large, small differences in z_j^L are magnified and the outputs jump between 0 and 1 depending on the sign of those differences. On the other hand, when c is very small, all activations will be close to 1/N, where N is the number of neurons in layer L.

z^L_j = ln a^L_j + C
How can the constant C be independent of j? Writing out the inversion, C = ln Σ_k e^{z_k^L}. This sum does contain an e^{z_j^L} term, but only as one summand of a sum over all k, which is the same for every j, so C is in fact independent of j.
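Working the inversion out from the softmax definition (the book's notation):

```latex
a^L_j = \frac{e^{z^L_j}}{\sum_k e^{z^L_k}}
\quad\Longrightarrow\quad
\ln a^L_j = z^L_j - \ln\Big(\sum_k e^{z^L_k}\Big),
% so
z^L_j = \ln a^L_j + C,
\qquad
C = \ln\Big(\sum_k e^{z^L_k}\Big).
% The sum in C runs over all k, so C is the same constant for every j,
% even though e^{z^L_j} appears as one of its summands.
```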
