6 Matching Annotations
  1. Oct 2020
    1. The number of hidden neurons should be between the size of the input layer and the size of the output layer. The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer. The number of hidden neurons should be less than twice the size of the input layer.

      Three rules of thumb for choosing the number of hidden layers and neurons; a quick sketch applying them follows.
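      A minimal sketch (not from the annotated article) that applies the three heuristics; the function name and the 784/10 layer sizes are just illustrative:

      ```python
      # Illustrative helper, not from the annotated article: applies the three
      # rules of thumb to a given input/output layer size.
      def hidden_neuron_heuristics(n_in, n_out):
          return {
              "between input and output size": (min(n_in, n_out), max(n_in, n_out)),
              "2/3 of input size plus output size": round(2 / 3 * n_in + n_out),
              "less than twice the input size": 2 * n_in - 1,
          }

      # Example: 784 inputs (28x28 pixels), 10 output classes.
      print(hidden_neuron_heuristics(784, 10))
      # {'between input and output size': (10, 784),
      #  '2/3 of input size plus output size': 533,
      #  'less than twice the input size': 1567}
      ```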

  2. Feb 2019
    1. One benefit of SGD is that it's computationally a whole lot faster. Large datasets often can't be held in RAM, which makes vectorization much less efficient. Rather, each sample or batch of samples must be loaded, worked with, the results stored, and so on. Minibatch SGD, on the other hand, is usually intentionally made small enough to be computationally tractable. Usually, this computational advantage is leveraged by performing many more iterations of SGD, making many more steps than conventional batch gradient descent. This usually results in a model that is very close to that which would be found via batch gradient descent, or better.

      Good explanation of why SGD is computationally cheaper. I was confused about the benefit of repeatedly performing mini-batch GD, and why it might beat batch GD. The advantage seems to come from each mini-batch being small enough to hold in memory and vectorize efficiently, which lets you take many more gradient steps for the same compute (see the sketch below).
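      A rough NumPy sketch of the contrast on a toy linear-regression problem (all names and numbers are illustrative, not from the article): a full-batch step must touch every sample, while each mini-batch step only needs one small, easily vectorized batch.

      ```python
      import numpy as np

      def batch_gradient_step(X, y, w, lr=0.01):
          # One full-batch step: the gradient touches every sample, so the whole
          # dataset has to be available (in RAM or streamed) for a single update.
          return w - lr * X.T @ (X @ w - y) / len(y)

      def minibatch_sgd(X, y, w, lr=0.01, batch_size=32, epochs=20, seed=0):
          # Many cheap steps: each update only needs one small, vectorizable batch.
          rng = np.random.default_rng(seed)
          n = len(y)
          for _ in range(epochs):
              for idx in np.array_split(rng.permutation(n), max(n // batch_size, 1)):
                  Xb, yb = X[idx], y[idx]
                  w = w - lr * Xb.T @ (Xb @ w - yb) / len(yb)
          return w

      # Toy data: y = X @ true_w + noise.
      rng = np.random.default_rng(42)
      X = rng.normal(size=(1000, 5))
      true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
      y = X @ true_w + 0.1 * rng.normal(size=1000)

      w = minibatch_sgd(X, y, np.zeros(5))
      print(np.round(w, 2))  # should end up close to true_w
      ```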

    1. And so it makes most sense to regard epoch 280 as the point beyond which overfitting is dominating learning in our neural network.

      I do not get this. Epoch 15 already indicates that we are overfitting to the training data set, no? Assuming both the training and test sets come from the same population that we are trying to learn from.

    2. If we see that the accuracy on the test data is no longer improving, then we should stop training

      This seems to contradict the earlier statement about epoch 280 being the point where overfitting starts to dominate. A minimal early-stopping sketch follows.
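      A minimal sketch of the early-stopping strategy the quote describes, assuming hypothetical `train_one_epoch` and `evaluate_accuracy` callbacks (not from any particular library):

      ```python
      def train_with_early_stopping(model, train_data, val_data,
                                    train_one_epoch, evaluate_accuracy,
                                    max_epochs=400, patience=10):
          """Stop once held-out accuracy has not improved for `patience` epochs."""
          best_acc, best_epoch = 0.0, 0
          for epoch in range(max_epochs):
              train_one_epoch(model, train_data)
              acc = evaluate_accuracy(model, val_data)
              if acc > best_acc:
                  best_acc, best_epoch = acc, epoch
              elif epoch - best_epoch >= patience:
                  break  # accuracy has plateaued; further epochs mostly overfit
          return model, best_epoch, best_acc
      ```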

    3. It might be that accuracy on the test data and the training data both stop improving at the same time

      Can this happen? Can the accuracy on the training data set ever stop improving as training epochs increase?

    4. What is the limiting value for the output activations a_j^L

      When c is large, small differences in the z_j^L are magnified and each activation jumps to 0 or 1, depending on the sign of those differences. On the other hand, when c is very small, all activation values will be close to 1/N, where N is the number of neurons in layer L.
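      A quick numerical check of both limits, using a softmax with sharpness parameter c written in NumPy (the function name is just illustrative):

      ```python
      import numpy as np

      def softmax_with_c(z, c):
          # a_j^L = exp(c * z_j) / sum_k exp(c * z_k); shift by the max for stability.
          e = np.exp(c * (z - z.max()))
          return e / e.sum()

      z = np.array([1.0, 2.0, 3.0, 2.5])

      print(softmax_with_c(z, 100.0))  # ~[0, 0, 1, 0]: winner-take-all as c -> infinity
      print(softmax_with_c(z, 0.001))  # ~[0.25, 0.25, 0.25, 0.25]: 1/N as c -> 0
      ```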