- Jun 2016
-
rasbt.github.io
-
devation
deviation
-
lenghts
lengths
-
exlusion
exclusion
-
reapeated
repeated
-
reapeated
repeated
-
perfomance
performance
-
feeature
Typo
-
- Mar 2015
-
cs231n.github.io
-
to beat random search in a carefully-chosen intervals.
remove the "a"?
-
only training for 1 epoch or even less
So do we check only in several layers of the network, or maybe for all of them, during the first epoch?
-
That is, we are generating a random random with a uniform distribution, but then raising it to the power of 10.
The word random is repeated?
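As a runnable sketch of that sampling (numpy's uniform standing in for the notes' uniform):

    import numpy as np

    # Draw the exponent uniformly, then raise 10 to it, so candidate
    # learning rates spread evenly across orders of magnitude (1e-6 .. 1e1).
    learning_rate = 10 ** np.random.uniform(-6, 1)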
-
Tue to the denominator term in the RMSprop update
Due to the denominator term in the RMSprop update
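For reference, a minimal sketch of the RMSprop update being discussed (toy values; decay_rate, eps and lr are typical placeholders, not prescribed by the notes):

    import numpy as np

    x = np.random.randn(10)      # toy parameters
    dx = np.random.randn(10)     # their gradient
    cache = np.zeros_like(x)
    decay_rate, eps, lr = 0.99, 1e-8, 1e-3

    cache = decay_rate * cache + (1 - decay_rate) * dx ** 2
    x += -lr * dx / (np.sqrt(cache) + eps)   # this denominator is the term in question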
-
the step decay dropout is slightly
remove the word dropout?
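For context, a minimal sketch of step decay with hypothetical numbers (halve the learning rate every 10 epochs):

    base_lr, decay, step = 1e-2, 0.5, 10
    for epoch in range(30):
        lr = base_lr * decay ** (epoch // step)   # 1e-2 -> 5e-3 -> 2.5e-3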
-
theoretical converge guarantees
theoretical convergence guarantees
-
update has recently
update that has recently
-
set of parameters
set of weights per network layer?
-
model capacity
Needs some more explaining? Reference to bias-variance trade-off? Link to VC dimension?
-
validation/training accuracy
I have usually encountered the use of error instead of accuracy, normally found when discussing the bias-variance trade-off. Error seems more intuitive to me. Maybe we can have an equivalent error graph on the opposite side of the accuracy graph?
-
appears more as a slightly more interpretable
appears as a slightly more interpretable (remove first more)
-
sizes of million parameters
Suggestion: "can have sizes in the millions" or "can have millions of parameters".
-
Therefore, a better solution might be to force a particular random seed before evaluating
Don't understand. What is the random seed used for? Selecting drop-out nodes whose back prop will be checked?
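My reading of it, as a minimal sketch: fixing the seed makes the non-deterministic parts (e.g. the dropout masks) identical across the two loss evaluations, so the centered difference doesn't also measure mask noise:

    import numpy as np

    def centered_diff(f, x, h=1e-5, seed=0):
        np.random.seed(seed)          # same dropout masks for f(x + h) ...
        fxph = f(x + h)
        np.random.seed(seed)          # ... and for f(x - h)
        fxmh = f(x - h)
        return (fxph - fxmh) / (2 * h)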
-
If your gradcheck for only ~2 or 3 datapoints then you will almost certainly gradcheck for an entire batch.
Just to confirm: if I am using a batch of 10 data points to compute the gradient update, I only need to gradcheck 2 or 3 of those data points? And this holds irrespective of the size of the batch?
-
combine the parameters into a single large parameter vector
The documentation talks of weights and parameters. I assume in this case the parameters are the weights. Maybe reinforce this by adding the word weights in parentheses? It helps us differentiate between the weight matrix and the hyper-parameters.
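A small sketch of what I take "single large parameter vector" to mean, with hypothetical weight matrices W1 and W2:

    import numpy as np

    W1, W2 = np.random.randn(4, 3), np.random.randn(3, 2)   # the weights
    theta = np.concatenate([W1.ravel(), W2.ravel()])        # one flat vector

    # Unpack whenever the loss function needs the original shapes.
    W1_back = theta[:W1.size].reshape(W1.shape)
    W2_back = theta[W1.size:].reshape(W2.shape)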
-
hack the code to remove the data loss contribution.
Maybe it should be: hack the code to remove the regularization loss contribution.
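Either way, the underlying recipe seems to be: gradcheck each loss term in isolation. A self-contained toy example (hypothetical names):

    import numpy as np

    def data_loss(W, X, y):
        return 0.5 * np.sum((X @ W - y) ** 2)

    def reg_loss(W, reg=1e-2):
        return 0.5 * reg * np.sum(W * W)

    X, y, W = np.random.randn(5, 3), np.random.randn(5), np.random.randn(3)
    f_data = lambda W: data_loss(W, X, y)   # gradcheck this first, reg off
    f_reg = lambda W: reg_loss(W)           # then this term on its own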
-
-
cs231n.github.io
-
U1 = np.random.rand(*H1.shape) < p
How does this work? Does it set all elements of some randomly selected rows of the weight matrix to 0?
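My reading, with a runnable toy example: the mask is over the hidden activations H1, not the weight matrix, and it zeroes individual elements rather than whole rows. np.random.rand(*H1.shape) draws one uniform number per element, and < p turns that into a boolean keep/drop mask of the same shape:

    import numpy as np

    p = 0.5                               # probability of keeping a unit
    H1 = np.random.randn(4, 5)            # toy hidden-layer activations
    U1 = np.random.rand(*H1.shape) < p    # independent True/False per element
    H1 *= U1                              # dropped activations become exactly 0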
-
This is motivated by based on a compromise and an equivalent analysis
Typo/grammar: The motivation for this is based on a compromise and an equivalent analysis
-
This turns out to be a mistake,
I think SGD will also fail with zero values because the backpropagated gradients are multiplied by these zero weights and the weights will therefore never change. Is this thinking correct?
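Partly, I think. A toy check with sigmoid units and all-zero weights (the issue the notes describe is symmetry, not gradients being zero forever):

    import numpy as np

    x = np.array([1.0, 2.0])
    W1, W2 = np.zeros((2, 2)), np.zeros(2)

    h = 1.0 / (1.0 + np.exp(-(W1 @ x)))   # both hidden units output 0.5
    dy = 1.0                              # pretend upstream gradient
    dW2 = dy * h                          # identical entries: [0.5, 0.5]
    dh = dy * W2                          # [0, 0] on the first step
    dW1 = np.outer(dh * h * (1 - h), x)   # identical (here zero) rows

W2 does move on the first step, but both of its entries move identically; from then on every gradient stays identical across the two hidden units, so they can never become different from one another.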
-
with proper data normalization it is reasonable to assume that approximately half of the weights will be positive and half of them will be negative
Can anyone explain why?
-
-
cs231n.github.io
-
to to
Typo
-
xi scaled
\(x_i\) is scaled
-
Notice that this is the gradient only with respect to the row of W that corresponds to the correct class. For the other rows where j≠yi the gradient is:
I am at a loss here (no pun intended). So for a given class \(y_i\) I only calculate the gradient for all those \(L_i\) that are labelled with \(y_i\). I assume I have to do this for all \(y_i\). So what do I use the expression below for?
TIA.
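For reference, the two gradient expressions as I read the notes (\(\Delta\) is the margin and \(\mathbb{1}\) the indicator function):

\[ \nabla_{w_{y_i}} L_i = -\left( \sum_{j \neq y_i} \mathbb{1}\left( w_j^T x_i - w_{y_i}^T x_i + \Delta > 0 \right) \right) x_i \]

\[ \nabla_{w_j} L_i = \mathbb{1}\left( w_j^T x_i - w_{y_i}^T x_i + \Delta > 0 \right) x_i \quad (j \neq y_i) \]

Both are used for every example: the first fills the single row of the gradient belonging to the correct class \(w_{y_i}\), and the second fills each of the other rows \(w_j\), so together they give the complete gradient of \(L_i\) with respect to \(W\).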
-
-
cs231n.github.io
-
The final loss for this example is 1.58 for the SVM and 0.452 for the Softmax classifier
The figure above has a value of 1.04 for the softmax case. I think that should be \(0.452\).
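A quick numpy check, assuming the unnormalized scores in the figure are [-2.85, 0.86, 0.28] with the third class correct, suggests the two numbers differ only in the base of the logarithm:

    import numpy as np

    scores = np.array([-2.85, 0.86, 0.28])   # assumed from the figure
    probs = np.exp(scores) / np.sum(np.exp(scores))
    print(-np.log(probs[2]))     # ~1.04  (natural log)
    print(-np.log10(probs[2]))   # ~0.452 (log base 10)

So 1.04 and 0.452 may well describe the same probability, computed with ln versus log10.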
-
-
cs231n.github.io
-
The synapses are not just a single weight a complex non-linear dynamical system
Typo. Grammar.
The synapses are not just a single weight, but a complex non-linear dynamical system
-
noone
The usual spelling is "no one" and to a lesser extent "no-one". Just nit-picking. B-)
-
dropou
typo: drop-out
-