 Jun 2016

rasbt.github.io

devation
deviation

lenghts
lengths

exlusion
exclusion

reapeated
repeated

reapeated
repeated

perfomance
performance

feeature
Typo

 Mar 2015

cs231n.github.io

to beat random search in a carefully-chosen intervals.
remove the "a"?

only training for 1 epoch or even less
only training for 1 epoch or even less... so do we check only several layers of the network, or maybe all of them, during the first epoch?

That is, we are generating a random random with a uniform distribution, but then raising it to the power of 10.
The word random is repeated?
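For what it's worth, a minimal sketch of the log-space sampling being described, assuming the intent is the notes' `learning_rate = 10 ** uniform(-6, 1)` pattern (the exponent range is illustrative):

```python
import numpy as np

# Sample learning rates on a log scale: draw a uniform exponent,
# then raise 10 to that power. The range (-6, 1) is illustrative.
rng = np.random.default_rng(0)
exponents = rng.uniform(-6, 1, size=5)  # uniform in log10 space
learning_rates = 10.0 ** exponents      # spans 1e-6 .. 1e1 multiplicatively
print(learning_rates)
```

Sampling the exponent uniformly makes each order of magnitude equally likely, which is why the notes prefer it over a uniform draw of the rate itself.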

Tue to the denominator term in the RMSprop update
Due to the denominator term in the RMSprop update
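For reference, a minimal sketch of the RMSprop update being referred to; the cache of squared gradients appears under a square root in the denominator, which is the term in question (hyperparameter values are illustrative):

```python
import numpy as np

# Minimal RMSprop step sketch. cache is a moving average of squared
# gradients; its square root in the denominator modulates each
# parameter's effective step size.
def rmsprop_step(x, dx, cache, learning_rate=1e-3, decay_rate=0.99, eps=1e-8):
    cache = decay_rate * cache + (1 - decay_rate) * dx ** 2
    x = x - learning_rate * dx / (np.sqrt(cache) + eps)
    return x, cache
```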

the step decay dropout is slightly
remove the word dropout?

theoretical converge guarantees
theoretical convergence guarantees

update has recently
update that has recently

set of parameters
set of weights per network layer?

model capacity
Needs some more explaining? Reference to bias variance tradeoff? Link to VC dimension?

validation/training accuracy
I have usually encountered the use of error instead of accuracy, normally found when discussing the bias-variance tradeoff. That seems more intuitive to me. Maybe we can have an equivalent error graph opposite the accuracy graph?

appears more as a slightly more interpretable
appears as a slightly more interpretable (remove first more)

sizes of million parameters
can have millions of parameters

Therefore, a better solution might be to force a particular random seed before evaluating
Don't understand. What is the random seed used for? Selecting the dropout nodes whose backprop will be checked?
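My reading: the seed is fixed so that every forward pass during gradient checking (the analytic pass and each numerical evaluation) draws the same dropout mask; otherwise the numerical and analytic gradients would belong to different networks. A toy sketch (the function here is a hypothetical stand-in, not code from the notes):

```python
import numpy as np

# Sketch: with the same seed, every call draws the same dropout mask,
# so repeated loss evaluations are consistent during gradient checking.
def forward_with_dropout(x, w, p=0.5):
    mask = (np.random.rand(*x.shape) < p) / p  # inverted dropout mask
    h = np.maximum(0, x * w) * mask
    return h.sum()  # toy scalar "loss"

x, w = np.random.randn(4), np.random.randn(4)
np.random.seed(0); loss1 = forward_with_dropout(x, w)
np.random.seed(0); loss2 = forward_with_dropout(x, w)
print(loss1 == loss2)  # identical masks -> identical losses
```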

If your gradcheck for only ~2 or 3 datapoints then you will almost certainly gradcheck for an entire batch.
Just to confirm: if I am using a batch of 10 data points to update the gradient, I need only 2 to 3 of those data points. And this is true irrespective of the size of the batch?

combine the parameters into a single large parameter vector
The documentation talks of weights and parameters. I assume in this case the parameters are the weights. Maybe reinforce this by adding in parenthesis the word weights? Helps us differentiate between the weight matrix and the hyperparameters.

hack the code to remove the data loss contribution.
Maybe it should be: hack the code to remove the regularization loss contribution.


cs231n.github.io

U1 = np.random.rand(*H1.shape) < p
How does this work? Does it set all elements of some randomly selected rows of the weight matrix to 0?
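As I understand it, `U1` is an elementwise Bernoulli mask over the hidden activations `H1` (not over rows of the weight matrix): `np.random.rand` draws one uniform number per entry, and `< p` keeps each entry independently with probability `p`. A minimal sketch:

```python
import numpy as np

# U1 is an elementwise boolean mask over the activations H1, not a
# row selector: each entry is kept independently with probability p.
p = 0.5
H1 = np.random.randn(3, 4)          # toy hidden-layer activations
U1 = np.random.rand(*H1.shape) < p  # boolean mask, same shape as H1
H1_dropped = H1 * U1                # zeros out roughly (1-p) of the entries
print(U1.shape)  # (3, 4)
```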

This is motivated by based on a compromise and an equivalent analysis
Typo/grammar: The motivation for this is based on a compromise and an equivalent analysis

This turns out to be a mistake,
I think SGD will also fail for 0 values because the gradients will be multiplied by these zeros and the weights will therefore never change. Is this thinking correct?

with proper data normalization it is reasonable to assume that approximately half of the weights will be positive and half of them will be negative
Can anyone explain why?
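One possible reading: with zero-centered (normalized) data and zero-centered random weights, each unit's pre-activation w·x is symmetric about zero, so roughly half of the outputs come out positive. A rough numeric check (all sizes and scales here are illustrative):

```python
import numpy as np

# With zero-mean data and zero-mean random weights, the pre-activations
# X @ W are symmetric about 0, so about half of the entries are positive.
rng = np.random.default_rng(0)
X = rng.standard_normal((10000, 100))    # zero-mean "normalized" data
W = rng.standard_normal((100, 50)) * 0.01
frac_positive = (X @ W > 0).mean()
print(frac_positive)  # close to 0.5
```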


cs231n.github.io

to to
Typo

xi scaled
\(x_i\) is scaled

Notice that this is the gradient only with respect to the row of W that corresponds to the correct class. For the other rows where j≠yi the gradient is:
I am at a loss here (no pun intended). So for a given class \(y_i\) I only calculate the gradient for all those \(L_i\) that are labelled with \(y_i\). I assume I have to do this for all \(y_i\). So what do I use the expression below for?
TIA.
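As I read the notes, both expressions together fill in the full gradient matrix for a single example: the first gives the one row of dW corresponding to the correct class y_i, and the j ≠ y_i expression gives every other row. A toy sketch (the helper name and delta = 1 are my own, not from the notes):

```python
import numpy as np

# Multiclass SVM gradient for one example x with label y.
# Each violated margin contributes +x to that incorrect class's row
# and -x to the correct class's row of dW.
def svm_grad_single(W, x, y, delta=1.0):
    scores = W @ x
    margins = np.maximum(0, scores - scores[y] + delta)
    margins[y] = 0
    dW = np.zeros_like(W)
    for j in range(W.shape[0]):
        if j == y:
            continue
        if margins[j] > 0:
            dW[j] += x   # row for an incorrect class (the j != y_i expression)
            dW[y] -= x   # row for the correct class accumulates -x per violation
    return dW
```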


cs231n.github.io

The final loss for this example is 1.58 for the SVM and 0.452 for the Softmax classifier
The figure above has a value of 1.04 for the softmax case. I think that should be \(0.452\).
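For what it's worth, a quick numeric check suggests both numbers may be self-consistent: 1.04 is the natural-log cross-entropy of these scores and 0.452 is the base-10 version (scores [-2.85, 0.86, 0.28] are assumed from the figure, with the third entry as the correct class):

```python
import numpy as np

# Scores assumed from the figure; correct class is the third entry.
scores = np.array([-2.85, 0.86, 0.28])
probs = np.exp(scores) / np.exp(scores).sum()
loss_ln = -np.log(probs[2])       # natural log -> ~1.04 (as in the figure)
loss_log10 = -np.log10(probs[2])  # base-10 log  -> ~0.452 (as in the text)
print(loss_ln, loss_log10)
```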


cs231n.github.io

The synapses are not just a single weight a complex nonlinear dynamical system
Typo. Grammar.
The synapses are not just a single weight, but a complex nonlinear dynamical system

noone
The usual spelling is "no one" and to a lesser extent "noone". Just nitpicking. B)

dropou
typo: dropout
