 Aug 2016


Code on GitHub: https://github.com/mzhai2/aaai16

 Jul 2016

web.stanford.edu

relational meanings
"to capture linguistic regularities as relations between vectors", IMHO

meanings
Add a full stop.

different
Typo: should be "difference".

 Jun 2016

aclweb.org

Neural Word Embedding Methods

Thus, we basically need to retrain
... in order to achieve what? Statement doesn't seem complete.
Perhaps "when we need lower dimensional embeddings with d = D', we can't obtain them from higher dimensional embeddings with d = D."?
However, it is possible, to a certain extent, to obtain lower-dimensional embeddings from higher-dimensional ones, e.g. via PCA.
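For instance, a minimal numpy sketch of such a PCA-based reduction (a toy random matrix stands in for a real embedding table; the sizes 300 and 50 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "embedding matrix": 1000 words, D = 300 dimensions.
E = rng.standard_normal((1000, 300))

# Reduce to D' = 50 via PCA: center, then project onto the
# 50 eigenvectors of the covariance matrix with largest eigenvalues.
E_centered = E - E.mean(axis=0)
cov = np.cov(E_centered, rowvar=False)   # 300 x 300 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
top = eigvecs[:, -50:]                   # top-50 principal directions
E_reduced = E_centered @ top             # E_reduced has shape (1000, 50)
```

Of course, for random Gaussian data the projection discards almost as much variance as it keeps; with real embeddings the hope is that most of the variance concentrates in the leading components.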

dimension of embedding vectors strongly depends on applications and uses, and is basically determined based on the performance and memory space (or calculation speed) tradeoff


arxiv.org

remove second-order dependencies
What is meant by this?

it reveals simple underlying structures in complex data sets using analytical solutions from linear algebra

the third definition
The definitions are not numbered. It would be nice to have them numbered.

û_i · û_j
These u vectors are orthogonal.

(X^T X) v̂_i = λ_i v̂_i
Transforming the vector v̂_i with that matrix gives a vector with the same direction, only scaled by λ_i; the direction does not change after the transformation. This is the definition of an eigenvector.
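A quick numerical check of this property (the symmetric matrix below is made up for illustration; it plays the role of X^T X):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])     # symmetric, like X^T X

eigvals, eigvecs = np.linalg.eigh(A)
v = eigvecs[:, 0]              # one eigenvector of A
lam = eigvals[0]               # its eigenvalue

# A v points in the same direction as v, scaled by lambda: A v = lambda v.
assert np.allclose(A @ v, lam * v)
```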

PCA and in the process, find that PCA is closely related to

subtracting off the mean
Actually, this is a requirement for computing the covariance matrix Cx.
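A small numpy check of this requirement (random toy data with a nonzero mean; np.cov does the centering internally, so the manually centered product must match it):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 200)) + 3.0   # 5 features, 200 samples, nonzero mean

# Subtract each row's mean first...
Xc = X - X.mean(axis=1, keepdims=True)

# ...then C_x = (1 / (n - 1)) Xc Xc^T agrees with np.cov,
# which performs this mean subtraction internally.
n = X.shape[1]
Cx = Xc @ Xc.T / (n - 1)
assert np.allclose(Cx, np.cov(X))
```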

entails
"entails" is an unfortunate choice of words. "implies" / "includes" / "requires" perhaps?

It is evident that the choice of P diagonalizes C_Y
That is, we have found that, by selecting P = E (the set of eigenvectors of Cx), we get what we wanted: the matrix Cy to be a diagonal matrix.
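This is easy to verify numerically on toy data (a random centered matrix; sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 500))          # 3 features, 500 samples
X -= X.mean(axis=1, keepdims=True)         # center the data

Cx = np.cov(X)
eigvals, E = np.linalg.eigh(Cx)            # columns of E: eigenvectors of Cx

P = E.T                                    # rows of P are the new basis vectors
Y = P @ X
Cy = np.cov(Y)

# Off-diagonal entries of Cy vanish (up to floating-point error):
# Y is decorrelated, and Cy is diagonal as desired.
off_diag = Cy - np.diag(np.diag(Cy))
assert np.allclose(off_diag, 0.0)
```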

C_Y
We want this to be a diagonal matrix, which would mean that the matrix Y is decorrelated.

orthonormal matrix

the number of measurement types
That is, the number of features.

judicious
prudent, sensible.

bely
What does "bely" mean? Probably a typo for "belie": to give a false or misleading impression of.

by a simple algorithm

normalized direction
A vector (direction vector) with norm = 1.

Y is decorrelated
The features of the output matrix Y are not correlated. Building a covariance matrix for it would yield a diagonal matrix.

variance
Elements on the diagonal of the matrix.

covariance
The off-diagonal elements of the matrix.
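Both interpretations can be checked on a small made-up example (b is constructed to track a, so the off-diagonal entry comes out large):

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.standard_normal(1000)
b = 2.0 * a + 0.1 * rng.standard_normal(1000)   # b strongly tracks a

C = np.cov(np.vstack([a, b]))                   # 2 x 2 covariance matrix

# Diagonal entries: the variance of each variable.
assert np.isclose(C[0, 0], np.var(a, ddof=1))
assert np.isclose(C[1, 1], np.var(b, ddof=1))

# Off-diagonal entry: the covariance between the two variables,
# large here because b is (almost) a linear function of a.
assert C[0, 1] > 1.5
```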

large values correspond to interesting structure
Features with high variance. Directions of major spread.

arises from estimation theory

measurement types
aka features

The covariance measures the degree of the linear relationship between two variables

Because one can calculate r1 from r2
Because there is a simple (almost linear in our case) relationship between the two variables.

is in meters and x̃_A is in inches.
Again, this might be so, but it is quite an ambiguous statement, since we see a decreasing function on the plot.

nearby
"nearby" would make sense if the rightmost plot of Fig. 3 shows the first diagonal, which it doesn't.
Or perhaps "nearby", but one of the cameras is upside down.
All in all, quite an ambiguous statement.

correlated
There is a pretty image on Wikipedia, in its article about correlation.

Figure 3
The example for redundancy is not (or at least it doesn't seem to be) in the context of the example with the spring and the ball. Since there is no clear separation between the examples, this might be confusing to readers.

multiple sensors record the same dynamic information
More features refer to the same (or almost the same) thing.

best-fit line
But not as in linear regression / ordinary least squares.
Nice animation on stats.
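The difference shows up on synthetic data: the first principal component minimizes perpendicular distances to the line, while OLS minimizes vertical ones, so the two slopes disagree as soon as y is noisy (all numbers below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(500)
y = x + rng.standard_normal(500)           # noisy linear relation

data = np.vstack([x, y])
data -= data.mean(axis=1, keepdims=True)

# PCA "best-fit" direction: leading eigenvector of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(data))
pc1 = eigvecs[:, -1]                       # eigenvector of largest eigenvalue
pca_slope = pc1[1] / pc1[0]

# OLS slope: cov(x, y) / var(x), minimizing vertical distances only.
ols_slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# With this much noise in y the two lines clearly differ
# (roughly 1.6 versus 1.0 for this setup).
assert abs(pca_slope - ols_slope) > 0.05
```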

Maximizing the variance (and by assumption the SNR) corresponds to finding the appropriate rotation of the naive basis
PCA relates to rotation.

The dynamics of interest exist along directions with largest variance and presumably highest SNR

directions with largest variances in our measurement space contain the dynamics of interest
We seek new features (new directions) which best contain the information (variance) of interest.
Amount of variance > amount of information.

rotation and a stretch

how do we get from this data
How to reduce the 6D data set to a 1D data set? How to discover the regularities in the data set and achieve dimensionality reduction?

our measurements might not even be 90°
The features are not orthogonal. Information brought by distinct measurements is overlapping.

nonparametric method
It does not make any assumptions about the distribution of the data.

ball's position in a three-dimensional space
ball's position = a data sample.
The recorded feature space has 3 x 2 features, because each of the three cameras records in 2D (even though the ball itself moves in three-dimensional physical space). The time dimension is not recorded, since it is, actually, the index of a data sample.
Some of these features (dimensions) are not necessary (they are redundant).

does not lie along the basis of the recording (x_A, y_A) but rather along the best-fit line
Ambiguous statement. A "direction" cannot lie along a "basis". Perhaps "basis vectors"?
Also, "best-fit line" usually refers to a line found via least-squares regression, which is not the case here (PCA versus linear regression).

largest direction of variance
"direction of largest variance" perhaps?

are a set of new basis vectors
This means that P is an orthogonal matrix.
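A quick check that the eigenvectors of a symmetric matrix (such as C_x) stack into an orthogonal P (random toy matrix, symmetrized for the purpose):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
S = M + M.T                     # symmetric matrix, like a covariance matrix

_, E = np.linalg.eigh(S)        # columns of E: orthonormal eigenvectors
P = E.T                         # rows of P: the new basis vectors

# Orthonormal rows mean P P^T = I, i.e. P is an orthogonal matrix.
assert np.allclose(P @ P.T, np.eye(4))
```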

new representation of that data set
The original data, expressed in a different basis.

basis
New basis, right?

Thus our original basis reflects the method we measured our data

some orthonormal basis
PCA will uncover a smaller, better, orthonormal basis.

the number of measurement types
That is, the number of features.

72000 of these vectors
The data matrix. We apply PCA on this.

structure
And, hopefully, the structure can be expressed in a lower-dimensional space (1D in our case).

noise
AFAIK PCA works well when the noise is Gaussian.

variable x
Unfortunate labelling of variable. x would be time, actually.
To do: don't name the variable, it's not necessary.
