- Aug 2016
-
-
Code on GitHub: https://github.com/mzhai2/aaai16
-
- Jul 2016
-
web.stanford.edu
-
relational meanings
"to capture linguistic regularities as relations between vectors", IMHO
-
meanings
add full stop
-
different
typo - difference
-
- Jun 2016
-
aclweb.org
-
Neural Word Embedding Methods
-
Thus, we basically need to re-train
... in order to achieve what? The statement doesn't seem complete.
Perhaps "when we need lower dimensional embeddings with d = D', we can't obtain them from higher dimensional embeddings with d = D."?
However, it is possible, to a certain extent, to obtain lower dimensional embeddings from higher dimensional ones, e.g. via PCA (see the sketch below).
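For illustration, a minimal numpy sketch (the embedding matrix and the `reduce_embeddings` helper are made up, not from the paper): project D-dimensional embeddings onto their top D' principal components instead of re-training.

```python
import numpy as np

def reduce_embeddings(emb: np.ndarray, d_new: int) -> np.ndarray:
    """Project D-dimensional embeddings onto their top d_new principal components."""
    centered = emb - emb.mean(axis=0)
    # Rows of Vt are the principal directions of the centered embedding matrix.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:d_new].T        # shape: (vocab, d_new)

# Made-up data: 10k "words", 300 -> 50 dimensions.
emb = np.random.default_rng(0).normal(size=(10_000, 300))
small = reduce_embeddings(emb, 50)
print(small.shape)                        # (10000, 50)
```

Of course, this only preserves the directions of largest variance; whether the reduced vectors are still useful for a given task has to be checked empirically.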
-
dimension of embedding vectors strongly depends on applications and uses, and is basically determined based on the performance and memory space (or calculation speed) trade-off
-
-
arxiv.org
-
remove second-order dependencies
What is meant by this? (Presumably the pairwise correlations captured by the covariance matrix, which PCA removes by decorrelating the data.)
-
it reveals simple underlying structures in complex data sets using analytical solutions from linear algebra
-
the third definition
The definitions are not numbered. It would be nice to have them numbered.
-
ûᵢ · ûⱼ
These u vectors are orthogonal.
-
(XᵀX) v̂ᵢ = λᵢ v̂ᵢ
Which means that transforming the vector v̂ᵢ with the matrix XᵀX gives a vector with the same direction, only scaled by λᵢ. The direction does not change after the transformation, so v̂ᵢ is an eigenvector.
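A quick numpy check of this on made-up data (nothing here comes from the paper beyond the equation itself):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))               # 100 samples, 5 features (toy data)

# X^T X is symmetric, so eigh applies; columns of eigvecs are the v̂_i.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
v, lam = eigvecs[:, 0], eigvals[0]

# (X^T X) v̂ = λ v̂ : same direction, only scaled.
assert np.allclose(X.T @ X @ v, lam * v)
```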
-
PCA and in the process, find that PCA is closely related to
-
subtracting off the mean
Actually, this is a requirement for computing the covariance matrix Cx.
-
entails
"entails" is an unfortunate choice of words. "implies" / "includes" / "requires" perhaps?
-
It is evident that the choice of P diagonalizes C_Y
That is, we have found that, by selecting the rows of P to be the eigenvectors of C_X (i.e. P = Eᵀ), we get what we wanted: the matrix C_Y is diagonal.
-
C_Y
We want this to be a diagonal matrix, which would mean that the matrix Y is decorrelated (see the sketch below).
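A small numpy sketch of this step on random data, following the paper's convention that rows are measurement types and columns are samples (the sizes and the mixing matrix are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.normal(size=(5, 1000))
A = rng.normal(size=(5, 5))
X = A @ Z                                    # 5 correlated measurement types, 1000 samples
X = X - X.mean(axis=1, keepdims=True)        # subtract off the mean

C_X = X @ X.T / (X.shape[1] - 1)             # covariance matrix of X
_, E = np.linalg.eigh(C_X)                   # columns of E: eigenvectors of C_X
P = E.T                                      # rows of P: the new basis vectors

Y = P @ X                                    # re-expressed data
C_Y = Y @ Y.T / (Y.shape[1] - 1)

# C_Y is (numerically) diagonal: Y is decorrelated.
print(np.max(np.abs(C_Y - np.diag(np.diag(C_Y)))))
```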
-
orthonormal matrix
-
the number of measurement types
That is, the number of features.
-
judicious
prudent, sensible.
-
bely
What does "bely" mean?
-
by a simple algorithm
-
normalized direction
A vector (direction vector) with norm = 1.
-
Y is decorrelated
The features of the output matrix Y are not correlated. Building a covariance matrix for it would yield a diagonal matrix.
-
variance
Elements on the diagonal of the matrix.
-
covariance
The off-diagonal elements of the matrix.
-
large values correspond to interesting structure
Features with high variance. Directions of major spread.
-
arises from estimation theory
-
measurement types
aka features
-
The covariance measures the degree of the linear relationship between two variables
-
Because one can calculate r1 from r2
Because there is a simple (almost linear in our case) relationship between the two variables.
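A tiny numpy illustration with made-up r1, r2 (r2 is almost exactly a linear function of r1):

```python
import numpy as np

rng = np.random.default_rng(2)
r1 = rng.normal(size=1000)
r2 = 2.0 * r1 + 0.01 * rng.normal(size=1000)   # r2 is recoverable from r1

print(np.cov(r1, r2)[0, 1])        # large covariance
print(np.corrcoef(r1, r2)[0, 1])   # correlation ~ 1: recording both is redundant
```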
-
is in meters and x̃_A is in inches.
Again, this might be so, but it is quite an ambiguous statement, since we see a decreasing function on the plot.
-
nearby
"nearby" would make sense if the right-most plot of Fig. 3 shows the first diagonal, which it doesn't.
Or perhaps "nearby", but one of the cameras is upside down.
All in all, quite ambiguous statement.
-
correlated
There is a nice image on Wikipedia, in the article about correlation.
-
Figure 3
The redundancy example is not (or at least does not seem to be) in the context of the spring-and-ball example. Since there is no clear separation between the examples, this might be confusing to readers.
-
multiple sensors record the samedynamic information
More features refer to the same (or almost the same) thing.
-
best-fit line
But not as in linear regression / ordinary least squares (the principal direction minimizes perpendicular distances, not vertical ones).
Nice animation on stats.
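A rough numpy comparison on synthetic 2D data: the OLS slope (minimizing vertical distances) and the slope of the first principal direction (the largest-variance direction) come out close here, but they are not the same line.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.3, size=200)
xc, yc = x - x.mean(), y - y.mean()

ols_slope = (xc @ yc) / (xc @ xc)            # ordinary least squares

X = np.column_stack([xc, yc])
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pca_slope = Vt[0, 1] / Vt[0, 0]              # direction of largest variance

print(ols_slope, pca_slope)                  # similar, but not equal
```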
-
Maximizing the variance (and by assumption the SNR) corresponds to finding the appropriate rotation of the naive basis
PCA relates to rotation.
-
the dynamics of interest exist along directions with largest variance and presumably highest SNR
-
directions with largest variances in our measurement space contain the dynamics of interest
We seek new features (new directions) which best contain the information (variance) of interest.
Amount of variance -> amount of information.
-
rotation and a stretch
-
how do we get from this data
How to reduce the 6D data set to a 1D data set? How to discover the regularities in the data set and achieve dimensionality reduction?
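A toy numpy sketch of this (the "camera" directions and the noise level are invented): 6 measured features, but only one underlying degree of freedom, which PCA recovers as a dominant first component.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0, 10, 720)
z = np.cos(2 * np.pi * t)                    # the single underlying motion (1D)

directions = rng.normal(size=6)              # how each of the 6 features sees the motion
X = np.outer(z, directions) + 0.05 * rng.normal(size=(len(t), 6))   # samples x features

Xc = X - X.mean(axis=0)
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
print(s**2 / np.sum(s**2))                   # first component carries nearly all the variance
```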
-
our measurements might not even be 90°
The features are not orthogonal. Information brought by distinct measurements is overlapping.
-
non-parametric method
It does not make any assumptions about the distribution of the data.
-
ball’s position in a three-dimensional space
ball's position = a data sample
three-dimensional space = the physical space the ball moves in; the measurement (feature) space has 3 x 2 = 6 features, because each camera records in 2D. The time dimension is not recorded, since it is, actually, the index of a data sample.
Some of these features (dimensions) are not necessary (they are redundant).
-
does not lie along the basis of the recording (x_A, y_A) but rather along the best-fit line
Ambiguous statement. A "direction" cannot lie along a "basis". Perhaps "basis vectors"?
Also, "best-fit line" usually refers to a line found via least-squares regression, which is not the case here (PCA versus linear regression).
-
largest direction of variance
"direction of largest variance" perhaps?
-
are a set of new basis vectors
This means that P is an orthogonal matrix.
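A quick toy check that such a P (rows = eigenvectors of C_X) is orthonormal, i.e. a rotation/reflection that preserves lengths:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(4, 500))
_, E = np.linalg.eigh(np.cov(X))
P = E.T                                      # rows of P: the new basis vectors

assert np.allclose(P @ P.T, np.eye(4))       # orthonormal rows
x = X[:, 0]
assert np.allclose(np.linalg.norm(P @ x), np.linalg.norm(x))   # length preserved
```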
-
new representation of that data set
The original data, expressed in a different basis.
-
basis
New basis, right?
-
Thus our original basis reflects the method we measured our data
-
some orthonormal basis
PCA will uncover a smaller, better, orthonormal basis.
-
the number of measurement types
That is, the number of features.
-
72000 of these vectors
The data matrix. We apply PCA on this.
-
structure
And, hopefully, the structure can be expressed in a lower-dimensional space (1D in our case).
-
noise
AFAIK PCA works well when the noise is Gaussian.
-
variable x
Unfortunate labelling of the variable; x would be time, actually.
To do: don't name the variable, it's not necessary.
-