57 Matching Annotations
1. Aug 2016
2. www.aaai.org

#### URL

3. Jul 2016
4. web.stanford.edu
1. relational meanings

"to capture linguistic regularities as relations between vectors", IMHO

2. meanings

3. different

Typo: should be "difference".


5. Jun 2016
6. aclweb.org
1. Neural Word Embedding Methods
2. Thus, we basically need to re-train

... in order to achieve what? The statement doesn't seem complete.

Perhaps "when we need lower dimensional embeddings with d = D', we can't obtain them from higher dimensional embeddings with d = D."?

However, it is possible, to a certain extent, to obtain lower dimensional embeddings from higher dimensional ones - e.g. via PCA.
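
A sketch of that idea with numpy only (the vocabulary size, embedding dimension, and target dimension here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical: 1000 words with 300-dimensional embeddings.
E = rng.normal(size=(1000, 300))

# PCA via SVD: center, then project onto the top 50 right singular vectors.
E_centered = E - E.mean(axis=0)
U, S, Vt = np.linalg.svd(E_centered, full_matrices=False)
E_reduced = E_centered @ Vt[:50].T  # lower-dimensional embeddings

print(E_reduced.shape)  # (1000, 50)
```

This preserves as much variance as any 50-dimensional linear projection can, though of course it is not equivalent to retraining at d = 50.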

3. dimension of embedding vectors strongly depends on applications and uses, and is basically determined based on the performance and memory space (or calculation speed) trade-off


7. arxiv.org
1. remove second-order dependencies

What is meant by this?

Related question on stats

2. it reveals simple underlying structures in complex data sets using analytical solutions from linear algebra
3. the third definition

The definitions are not numbered. It would be nice to have them numbered.

4. uiˆuj

These u vectors are orthogonal.

5. (XᵀX)v̂ᵢ = λᵢv̂ᵢ

Which means that transforming the vector v with that matrix gives us a vector with the same direction; the direction does not change under the transformation. That is the definition of an eigenvector.
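
A quick numerical check of this property (a sketch with a random matrix, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

A = X.T @ X                    # symmetric, so eigenvectors are real and orthogonal
eigvals, eigvecs = np.linalg.eigh(A)

v = eigvecs[:, 0]              # an eigenvector of X^T X
lam = eigvals[0]               # its eigenvalue

# Transforming v with A only rescales it: A v = lambda v.
assert np.allclose(A @ v, lam * v)
```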

6. PCA and in the process, find that PCA is closely related to
7. subtracting off the mean

Actually, this is a requirement for computing the covariance matrix Cx.

Estimation of covariance matrices
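
The point can be made concrete: the hand-computed XᵀX/(n−1) only matches the covariance matrix after centering, because numpy's `np.cov` subtracts the mean internally (a small sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, size=(200, 3))   # 200 samples, 3 features, nonzero mean

Xc = X - X.mean(axis=0)                  # subtract off the mean per feature
C = Xc.T @ Xc / (X.shape[0] - 1)         # covariance matrix, computed by hand

# Matches numpy's covariance (which centers internally).
assert np.allclose(C, np.cov(X, rowvar=False))
```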

8. entails

"entails" is an unfortunate choice of words. "implies" / "includes" / "requires" perhaps?

9. It is evident that the choice of P diagonalizes C_Y

That is, we have found that, by selecting P = E (the set of eigenvectors of Cx), we get what we wanted: the matrix Cy to be a diagonal matrix.

10. C_Y

This we want to be a diagonal matrix, which would mean that the matrix Y is decorrelated.

11. orthonormal matrix
12. the number of measurement types

That is, the number of features.

13. judicious

prudent, sensible.

14. bely

What does "bely" mean?

15. by a simple algorithm
16. normalized direction

A vector (direction vector) with norm = 1.

17. Y is decorrelated

The features of the output matrix Y are not correlated. Building a covariance matrix for it would yield a diagonal matrix.
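
A sketch of that claim with numpy: project correlated data onto the eigenvectors of its covariance matrix and check that the new covariance matrix is diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated 2-feature data (rows = samples), built from a hypothetical mixing matrix.
z = rng.normal(size=(500, 2))
X = z @ np.array([[2.0, 1.2], [0.0, 0.5]])
Xc = X - X.mean(axis=0)

Cx = np.cov(Xc, rowvar=False)
_, E = np.linalg.eigh(Cx)        # columns of E are eigenvectors of Cx

Y = Xc @ E                       # re-express the data in the eigenbasis
Cy = np.cov(Y, rowvar=False)

# Off-diagonal covariances vanish: Y is decorrelated.
off_diag = Cy - np.diag(np.diag(Cy))
assert np.allclose(off_diag, 0.0, atol=1e-10)
```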

18. variance

Elements on the diagonal of the matrix.

19. covariance

The off-diagonal elements of the matrix.

20. large values correspond to interesting structure

Features with high variance. Directions of major spread.

21. arises from estimation theory
22. measurement types

aka features

23. The covariance measures the degree of the linear relationship between two variables
24. Because one can calculate r₁ from r₂

Because there is a simple (almost linear in our case) relationship between the two variables.

25. is in meters and x̃A is in inches.

Again, might be so, but it is quite an ambiguous statement, since we see a decreasing function on the plot.

26. nearby

"nearby" would make sense if the right-most plot of Fig. 3 shows the first diagonal, which it doesn't.

Or perhaps "nearby", but one of the cameras is upside down.

All in all, quite an ambiguous statement.

27. correlated

28. Figure 3

The example for redundancy is not (or at least it doesn't seem to be) in the context of the example with the spring and the ball. Since there is no clear separation between the examples, this might be confusing to readers.

29. multiple sensors record the same dynamic information

More features refer to the same (or almost the same) thing.

30. best-fit line

But not as in linear regression / ordinary least squares.

Nice animation on stats.
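
The distinction can be seen numerically: on the same noisy data, the OLS slope (which minimizes vertical distances) and the slope of the first principal component (which minimizes perpendicular distances) generally differ. A sketch, with hypothetical synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = 0.5 * x + rng.normal(scale=0.8, size=300)   # noisy linear relationship

# OLS slope: minimizes vertical distances to the line.
slope_ols = np.polyfit(x, y, 1)[0]

# PCA "best-fit" slope: direction of the first principal component.
X = np.column_stack([x - x.mean(), y - y.mean()])
_, _, Vt = np.linalg.svd(X, full_matrices=False)
slope_pca = Vt[0, 1] / Vt[0, 0]

# The two slopes disagree because they optimize different criteria.
print(slope_ols, slope_pca)
```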

31. Maximizing the variance (and by assumption the SNR) corresponds to finding the appropriate rotation of the naive basis

PCA relates to rotation.

32. the dynamics of interest exist along directions with largest variance and presumably highest SNR
33. directions with largest variances in our measurement space contain the dynamics of interest

We seek new features (new directions) which best contain the information (variance) of interest.

Amount of variance -> amount of information.

34. rotation and a stretch
35. how do we get from this data

How to reduce the 6D data set to a 1D data set? How to discover the regularities in the data set and achieve dimensionality reduction?
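
A toy version of this reduction, assuming the paper's setup (one underlying oscillation recorded redundantly as 6 noisy features, i.e. 3 cameras × 2 axes; all numbers here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

t = np.linspace(0, 10, 600)
motion = np.sin(2 * np.pi * t)               # the single underlying dynamic

# Six redundant, noisy measurements of the same motion.
mixing = rng.normal(size=6)
X = np.outer(motion, mixing) + 0.05 * rng.normal(size=(600, 6))

Xc = X - X.mean(axis=0)
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)
var_ratio = S**2 / np.sum(S**2)

# Almost all variance lives along one direction: the data is essentially 1-D.
print(var_ratio)
reduced = Xc @ Vt[0]     # the recovered 1-D dynamic
```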

36. our measurements might not even be 90°

The features are not orthogonal. Information brought by distinct measurements is overlapping.

37. non-parametric method

It does not make any assumptions about the distribution of the data.

r-tutor, Non-parametric methods

PSU, Non-parametric methods

38. ball’s position in a three-dimensional space

ball's position = a data sample

three-dimensional space = the feature space, with 3 x 2 features (because each camera records in 2D). Time dimension not recorded since it is, actually, the index of a data sample.

Some of these features (dimensions) are not necessary (they are redundant).

39. does not lie along the basis of the recording (xA, yA) but rather along the best-fit line

An ambiguous statement: a "direction" cannot lie along a "basis". Perhaps "basis vectors"?

Also, "best-fit line" usually refers to a line found via least-squares regression, which is not the case here (PCA versus linear regression).

40. largest direction of variance

"direction of largest variance" perhaps?

41. are a set of new basis vectors

This means that P is an orthogonal matrix.

42. new representation of that data set

The original data, expressed in a different basis.

43. basis

New basis, right?

44. Thus our original basis reflects the method we measured our data
45. some orthonormal basis

PCA will uncover a smaller, better, orthonormal basis.

46. the number of measurement types

That is, the number of features.

47. 72000 of these vectors

The data matrix. We apply PCA on this.

48. structure

And, hopefully, the structure can be expressed in a lower-dimensional space (1D in our case).

49. noise

AFAIK PCA works well when the noise is Gaussian.

50. variable x

Unfortunate labelling of variable. x would be time, actually.

To do: don't name the variable, it's not necessary.