  1. Nov 2018
    1. Part 3, Method 3: MDS

      Multi-dimensional scaling (MDS) and Principal Coordinate Analysis (PCoA) are very similar to PCA, except that instead of converting correlations into a 2-D graph, they convert distances among the samples into a 2-D graph.

      So, in order to do MDS or PCoA, we have to calculate the distance between Cell1 and Cell2, the distance between Cell1 and Cell3, and so on for every pair of cells:

      • 1 2
      • 1 3
      • 1 4
      • 2 3
      • 2 4
      • 3 4
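
      As a minimal sketch, the pairs listed above can be enumerated with Python's standard library. The cell labels 1 to 4 come from the list; the variable names are only illustrative.

```python
from itertools import combinations

# Cell labels taken from the list above; the names here are illustrative.
cells = [1, 2, 3, 4]

# Every unordered pair of distinct cells, in the same order as the list.
pairs = list(combinations(cells, 2))

for a, b in pairs:
    print(a, b)
```

      With n cells there are n(n-1)/2 such pairs, so 4 cells give 6 distances to compute.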

      One very common way to calculate the distance between two things is to calculate the Euclidean distance.

      Once we have calculated the distance between every pair of cells, MDS and PCoA reduce those distances to a 2-D graph.
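
      A small sketch of the Euclidean distance between two cells, assuming each cell is a vector of per-gene expression values. The numbers below are made up for illustration and do not come from the original notes.

```python
import numpy as np

# Made-up expression values for two cells across three genes (illustrative only).
cell1 = np.array([10.0, 0.0, 3.0])
cell2 = np.array([7.0, 4.0, 3.0])

def euclidean_distance(a, b):
    """Square root of the summed squared per-gene differences."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Per-gene differences are (3, -4, 0), so the distance is sqrt(9 + 16) = 5.
d = euclidean_distance(cell1, cell2)
print(d)
```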

      The bad news is that if we used the Euclidean Distance, the graph would be identical to a PCA graph!!

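      In other words, clustering based on minimizing the linear distances is the same as maximizing the linear correlations.

      I think this is why Professor Hung-yi Lee (李宏毅) said, at the start of his t-SNE lecture, that other unsupervised dimensionality-reduction algorithms focus only on [how to keep within-cluster distances small], whereas t-SNE also considers [how to make between-cluster distances large].

      In other words, the essence of PCA (or, put differently, another interpretation of it) is just [finding a transformation under which two points that are close in the original space end up even closer]. It does not consider [within-cluster or between-cluster] at all; it treats all points the same way.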
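
      The equivalence can be checked numerically. The sketch below assumes the classical (metric) form of MDS, which double-centers the squared distance matrix and takes its top eigenvectors; the data are random and purely illustrative, and each axis is only determined up to a sign flip.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # 6 hypothetical cells x 4 hypothetical genes

# PCA: center each gene, then project onto the top 2 principal axes.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_coords = U[:, :2] * S[:2]

# Classical MDS (PCoA), using only the pairwise Euclidean distances.
diff = X[:, None, :] - X[None, :, :]
D2 = np.sum(diff ** 2, axis=-1)            # squared pairwise distances
n = D2.shape[0]
J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
B = -0.5 * J @ D2 @ J                      # double-centered Gram matrix
eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
top = np.argsort(eigvals)[::-1][:2]        # indices of the 2 largest
mds_coords = eigvecs[:, top] * np.sqrt(eigvals[top])

# Axes may be mirrored, so compare absolute coordinates.
same = np.allclose(np.abs(pca_coords), np.abs(mds_coords), atol=1e-6)
print(same)
```

      The two sets of coordinates agree because the double-centered matrix equals the centered data times its own transpose, which is the same matrix whose eigenvectors PCA uses to place the samples.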

      The good news is that there are tons of other ways to measure distance!!!

      For example, another way to measure the distance between cells is to calculate the average of the absolute values of the log fold changes among genes.

      With this distance, we finally get a plot that is different from the PCA plot.
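
      A minimal sketch of this distance, assuming each cell is a vector of per-gene read counts. The base-2 logarithm and the pseudocount of 1 (added to avoid dividing by zero) are assumptions; the original notes do not specify either, and the counts are made up.

```python
import numpy as np

# Made-up read counts for two cells across three genes (illustrative only).
cell1 = np.array([7.0, 1.0, 3.0])
cell2 = np.array([1.0, 7.0, 3.0])

def mean_abs_log_fold_change(a, b, pseudocount=1.0):
    """Average over genes of |log2((a + pseudocount) / (b + pseudocount))|.

    The pseudocount and the base-2 logarithm are assumptions made here.
    """
    lfc = np.log2((a + pseudocount) / (b + pseudocount))
    return float(np.mean(np.abs(lfc)))

# Per-gene log2 fold changes are (2, -2, 0), so the average absolute value is 4/3.
d = mean_abs_log_fold_change(cell1, cell2)
print(d)
```

      Taking the absolute value makes the measure symmetric: swapping the two cells flips the sign of every log fold change but leaves the distance unchanged.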

      A biologist might choose to use log fold change to calculate distance because they are frequently interested in log fold changes among genes...

      But there are lots of distances to choose from...

      1. Manhattan Distance
      2. Hamming Distance
      3. Great Circle Distance
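
      For reference, the three distances listed above can be sketched in plain Python. The function names are illustrative, and the great-circle version assumes the haversine formula with the mean Earth radius in kilometres as its default.

```python
import math

def manhattan_distance(a, b):
    """Sum of absolute coordinate-wise differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def great_circle_distance(lat1, lon1, lat2, lon2, radius=6371.0):
    """Haversine distance between two points given in degrees.

    radius defaults to the mean Earth radius in km (an assumption here).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(h))

print(manhattan_distance([1, 2, 3], [4, 0, 3]))
print(hamming_distance("karolin", "kathrin"))
print(great_circle_distance(0, 0, 0, 90, radius=1.0))
```

      Any of these could be used to fill the pairwise distance matrix that MDS or PCoA takes as input, provided the measure suits the data.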

      In summary:

      • PCA creates plots based on correlations among samples;
      • MDS and PCoA create plots based on distances among samples.