Dear Authors,
Congratulations on the excellent preprint!
I have a question about the dimensionality reduction step on the square-root transformed sphere. The methodology employs Tangent PCA, which linearizes the sphere locally by projecting points onto the tangent space at the global Fréchet mean. As noted in the text, Euclidean distance in this tangent plane is a good approximation of geodesic distance only for points that are close to the Fréchet mean.
Given this constraint, how does GAIA perform on highly heterogeneous datasets, such as whole-organism or cross-tissue atlases, where distinct cell populations might lie very far from a single, global Fréchet mean on the hypersphere? Does the tangent approximation begin to distort the macro-relationships between highly divergent lineages at the edges of the projection? And have you explored using multiple local tangent spaces (or something more clever) to preserve global geometry in these extreme cases?
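To make the concern concrete, here is a minimal toy sketch of the effect I have in mind. It is not GAIA's implementation: it uses the unit sphere S^2, the north pole as a stand-in for the Fréchet mean, and hypothetical helper names (`log_map`, `geodesic`, `point_at`, `distortion`) that I made up for illustration. It compares the Euclidean distance between two points after the log map with their true geodesic distance.

```python
import numpy as np

def log_map(base, x):
    """Log map on the unit sphere: tangent vector at `base` pointing toward `x`."""
    cos_t = np.clip(np.dot(base, x), -1.0, 1.0)
    theta = np.arccos(cos_t)              # geodesic distance from base to x
    if theta < 1e-12:
        return np.zeros_like(base)
    v = x - cos_t * base                  # part of x orthogonal to base
    return theta * v / np.linalg.norm(v)

def geodesic(x, y):
    """Great-circle distance between two unit vectors."""
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

def point_at(theta, phi):
    """Point on S^2 at angle `theta` from the north pole and azimuth `phi`."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

# Assumption: the north pole plays the role of the global Fréchet mean.
base = np.array([0.0, 0.0, 1.0])

def distortion(theta, dphi=np.pi / 2):
    """Ratio of tangent-plane distance to geodesic distance for two points
    that both sit at angle `theta` from the base, separated by azimuth `dphi`."""
    a, b = point_at(theta, 0.0), point_at(theta, dphi)
    tangent_dist = np.linalg.norm(log_map(base, a) - log_map(base, b))
    return tangent_dist / geodesic(a, b)

for theta in (0.1, 0.5, 1.0, np.pi / 2):
    print(f"theta = {theta:.3f}  tangent/geodesic = {distortion(theta):.4f}")
```

In this toy setting the ratio stays close to 1 near the base point and grows to sqrt(2) for two mutually orthogonal points that are each a quarter circle away from it, which is the largest separation possible in the positive orthant. Distances from the base point itself are preserved exactly, so the inflation affects only the relationships between distant populations.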
Thank you for sharing this with the community.