Reviewer #3:
Park et al. present an analysis of how structural connectomes (estimated with diffusion MRI) change from childhood to young adulthood. To characterize the changes, they embed each connectome into a 3-dimensional space using nonlinear dimensionality reduction (and alignment to a template sample), and then perform a range of analyses of the statistics derived from this space (notably, distances to the template centroid, 'eccentricity'). The paper is well written, the data are fantastic, and the analyses are interesting, but I have a range of methodological concerns.
1) Interpretability and Lack of Comparison The authors claim repeatedly that they are "capitalizing on advanced manifold learning techniques". One could imagine an infinite number of papers that take a dataset, use a technique to extract a metric, X (e.g., eccentricity), and then write about how X changes with some property of interest, Y (e.g., age). Given this set of papers (and the non-independence between the set of possible Xs), the reader ought to be most interested in those Xs that provide the best performance and simplest interpretation, with other papers being redundant. Thus, a nuanced approach to presenting a paper like this is to demonstrate that the metric used represents an advance over alternative, simpler-to-compute, or clearer-to-interpret metrics that already exist. In this paper, however, the authors do not demonstrate the benefits of their particular sequence of choices: applying a specific nonlinear dimensionality reduction method, retaining 3 dimensions, aligning to a template manifold, and then computing an eccentricity metric. For example:
i) Is the nonlinearity required (e.g., does it outperform PCA or MDS)?
ii) Is there something special about picking 3 dimensions to do the eccentricity calculation? Is dimensionality reduction required at all (e.g., would you get similar results by computing eccentricity in the full-dimensional space)?
iii) Does it outperform basic connectome measures (e.g., the simple ones the authors compute)?
The opacity of the approach is a clear downside (it is difficult to interpret relative to, say, connectivity degree), so one would hope for a correspondingly strong boost in performance. The authors could also do more to develop some intuition for the idea of a low-dimensional connection-pattern-similarity space, and for how to interpret Euclidean distances within such a space.
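To make comparisons (i) and (ii) concrete, the following is a minimal sketch, not the authors' pipeline. It assumes eccentricity is the Euclidean distance of each node to the centroid of the embedding, uses linear PCA as the simpler alternative, and runs on a random matrix `X` that is only a hypothetical stand-in for node-wise connectivity profiles. The function names are illustrative.

```python
import numpy as np

def pca_embed(X, k):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]

def eccentricity(embedding, centroid=None):
    """Euclidean distance of each row (node) to a centroid.

    If no centroid is given, the centroid of the embedding itself is used;
    a template centroid could be passed in instead.
    """
    if centroid is None:
        centroid = embedding.mean(axis=0)
    return np.linalg.norm(embedding - centroid, axis=1)

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 200 nodes x 50 connectivity features.
X = rng.normal(size=(200, 50))

ecc_3d = eccentricity(pca_embed(X, 3))   # eccentricity in a 3-D linear embedding
ecc_full = eccentricity(X)               # eccentricity with no dimensionality reduction

# How similar are the two node-wise eccentricity maps?
r = np.corrcoef(ecc_3d, ecc_full)[0, 1]
print(f"correlation between 3-D and full-dimensional eccentricity: {r:.3f}")
```

The same comparison could be repeated across the number of retained dimensions and with the authors' nonlinear embedding swapped in for `pca_embed`, reporting how the age effects change in each case.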
2) Developmental Enrichment Analysis Both in the main text and in the Methods, this is described as "genes were fed into a developmental enrichment analysis". Can some explanation be provided as to what happens between the "feeding in" and what comes out? Without clearly described methods, it is impossible to interpret or critique this component of the paper. If the methodological details are opaque, then the significance of the results could be tested numerically relative to some randomized null inputs being 'fed in' to demonstrate specificity of the tested phenotype.
3) IQ prediction
The predictions seem to be very poor (equality lines, y = x, should be drawn in Fig. 5 to show what perfect predictions would look like; linear regression fits are not helpful for a prediction task, and give a misleading impression relative to the appropriate MAE computation). The authors do not perform any comparisons in this section (even to a real baseline model like predicted_IQ = mean(training_set_IQ)). They also do not perform statistical tests (or quote p-values), but nevertheless make a range of claims, including of "significant prediction" or "prediction accuracy was improved", "reemphasize the benefits of incorporating subcortical nodes", etc. All of these claims should be supported by rigorous statistical tests and by comparisons to appropriate baseline/benchmark approaches.
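As an illustration of the kind of comparison requested, the sketch below computes cross-validated MAE for the baseline predicted_IQ = mean(training_set_IQ) and compares a model's absolute errors against it with a paired sign-flip permutation test. The IQ scores and `model_pred` are simulated placeholders, and the choice of test is my assumption; other paired tests would serve equally well.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def cv_mean_baseline(y, n_folds=5, seed=0):
    """Out-of-fold predictions from predicted_IQ = mean(training_set_IQ)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    pred = np.empty(len(y), dtype=float)
    for test_idx in np.array_split(idx, n_folds):
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        pred[test_idx] = y[train_mask].mean()
    return pred

def sign_flip_pvalue(err_model, err_base, n_perm=5000, seed=0):
    """One-sided paired test: are the model's absolute errors smaller than the baseline's?"""
    rng = np.random.default_rng(seed)
    d = err_base - err_model              # positive where the model is better
    observed = d.mean()
    signs = rng.choice([-1.0, 1.0], size=(n_perm, len(d)))
    null = (signs * d).mean(axis=1)       # null: sign of each paired difference is arbitrary
    return float((np.sum(null >= observed) + 1) / (n_perm + 1))

rng = np.random.default_rng(1)
y = rng.normal(100, 15, size=120)                 # simulated IQ scores (placeholder)
model_pred = y + rng.normal(0, 5, size=120)       # placeholder for out-of-fold model predictions
base_pred = cv_mean_baseline(y)

p_model = sign_flip_pvalue(np.abs(y - model_pred), np.abs(y - base_pred))
print(f"model MAE = {mae(y, model_pred):.2f}, baseline MAE = {mae(y, base_pred):.2f}, p = {p_model:.4f}")
```

The same paired comparison applies to the claim that adding subcortical nodes improves accuracy: the errors of the cortex-only model take the place of the baseline errors.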
4) Group Connectome Given how much the paper relies on estimating a group structural connectome, it should be visualized and characterized. For example, a basic analysis of the distribution of edge weights and degree would be informative, especially as edge weights can vary over orders of magnitude, and high weights (more likely to be short distances) may therefore unduly dominate some of the low-dimensional components. The authors may also consider testing robustness to alternative ways of estimating the connectome [e.g., Oldham et al. NeuroImage 222, 117252 (2020)] and its group-level summary [e.g., Roberts et al. NeuroImage 145, 1-42 (2016)].
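The basic characterization I have in mind could be as simple as the following sketch, which assumes the group connectome is available as a symmetric, non-negative weighted matrix `W`. The function name and the toy matrix are illustrative only.

```python
import numpy as np

def connectome_summary(W):
    """Basic descriptive statistics for a symmetric weighted connectome."""
    W = np.asarray(W, dtype=float)
    off_diag = W - np.diag(np.diag(W))           # ignore self-connections
    upper = off_diag[np.triu_indices_from(W, k=1)]
    nonzero = upper[upper > 0]
    return {
        "density": nonzero.size / upper.size,                     # fraction of edges present
        "log10_weight_range": float(np.log10(nonzero.max() / nonzero.min())),
        "degree": (off_diag > 0).sum(axis=1),                     # number of edges per node
        "strength": off_diag.sum(axis=1),                         # summed edge weight per node
    }

# Toy 3-node example in which weights span two orders of magnitude.
W = np.array([[0.0,   1.0, 100.0],
              [1.0,   0.0,   0.0],
              [100.0, 0.0,   0.0]])
summary = connectome_summary(W)
print(summary)
```

Histograms of log-transformed weights, degree, and strength, together with weight plotted against inter-regional distance, would show readers whether a few strong short-range edges dominate the embedding.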
5) Individual Alignment The paper relies on individuals being successfully aligned to the template manifold. Accordingly, some analysis should be performed quantifying how well individuals could be mapped. Presumably some subjects fit very well onto the template, whereas others do not. Is there something interesting about the poorly aligned subjects? Do your results improve when excluding them?
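One way to quantify alignment quality is sketched below, under the assumption (mine, not confirmed by the manuscript text summarized here) that individuals are aligned to the template by an orthogonal Procrustes rotation. It reports a normalized residual per subject, so that poorly aligned subjects can be identified and excluded in a sensitivity analysis. All data are simulated placeholders.

```python
import numpy as np

def procrustes_residual(subject, template):
    """Normalized residual after the best orthogonal alignment of subject to template.

    Both inputs are (n_nodes, n_dims) embeddings; 0 means a perfect fit.
    """
    A = subject - subject.mean(axis=0)
    B = template - template.mean(axis=0)
    # Orthogonal Procrustes: R = argmin ||A R - B||_F subject to R^T R = I.
    U, _, Vt = np.linalg.svd(A.T @ B)
    R = U @ Vt
    return float(np.linalg.norm(A @ R - B) / np.linalg.norm(B))

rng = np.random.default_rng(0)
template = rng.normal(size=(200, 3))              # placeholder 3-D template embedding

# A subject that is an exact rotated and shifted copy of the template.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))      # random orthogonal matrix
subject_good = template @ Q + 0.5

# A subject whose embedding also contains node-wise noise.
subject_noisy = subject_good + rng.normal(scale=0.5, size=template.shape)

res_good = procrustes_residual(subject_good, template)
res_noisy = procrustes_residual(subject_noisy, template)
print(f"residuals: good = {res_good:.2e}, noisy = {res_noisy:.3f}")
```

Reporting the distribution of such residuals across subjects, and their relationship to age and data quality, would address whether misalignment could be driving the eccentricity effects.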