Reviewer #3 (Public Review):
My general assessment is that the analyses performed after the model is found are exemplary and show some interesting results. However, several aspects are concerning: the method used to choose the number of states (the Calinski-Harabasz score instead of log-likelihood), the choice of model itself (an HMM), and the fact that the state-number selection is not shown for the HCP dataset or the Schaefer atlas, nor is an R^2 reported on a test set. I do not think this per se invalidates their results, but it is something the authors can improve. They argue that the states they find align with long-standing ideas about the functional organization of the brain and with other research, but the model selection could be strengthened.
Strengths:
- Uses multiple datasets, multiple ROIs, and multiple analyses to validate the results.
- Figures are convincing in the sense that patterns clearly synchronize between participants.
- The authors select the number of states using the optimal model fit (although this turns out to be somewhat questionable given what they quantify as 'optimal model fit').
- The replication with the Schaefer atlas makes the results more convincing.
- The analyses showing that the base state acts as a flexible hub are well done and well explained.
- The synchrony comparison is well done; contrasting against resting state, which shows no significant synchrony among participants, is an obvious but still worthwhile control.
- The results showing that similar narrative engagement correlates with similar neural state dynamics are well done and interesting.
- The results on event boundaries are compelling and well done. However, I do not find the Chang et al. results convincing (Figure 4B); it could simply be that a different medium explains the differences in DMN response, but to me these look like altogether different patterns that cannot be fully explained by the authors' method/results.
- The finding that, in the absence of an event boundary, roughly 50% of transitions into the DMN state come from the base state is interesting and strong. However, it is unclear whether this holds only for the sitcom or also for Chang et al.'s data.
- The high engagement of the base state during both the comedy sitcom and the movie is an interesting result that warrants further study of the base-state theory posed in this work.
- It is good that the authors verify that the SM states are not merely due to head motion (P12).
- The comparison between functional gradients and neural states is good, and the results are generally well supported, intuitive, and interesting enough to warrant further research. The findings on the context-specificity of the DMN and DAN states are interesting and relate well to their antagonistic relationship in resting-state data.
Weaknesses:
- The authors should train the model on part of the data and validate it on the rest.
- The comparison with only PCA/functional gradients is weak evidence that HMMs are good models of the timeseries, especially given that PCA explains little variance in the signal (R^2 of only ~0.5 for just 27 brain regions). The authors do not appear to report the HMM's own R^2 on the timeseries.
- The authors do not specify whether they also performed cross-validation on the HCP dataset to arrive at 4 clusters.
- One of the main contributions is the base state, yet the correlation between the base state in the SONG dataset and the HCP dataset is only 0.399.
- Figure 1B: the parcellation is quite coarse, yet there appears to be a gradient within regions.
- Figure 1D: why are the DMN states further apart between SONG and HCP than the other states?
- Page 5, paragraph starting at L25: the hypothesis that functional gradients explain a large share of the variance in neural dynamics needs more explanation; it is non-trivial, especially because the R^2 scores for PCA are so low (Fig. 1, Supplement 8).
- Generally, I do not find the PCA analysis convincing and believe the authors should also compare against something like ICA or a different model of dynamics. They do not explain the reasoning behind assuming an HMM, which is an extremely simplified picture of brain dynamics in which the current state depends only on the previous state.
- For the 25-ROI replication, it again appears that multiple values of K were not tried to validate that 4 is in fact the correct number of states.
- Fig. 2B: the colorbar runs from -0.05 to 0.05, but the values go up to 0.87.
- P16 L4, "near-critical": the authors need to be more specific in their terminology here, especially since they discuss dynamical systems, where near-criticality has a specific definition. It is unclear which definition they intend.
- P16 L13-L17 are unnecessary.
- I think this paper is solid, but my main issue is the use of an HMM without ever explaining why, without showing inference results on test data, without reporting an R^2 score for it, and without comparing it to other models. Secondly, the authors use the Calinski-Harabasz score, rather than the log-likelihood of the fit, to determine the number of states. This clearly biases the kinds of states found, namely states that are far apart from one another, which likely also drives the functional gradient and PCA results, where the authors specifically note that their states are far apart in functional gradient space and correlated with (orthogonal) components. It is completely unclear to me why this measure was used: it is one of many clustering scores (with potentially different results), and it is an odd choice given that a log-likelihood of the fit is available and that the model itself does not perform clustering.
- Grammatical error: P24 L29, the rendering seems to have gone wrong.
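The distinction drawn above, choosing the number of states by held-out log-likelihood versus by the Calinski-Harabasz score, can be sketched in a few lines. This is a toy illustration, not the paper's pipeline: the data are synthetic 5-dimensional points standing in for ROI signals, and a `GaussianMixture` is used as a simple stand-in for an HMM's emission model (an HMM fit via, e.g., hmmlearn exposes an analogous `score()` for held-out log-likelihood).

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(0)
# Toy "timeseries": 3 well-separated states in 5 dimensions (stand-in for ROI data).
centers = rng.normal(scale=4.0, size=(3, 5))
X = np.vstack([c + rng.normal(size=(200, 5)) for c in centers])
rng.shuffle(X)
X_train, X_test = X[:400], X[400:]

results = {}
for k in range(2, 7):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X_train)
    labels = gmm.predict(X_train)
    results[k] = {
        # Mean log-likelihood per sample on data the model never saw:
        "held_out_ll": gmm.score(X_test),
        # Calinski-Harabasz: rewards clusters that are far apart and compact,
        # regardless of how well the probabilistic model fits held-out data.
        "ch": calinski_harabasz_score(X_train, labels),
    }

best_by_ll = max(results, key=lambda k: results[k]["held_out_ll"])
best_by_ch = max(results, key=lambda k: results[k]["ch"])
print(best_by_ll, best_by_ch)
```

The point of the sketch is that the two criteria answer different questions: held-out log-likelihood measures generalization of the fitted model, while Calinski-Harabasz measures between-cluster separation, which is exactly the bias noted above, a preference for states that are far apart.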
Questions:
- Please comment on subject differences: it seems the authors may have found group-level dynamics driven by the stimuli, but it would be interesting to see individual differences in large-scale dynamics. Do the authors believe the states they find mostly explain global linear dynamics?
- P19 L40: why did the authors interpolate incorrect or missing responses for the gradCPT runs? It seems more logical to correct the results for these responses, or to exclude them, since interpolation can induce large biases here because the data are likely not missing completely at random.

