Reviewer #1 (Public Review):
Human and animal work over the last couple of years established that fluctuations in pupil size track the activity of a number of neuromodulatory nuclei, including the noradrenergic locus coeruleus, cholinergic basal forebrain, serotonergic dorsal raphe and perhaps the dopaminergic midbrain. In other words, pupil size fluctuations might track a "cocktail" of neuromodulators. The current paper leverages sophisticated data driven analysis techniques to show that pupil size changes can indeed be modulated by different combinations of subcortical nuclei. Doing so, the paper helps laying a solid and nuanced neurophysiological foundation for the interpretation of results from cognitive pupillometry, an area of neuroscience and psychology that is rapidly expanding over the past years. I do have a couple of concerns.
Major issues:
The BOLD hemodynamic response function is slower than the pupil impulse response function. It seems that the authors did not correct for the "lag" between the two (as in Yellin et al., 2015, for example). How much does this matter for the results?
Baseline pupil size was different between the identified clusters. How was pupil size normalized across rats and scanning runs, so that we can meaningfully interpret such a difference?
A substantial part of the literature focuses on the relationship between task-evoked pupil and neuromodulatory responses. I understand that this paper describes results from a resting state experiment, but even in these conditions one typically observes rapid dilations. Right now, it seems that the analysis is somewhat blind to these. See for example Fig. 2C in which frequencies are plotted only until 0.05Hz. Can we see this on log-log axes, to inspect the higher frequencies? Note that there is some work that indicates that the slower pupil fluctuations more reliably track ACh signaling, and faster fluctuations more reliably track NE signaling (Reimer & McGinley et al., 2016).
The authors write "Cluster 2 had the strongest positive weights in [...], but also in brainstem arousal-regulating locus coeruleus, laterodorsal tegmental and parabrachial nuclei." However, the voxel size is very large with respect to the size subcortical nuclei. Because of this, here and in other places, I think the authors should use locus coeruleus region or area, to indicate that their voxel captures more tissue than just LC proper. A discussion paragraph on the spatial specificity of their effects would also help.
The approach is very data driven and the Results section mostly descriptive. I'm personally not at all unsympathetic to this approach, but I do think the authors could aid the reader better by briefly interpreting their results already in the Results section. Related, the authors end each paragraph with "These results verified [...]" or "These results highlight [...]"; however they don't explicitly inform us how.
Rainbow and jet colormaps are confusing because they are not perceptually uniform (https://colorcet.holoviz.org/). Please consider using something like "coolwarm"?
Minor issues:
"Trial" is not well defined. I take this is a 15 minute run?
How many trials in each cluster (Fig. 2)? 
It would be nice to see a more zoomed in version of Fig. 5 so that we can actually see the subcortical regions in more detail.