Reviewer #1 (Public review):
Summary:
This paper leverages 7T fMRI data from the Natural Scenes Dataset to investigate whether retinotopic coding, the position-selective organization of visual responses, structures spontaneous resting-state interactions between the Default Network (DN) and the Dorsal Attention Network (dATN). Using individualized network parcellations and population receptive field (pRF) modeling, the authors show that DN voxels can be split into two subpopulations based on their response to visual stimulation: those with position-specific positive BOLD responses (+pRFs) and those with position-specific negative BOLD responses (-pRFs). Critically, these subpopulations relate differently to the dATN during rest: -pRFs are anticorrelated with the dATN, +pRFs are positively correlated, and non-retinotopic DN voxels show no coupling. The anticorrelation (and positive correlation) is enhanced when DN and dATN voxels share visual field preferences. An event-triggered analysis suggests that retinotopic coding shapes both "top-down" (DN-initiated) and "bottom-up" (dATN-initiated) spontaneous activity transients, supporting the claim that the retinotopic scaffold is intrinsic to the DN. These findings challenge the prevailing view of global DN-dATN antagonism and suggest retinotopic coding as an organizing principle for cross-network communication.
Strengths:
The central finding that what looks like network-level independence between DN and dATN decomposes into structured, bivalent interactions organized by voxel-level visual field preferences is a compelling demonstration that macro-scale network descriptions can hide meaningful substructure. The logic of the analysis is clean: pRF properties are estimated from retinotopic mapping data and then used to predict resting-state coupling in completely independent scanning sessions. This cross-session, cross-modality design rules out many circularity concerns.
The use of individualized multi-session hierarchical Bayesian parcellation (Kong et al.) to define DN and dATN boundaries within each subject is the right methodological choice for this question. Network boundaries in posterior cortex, where DN and dATN interdigitate most closely, vary considerably across individuals, and group-average approaches would introduce exactly the kind of misassignment that would most confound the result.
The matched-vs-random pRF analysis is well-controlled. The authors demonstrate that cortical distance between matched and randomly-matched dATN pRFs does not differ, effectively ruling out spatial proximity on the cortical surface as a confound. tSNR controls further show that signal quality differences do not drive the effect.
The event-triggered analysis (Figure 3) is creative and adds genuine value. Showing that retinotopically-specific coupling persists during DN-initiated activity transients, not only dATN-initiated ones, is the key piece of evidence for the claim that the code is intrinsic to the DN rather than passively inherited through bottom-up visual drive.
The result is observed consistently across all individual participants, which provides strong evidence for the robustness of the qualitative pattern despite the small sample size inherent to densely-sampled designs.
Weaknesses:
(1) The nature of negative pRFs requires more scrutiny
The entire interpretive framework depends on treating negative pRFs in the DN as genuine position-selective neural responses (suppression). However, negative BOLD signals are well known to arise from non-neural sources as well, notably vascular steal (where activation in nearby tissue diverts blood from adjacent voxels) and macrovascular draining vein effects that produce spatially displaced signal inversions. These concerns are amplified at 7T, where T2*-weighted GE-EPI carries substantial macrovascular weighting. The DN and dATN interdigitate extensively in the posterior cortex, often within millimeters. A negative pRF in a DN voxel adjacent to a positive dATN voxel could, in principle, reflect the hemodynamic shadow of its neighbor rather than an independent neural response.
The spatial dispersion control (matched vs. random pRFs have similar cortical distribution) is valuable but addresses long-range confounds, not *local* hemodynamic crosstalk. The reliability of sign and center position across runs is reassuring but does not exclude a vascular origin, as vascular architecture is itself stable across sessions. I would encourage the authors to test whether the matched-vs-random effect survives exclusion of voxels near large pial vessels (identifiable from T2* contrast or the venograms available in the NSD). Such an analysis would not be dispositive, but it would meaningfully strengthen the neural interpretation.
(2) Amount of retinotopic mapping data and choice of pRF pipeline
The NSD includes 6 runs of retinotopic mapping (~5 minutes each; 3 bar-aperture, 3 wedge/ring). The authors use only the 3 bar-aperture runs (~15 minutes total per subject) and fit their own pRFs using AFNI's 3dNLfim procedure, rather than using the pRF estimates provided as part of the NSD release (which were fitted using the analyzePRF toolbox with all 6 runs).
Fifteen minutes of bar data is quite limited for reliable voxel-wise pRF estimation, especially in regions far from the early visual cortex, where signal-to-noise is inherently lower. Standard recommendations for robust pRF mapping in higher-order regions generally suggest substantially more data. The variance-explained threshold is close to the noise floor by design, meaning that a non-trivial number of the "retinotopic" DN voxels may be poorly estimated. Given that the core analyses depend on both the sign and the center position of these pRFs, the limited data is a significant concern.
The authors do not explain why they chose to re-fit pRFs rather than use the NSD-provided estimates. If the motivation was methodological (e.g., the NSD pRF pipeline does not readily yield signed amplitude, or the bar-only fits were judged more appropriate for detecting negative responses), this should be made explicit. If the NSD-provided pRFs can reproduce the key findings, this would substantially increase confidence in the results. If they cannot, that divergence itself would be important to understand. I would ask the authors to address this choice and, if feasible, to report whether the core results replicate using the NSD-provided pRF estimates and/or whether using all 6 runs of retinotopy data changes the findings.
(3) pRF model adequacy for the Default Network
The isotropic Gaussian pRF model was developed for and validated in early and mid-level visual cortex, where it captures the dominant spatial selectivity of neuronal populations. In DN voxels where the model explains comparatively little variance, it is less clear that the model is capturing the right quantity. Specifically, the negative pRFs could conceivably be described by a model with a dominant suppressive surround (e.g., a difference-of-Gaussians model), in which what appears as a "negative pRF" in the standard model is actually the surround component of a center-surround mechanism whose center is poorly resolved. This distinction matters: a genuine inverted code (negative center response) implies a qualitatively different computation than inherited surround suppression from nearby visual cortex.
The authors should consider discussing why the standard model is sufficient for the questions asked, or ideally, testing whether the sign distinction survives under alternative pRF model specifications.
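To make this concern concrete, here is a minimal sketch in Python/NumPy of how the two accounts can coincide or diverge. It is not the authors' fitting procedure. All parameter values (grid extent, pRF center, sizes, amplitudes, stimulus shapes) are illustrative assumptions of mine, and the predicted response is a simple pRF-aperture overlap with no hemodynamic convolution.

```python
import numpy as np

def gaussian_2d(xx, yy, x0, y0, sigma):
    """Isotropic 2D Gaussian (peak = 1) on a visual-field grid in degrees."""
    return np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2.0 * sigma ** 2))

def predicted_response(prf, aperture, dx):
    """Overlap of a pRF with a binary stimulus aperture (no HRF applied)."""
    return float(np.sum(prf * aperture) * dx * dx)

# Visual-field grid; the extent and resolution are assumed for illustration.
coords = np.linspace(-10.0, 10.0, 201)
dx = coords[1] - coords[0]
xx, yy = np.meshgrid(coords, coords)

# Hypothetical pRF center and shape parameters.
x0, y0 = 3.0, 0.0

# Account A: single isotropic Gaussian with negative amplitude ("inverted code").
neg_prf = -1.0 * gaussian_2d(xx, yy, x0, y0, sigma=2.0)

# Account B: difference of Gaussians with a narrow positive center and a
# broad suppressive surround that dominates the integrated response.
dog_prf = (1.0 * gaussian_2d(xx, yy, x0, y0, sigma=0.5)
           - 0.6 * gaussian_2d(xx, yy, x0, y0, sigma=3.0))

# Stimulus 1: a 2-degree-wide vertical bar passing through the pRF center.
bar = (np.abs(xx - x0) <= 1.0).astype(float)

# Stimulus 2: a small spot confined to the putative center.
spot = (((xx - x0) ** 2 + (yy - y0) ** 2) <= 0.3 ** 2).astype(float)

resp_neg_bar = predicted_response(neg_prf, bar, dx)
resp_dog_bar = predicted_response(dog_prf, bar, dx)
resp_neg_spot = predicted_response(neg_prf, spot, dx)
resp_dog_spot = predicted_response(dog_prf, spot, dx)

print("bar  -> negative Gaussian:", resp_neg_bar, " DoG:", resp_dog_bar)
print("spot -> negative Gaussian:", resp_neg_spot, " DoG:", resp_dog_spot)
```

Under these assumed parameters, both accounts predict a negative response to the bar, so bar-aperture data fitted with a single Gaussian would not obviously separate them. Only the small spot yields opposite signs. Whether the available mapping stimuli have the spatial resolution to make this distinction in DN voxels is an empirical question for the authors.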
(4) Interpreting resting-state transients as top-down vs. bottom-up
The event-triggered analysis labels high-amplitude DN pRF activations as "top-down events" and dATN activations as "bottom-up events." This is a reasonable inference given experience-sampling studies showing that rest involves alternation between internal and external attention, but it remains an inference. Without concurrent experience sampling, eye-tracking, or physiological monitoring, we cannot establish that a spontaneous DN transient reflects memory retrieval or internally-directed thought rather than a global arousal fluctuation. Similarly, dATN transients during rest could reflect covert shifts of spatial attention to remembered or imagined locations rather than bottom-up processing per se. I would ask the authors to soften this framing or to discuss what additional data would be needed to validate the top-down/bottom-up attribution.
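The point about labeling can be illustrated with a generic event-triggered average. The sketch below (Python/NumPy) is my own simplified version and not the authors' implementation: the z-score threshold, window length, peak rule, and synthetic time series are all assumptions.

```python
import numpy as np

def event_triggered_average(seed, target, z_thresh=2.0, half_window=5):
    """Average the z-scored target signal around supra-threshold peaks in the seed.

    z_thresh and half_window are hypothetical choices, not values from the paper.
    """
    seed_z = (seed - seed.mean()) / seed.std()
    target_z = (target - target.mean()) / target.std()

    # Events: local maxima of the seed that exceed the threshold.
    idx = np.arange(1, len(seed_z) - 1)
    is_peak = ((seed_z[idx] > z_thresh)
               & (seed_z[idx] >= seed_z[idx - 1])
               & (seed_z[idx] > seed_z[idx + 1]))
    events = idx[is_peak]

    # Keep only events with a full window on both sides.
    events = events[(events >= half_window)
                    & (events < len(seed_z) - half_window)]
    if events.size == 0:
        return events, np.full(2 * half_window + 1, np.nan)

    windows = np.stack([target_z[e - half_window:e + half_window + 1]
                        for e in events])
    return events, windows.mean(axis=0)

# Synthetic example: a seed with injected transients and an anticorrelated target.
rng = np.random.default_rng(0)
n_timepoints = 600
seed = 0.5 * rng.standard_normal(n_timepoints)
seed[50::50] += 4.0  # injected high-amplitude transients
target = -0.8 * seed + 0.5 * rng.standard_normal(n_timepoints)

half_window = 5
events, eta = event_triggered_average(seed, target, half_window=half_window)
print("number of events:", events.size)
print("target response at event time:", eta[half_window])
```

Nothing in this procedure identifies the cause of a transient. Whether an event counts as "top-down" or "bottom-up" depends entirely on which network supplies the seed, which is why independent behavioral or physiological measures would be needed to validate the attribution.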
(5) The "retinotopic code" vs. "visual field bias" distinction
The paper uses the language of a "retinotopic code" throughout and correctly distinguishes this from a "retinotopic map," noting that DN voxels do not form a continuous topographic representation on the cortical surface. This distinction deserves greater emphasis. In vision science, retinotopic maps carry computational significance through their topographic continuity and relationship to cortical wiring. A distributed collection of voxels with coarse visual field preferences but no cortical topography is a fundamentally different organizational feature. Recent reviews have drawn an explicit distinction between *retinotopic maps* and *visual field biases* (Groen, Dekker, Knapen & Silson, TiCS 2022), and the present findings may be more accurately characterized as the latter. Perhaps the authors think that the distinction is merely a signal-to-noise distinction, in which case I would invite them to clearly speak to this interpretation. In any case, this is not a criticism of the findings themselves, but clarity on this point would prevent conflation of two different organizational principles and would help position the work for both the vision and network neuroscience communities.