  1. Sep 2025
    1. Author response:

      We thank the reviewers for their primarily positive comments and the critiques about where the manuscript could be improved. We agree with the vast majority of points raised. In our revised submission, we will:

      • Clarify some of the wording such as “unified mechanism” so that our intended meaning is clear to all readers

      • Completely change figure 2, as we accept the critique that an X-Y plot is not the logical way to present this concept

      • Amend the legends of figures 1 and 3 so that the disease pathways we are attempting to illustrate are clear for all readers

      • Expand on the genetic interactions between humans and TB and cite the manuscripts suggested

      • Add further discussion on multiple disease endotypes, and the immunological events that may lead to these distinct end points, along with how this may inform treatment stratification approaches

      • Extend the discussion about trained immunity

      • Make specific changes to address each of the reviewers’ points in the recommendations to authors

      • In the minority of cases where we feel a change is not necessary, we will justify this in our response to reviews

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      Summary: 

      Kang et al. provide the first experimental insights from holographic stimulation of auditory cortex. Using stimulation of functionally-defined ensembles, they test whether overactivation of a specific subpopulation biases simultaneous and subsequent sensory-evoked network activations. 

      Strengths: 

      The investigators use a novel technique to investigate the sensory response properties in functionally defined cell assemblies in auditory cortex. These data provide the first evidence of how acutely perturbing specific frequency-tuned neurons impacts the tuning across a broader population. Their revised manuscript appropriately tempers any claims about specific plasticity mechanisms involved. 

      Weaknesses: 

      Although the single cell analyses in this manuscript are comprehensive, questions about how holographic stimulation impacts population coding are left to future manuscripts, or perhaps re-analyses of this unique dataset. 

      Reviewer #2 (Public review): 

      The goal of HiJee Kang et al. in this study is to explore the interaction between assemblies of neurons with similar pure-tone selectivity in mouse auditory cortex. Using holographic optogenetic stimulation in a small subset of target cells selective for a given pure tone (PTsel), while optically monitoring calcium activity in surrounding non-target cells, they discovered a subtle rebalancing process: co-tuned neurons that are not optogenetically stimulated tend to reduce their activity. The cortical network reacts as if an increased response to PTsel in some tuned assemblies is immediately offset by a reduction in activity in the rest of the PTsel-tuned assemblies, leaving the overall response to PTsel unchanged. The authors show that this rebalancing process affects only the responses of neurons to PTsel, not to other pure tones. They also show that assemblies of neurons that are not selective for PTsel don't participate in the rebalancing process. They conclude that assemblies of neurons with similar pure-tone selectivity must interact in some way to organize this rebalancing process, and they suggest that mechanisms based on homeostatic signaling may play a role. 

      The authors have successfully controlled for potential artefacts resulting from their optogenetic stimulation. This study is therefore pioneering in the field of the auditory cortex (AC), as it is the first to use single-cell optogenetic stimulation to explore the functional organization of AC circuits in vivo. The conclusions of this paper are very interesting. They raise new questions about the mechanisms that could underlie such a rebalancing process. 

      (1) This study uses an all-optical approach to excite a restricted group of neurons chosen for their functional characteristics (their frequency tuning), and simultaneously record from the entire network observable in the FOV. As stated by the authors, this approach is applied for the first time to the auditory cortex, which is a tour de force. However, such approach is complex and requires precise controls to be convincing. The authors provide important controls to demonstrate the precise ability of their optogenetic methods. In particular, holographic patterns used to excite 5 cells simultaneously may be associated with out-of-focus laser hot spots. Cells located outside of the FOV could be activated, therefore engaging other cells than the targeted ones in the stimulation. This would be problematic in this study as their tuning may be unrelated to the tuning of the targeted cells. To control for such effect, the authors have decoupled the imaging and the excitation planes, and checked for the absence of out-of-focus unwanted excitation (Suppl Fig1). 

      (2) In the auditory cortex, assemblies of cells with similar pure-tone selectivity are linked together not only by their ability to respond to the same sound, but also by other factors. This study clearly shows that such assemblies are structured in a way that maintains a stable global response through a rebalancing process. If a group of cells within an assembly increases its response, the rest of the assembly must be inhibited to maintain the total response. 

      One surprising result is the clear boundary between assemblies: a rebalancing process occurring in one assembly does not affect the response in another assembly comprising cells tuned to a different frequency. However, this is slightly challenged by the data shown in Figure 3. 

      Figure 3B-left, for example, shows that, compared to controls, non-target 16 kHz-preferring neurons only decrease their response to a 16 kHz pure tone when the cells targeted by the opto stimulation also prefer 16 kHz, but not when the targeted cells prefer 54 kHz. However, the inverse is not entirely true. Again compared to controls, Figure 3B (right) shows that non-target 54 kHz-preferring neurons decrease their response to a 54 kHz pure tone when the targeted cells also prefer 54 kHz; however, they also tend to be inhibited when the targeted cells prefer 16 kHz. 

      The authors suggest this may be due to the partial activation of 54 kHz-preferring cells by 16 kHz tones and propose examining the response of highly selective neurons. The results are shown in Figure 3F. It would have been more logical to show the same results as in Figure 3B, but with the left part restricted to highly 16 kHz-selective cells and the right part to highly 54 kHz-selective cells. However, the authors chose to pool all responses to 16 kHz and 54 kHz tones in every triplet of conditions (control, opto stimulation on 16 kHz-preferring cells and opto stimulation on 54 kHz-preferring cells), which blurs the result of the analysis. 

      We thank the reviewers for highlighting the strengths of our work and for providing valuable feedback. We further developed our manuscript mainly in response to Reviewer 2’s point about the overall effect presented as the main result. One of the main reasons we chose to pool all tone-preferring cells, rather than only highly selective cells, was to ensure that the observed effect was not driven by only a small group of neurons but rather arose at the population level, especially at the subject level for Figure 3B. While Figure 3F shows that cells highly selective for each frequency play a major role in the effect, we have now added results restricted to highly selective neurons as Supplementary Figure 3. The left panel restricts the population to neurons highly selective for 16 kHz and the right panel to neurons highly selective for 54 kHz, at the cell population level, to emphasize the result. 

      We appreciate the additional point raised by Reviewer 1 regarding the effect of stimulation on population coding. Our primary focus in this manuscript was to establish the single-cell-level effects of holographic stimulation, and we believe that population coding analyses would benefit from a more cell-type-specific approach. We plan to pursue such analyses in follow-up studies where cell types can be better identified and linked to network dynamics. 

      Reviewer #1 (Recommendations for the authors): 

      The authors have appropriately addressed my concerns. 

      As this dataset will be of general interest, it would be helpful to include a doi/link to their data repository in the data availability section. 

      Transfer of the data repository to the institutional server is currently in progress. We will provide the correct DOI or link as soon as it becomes available. In the meantime, we will share the data with anyone who contacts us directly. 

      Reviewer #2 (Recommendations for the authors): 

      Many references to Figures have not been updated between the two versions of the manuscript. See lines 107, 128, 297, 321 and 346. 

      We are sorry for the confusion caused by the mislabelled figures. We have now updated all the figure numbers accordingly.

      In the paragraph beginning on line 266, there is no explicit reference to Figure 3C. 

      We have now added a reference to Figure 3C in the main text (line 290). 

      If the new analysis includes 15 FOV for stim on 54 kHz-preferring cells, as indicated in the rebuttal, the corresponding numbers should be corrected in lines 152 and 180. 

      We have now updated the number of FOVs accordingly. 

      The added model is not explained well enough. How are the calcium traces simulated? It is difficult to ascertain whether the result shown in Figure 3C is merely a trivial consequence of the hypothesis that suppression is applied to co-tuned neurons or to all neurons. 

      We are sorry that the explanation of the model lacked important details. We simulated time-varying sound-evoked calcium transients, in particular by applying different decay time constants (faster decay for co-tuned neurons and slower decay for non co-tuned neurons) to closely match the real data. A more detailed explanation is now included in the manuscript (lines 644 – 650). Since our data do not currently allow us to identify specific cell types, we focused on modelling the stronger suppression observed in co-tuned neurons, in particular by adapting the stimulation effect of target cells from the real data. In this revision, we have added data showing that ‘Randomly selected cells’ from the two groups (co-tuned or non co-tuned cell groups) did not exhibit any stimulation effect (added column in Figure 3D), further indicating that suppression specific to co-tuned neurons is the key factor underlying the observed effects in the real data. We hope to build on this work in future studies to identify cell-type-specific effects and their computational roles. 
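
      As a purely illustrative sketch of the idea of giving two groups different decay time constants, the snippet below generates single-exponential transients. It is not the model described in the manuscript: the single-exponential form, the function name `calcium_transient`, the 30 Hz frame rate, and the decay constants of 0.5 s and 1.5 s are all assumptions chosen for illustration.

```python
import math


def calcium_transient(amplitude, tau_decay, n_frames, frame_rate=30.0):
    """Return a single-exponential decay trace, one value per imaging frame.

    amplitude  : peak value at stimulus onset (arbitrary units)
    tau_decay  : decay time constant in seconds (hypothetical values below)
    frame_rate : imaging frame rate in Hz (assumed, not taken from the paper)
    """
    if tau_decay <= 0:
        raise ValueError("tau_decay must be positive")
    return [amplitude * math.exp(-(frame / frame_rate) / tau_decay)
            for frame in range(n_frames)]


# Hypothetical parameters: faster decay for co-tuned, slower for non co-tuned.
co_tuned = calcium_transient(1.0, tau_decay=0.5, n_frames=60)
non_co_tuned = calcium_transient(1.0, tau_decay=1.5, n_frames=60)
```

      With these assumed values both traces start at the same peak, and the co-tuned trace falls below the non co-tuned trace at every later frame.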

      Although the rebuttal clearly states that experiments are carried out on awake animals, this information is still missing from the manuscript. 

      We have now stated ‘Fully awake animals’ in the experimental procedures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review)

      The weaknesses are in the clarity and resolution of the data that forms the basis of the model. In addition to whole embryo morphology that is used as evidence for convergent extension (CE) defects, two forms of data are presented, co-expression and IP, as well as a strong reliance on IF of exogenously expressed proteins. Thus, it is critical that both forms of evidence be very strong and clear, and this is where there are deficiencies; 1) For vast majority of experiments general morphology and LWR was used as evidence of effects on convergent extension movements rather than Keller explants or actual cell movements in the embryo. 2) The study would benefit from high or super resolution microscopy, since in many cases the differences in protein localization are not very pronounced. 3) The IP and Western analysis data often show subtle differences, and not apparent in some cases. 4) It is not clear how many biological repeats were performed or how and whether statistical analyses were performed. 

      (1) To more objectively assess the convergent extension phenotypes, we developed a Fiji macro to automatically quantify the LWR in various injected Xenopus embryos, as detailed in the Methods section. We acknowledge that a limitation in the current manuscript is how to link our mechanistic model at the molecular level with the actual cellular behavior during convergent extension, and we plan to perform cell biological studies in the future to elucidate the link;

      (2) We have repeated some of the imaging experiments in DMZ explants using a Zeiss LSM 900 confocal equipped with Airyscan2 detector that can increase the resolution to ~100 nm. The new data are in Suppl. Fig. 4, 9, 11, 16;

      (3) We have repeated all IP and western blots at least three times and provided quantification and statistical analyses;

      (4) We have added the information on biological repeats and statistical analyses in all figures and figure legends.
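
      For readers unfamiliar with the length-to-width ratio (LWR) mentioned in point (1), the following minimal sketch shows one way such a ratio could be computed from a binary embryo mask. It is not the authors' Fiji macro, whose details are given in their Methods; the bounding-box approach and the function name `length_width_ratio` are assumptions made here for illustration only.

```python
def length_width_ratio(mask):
    """Compute a bounding-box length-to-width ratio from a binary mask.

    mask is a list of equal-length rows containing 0/1 values, where 1 marks
    pixels belonging to the object. This axis-aligned bounding box is a
    simplification (an assumption); it ignores object rotation and curvature.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows or not cols:
        raise ValueError("mask contains no foreground pixels")
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    # Report the longer side over the shorter side, so LWR >= 1.
    return max(height, width) / min(height, width)


# Toy mask: an object 6 pixels long and 2 pixels wide.
toy_mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
toy_lwr = length_width_ratio(toy_mask)
```

      An elongated object gives a higher value, while a value close to 1 indicates a rounder object.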

      Reviewer #2 (Public Review):

      The protein localization experiments in animal cap assays are for the most part convincing, but with the caveat that the authors assume that the proteins are acting within the same cell. As Fzd and Vangl2 are thought to localize to opposite cell ends in many contexts, can the authors be sure that the effects they observe are not due to trans interactions? 

      In our previous publication, we provided evidence that Vangl is necessary and sufficient to recruit Dvl to the plasma membrane within the same cell (Figure 3 in 10.1093/hmg/ddx095). In a more recent publication (10.1038/s41467-025-57658-0), we further elucidated a mechanism through which Dvl oligomerization switches its binding from Vangl to Fz, and determined that Dvl binding to Vangl and Fz is differentially mediated by its PDZ and DEP domains, respectively. In the current manuscript, we also performed co-IP experiments under various conditions to demonstrate binding between Dvl and Vangl. We feel that this evidence together provides a strong argument for our model, in which Vangl2 acts within the same cell to sequester Dvl from Fz.

      With regard to the Dvl patches induced by Wnt11 (Fig. 3 and Suppl. Fig. 9), we performed separate injections of EGFP- and mSc-tagged Dvl into adjacent blastomeres, and demonstrated that the Wnt11-induced patches arise from symmetrical accumulation of Dvl at the contact between two neighboring cells (Suppl. Fig. 9a-c’). This scenario is different from epithelial PCP, where Fz/Dvl and Vangl/Pk are asymmetrically accumulated at the contact between two adjacent cells.

      The authors propose a model whereby Vangl2 acts as an adaptor between Dvl and Ror, to first prevent ectopic activation of signaling, and then to relay Dvl to Fzd upon Wnt stimulation. This is based on the observation that Ror2 can be co-IPed with Vangl2 but not Dvl; and secondly that the distribution of Ror2 in membrane patches after Wnt11 stimulation is broader than that of Fzd7/Dvl, while Vangl2 localizes to the edges of these patches. The data for both these points is not wholly convincing. The co-IP of Ror2 and Vangl2 is very weak, and the input of Dvl into the same experiment is very low, so any direct interaction could have been missed. Secondly, the broader distribution of Ror2 in membrane patches is very subtle, and further analysis would be needed to firm up this conclusion. 

      (1) We repeated the co-IP experiment with Myc-tagged Vangl or Dvl. Using the same anti-Myc antibody and experimental conditions (including the expression levels of Vangl, Dvl and Ror2), we still found that Ror2 could be pulled down by Vangl but not Dvl (Suppl. Fig. 15b). Although these data confirm our previous conclusion, we acknowledge that a negative result does not fully exclude the possibility of direct binding between Ror and Dvl.

      (2) We re-analyzed the signal intensity of Dvl and Ror in Wnt11-induced patches. By quantifying the intensity ratio between Ror and Dvl along the patches, we found a more than twofold increase at the border of the patches (Fig. 7j, bottom panel). We interpret these data to suggest that Ror accumulates to a higher level than Dvl at the patch borders.
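
      The ratio analysis in point (2) can be illustrated with a small sketch. This is a hypothetical reconstruction, not the authors' analysis pipeline: the function names, the fixed border width, and the toy intensity values are assumptions, and real profiles would come from line scans across imaged patches.

```python
def ratio_profile(ror, dvl, eps=1e-9):
    """Pointwise Ror/Dvl intensity ratio along a patch line scan."""
    if len(ror) != len(dvl):
        raise ValueError("profiles must have the same length")
    # eps guards against division by zero where the Dvl signal vanishes.
    return [r / max(d, eps) for r, d in zip(ror, dvl)]


def border_fold_change(ratio, border_width):
    """Mean ratio in the two border regions divided by the mean in the centre.

    border_width is the number of samples treated as 'border' at each end
    (an assumed, user-chosen parameter).
    """
    if border_width <= 0 or 2 * border_width >= len(ratio):
        raise ValueError("border_width leaves no centre region")
    border = ratio[:border_width] + ratio[-border_width:]
    centre = ratio[border_width:-border_width]
    return (sum(border) / len(border)) / (sum(centre) / len(centre))


# Toy profiles (made-up numbers): Ror is flat, Dvl peaks in the patch centre.
ror_toy = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
dvl_toy = [1.0, 1.0, 2.0, 2.0, 1.0, 1.0]
toy_ratio = ratio_profile(ror_toy, dvl_toy)
toy_fold = border_fold_change(toy_ratio, border_width=2)
```

      In this toy case the Ror/Dvl ratio at the border is twice that in the centre even though the Ror signal itself is flat, which is why a ratio can reveal relative enrichment that is hard to see in the raw curves.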

      A final caveat to these experiments is that in the animal cap assays, loss of function and gain of function both cause convergence and extension defects, so any genetic interactions need to be treated with caution i.e. two injected factors enhancing a phenotype does not imply they act in the same direction in a pathway, in particular as there are both cis/trans and positive/negative feedbacks between the PCP proteins. 

      We agree with the reviewer that a difficulty in studying PCP/non-canonical signaling is that both loss and gain of function of any of its components can cause convergence and extension defects. Genetic interactions, especially synergistic interactions, should be interpreted with caution. But we do want to point out that, in a number of cases, we were also able to demonstrate epistasis. For instance, we found that Dvl2 over-expression induced CE defects can be rescued by Pk over-expression (Fig. 1e and f), whereas Vangl/Pk co-injection induced severe CE defects can be reciprocally rescued by Dvl2 over-expression (Fig. 1g). Likewise, we showed that Fz2/Dvl2 co-injection induced CE defects can be rescued by wild-type Vangl2 but not the Vangl2 RH mutant (Suppl. Fig. 6b), and Ror2 can rescue the Vangl2 overexpression induced CE defect (Suppl. Fig. 14). Collectively, these functional interaction data consistently demonstrate an antagonism between Dvl/Fz/Ror2 and Vangl2/Pk, which is consistent with our imaging and biochemical studies.

      As you can see from the reviews, the referees generally agree that your paper is a potentially valuable contribution to the field. Your observations are important because of the novel model based on the inhibitory feedback regulation between planar cell polarity (PCP) protein complexes. However, the reviewers also stated that the model is only partly supported by data because of insufficient clarity and missing controls in several experiments supporting the proposed model. The paper would be significantly improved if your conclusions are backed up by additional experimentation. Specifically, the referees wanted to see the reproducibility of the results shown in Figures 3, 4, 8, S3, S7, S12. 

      We hope that you are able to revise the paper along the lines suggested by the referees to increase the impact of your study on the current understanding of PCP signaling mechanisms. 

      We thank the reviewers for their careful reading of our manuscript and for their constructive critiques and suggestions. We have repeated the animal cap studies in original Figures 3, 4, 8 and S3 with DMZ explants, and the new data are in Supplementary Fig. 9, 11, 16 and 4, respectively. We also repeated the biochemical studies in original Figures S7 and S12, and the new data are in Supplementary Fig. 8 and 15.

      Reviewer #1 (Recommendations For The Authors):

      Major points: (1) The author conducted an analysis of the subcellular localization of PCP core proteins, including Vangl2, Pk, Fz, and Dvl, within animal cap explants (ectodermal explants). To validate the model proposing that 'non-canonical Wnt induces Dvl to transition from Vangl to Fz, while PK inhibits this transition, and they function synergistically with Vangl to suppress Dvl during Convergent Extension (CE),' it is crucial to assess the subcellular localization of PCP core proteins in dorsal marginal zone (DMZ) cells, which are known to undergo CE. Notably, the overexpression of Wnt11 alone, as employed by the author, does not induce animal cap elongation. Therefore, the use of animal cap explants may not be sufficient to substantiate the model during Convergent Extension (CE). Indeed, previous knowledge indicates that Vangl2 and Pk localize to the anterior region in DMZ explants. However, the results presented in this manuscript appear to differ from this established understanding. Consequently, to provide more robust support for the proposed model, it is advisable to replicate the key experiments (Figures 3, 4, 8, and Figure S3) using DMZ explants. 

      We repeated the experiments in Figures 3, 4, 8 and Figure S3 with DMZ explants, and the new data are in new Supplementary Fig. 9, 11, 16 and 4, respectively. With regard to “previous knowledge indicates that Vangl2 and Pk localize to the anterior region in DMZ explants”, we are aware of Vangl/Pk localization to the anterior cell cortex in neural epithelium from the studies by the Sokol and Wallingford labs, but are not aware of similar reports in DMZ explants. When we examined the localization of a small amount of injected EGFP-mPk2 (0.1 ng mRNA) in DMZ explants, we saw a somewhat uniform distribution on the plasma membrane (Suppl. Fig. 4). In addition, in a related recent publication, we examined endogenous XVangl2 protein localization in activin-induced animal cap explants that do undergo CE. What we observed was that whereas low-level injected Dvl2 and Fz form clusters on the plasma membrane, endogenous XVangl2 remains uniformly distributed on the plasma membrane (Suppl. Fig. 3S-Z in 10.1038/s41467-025-57658-0). These observations may suggest potential differences in PCP protein localization during neural vs. mesodermal convergence and extension.

      (2) The author suggests that 'Vangl2 and Pk together synergistically disrupt Fz7-Dvl2 patches.' As shown in Figure 4 (panels J' to I'), it is evident that the co-expression of Pk and Vangl2 increases Fz7 endocytosis. Nevertheless, a significant amount of Fz7 still co-localizes with Dvl2. To strengthen the author's hypothesis, additional clear assay is required such as Fluorescence resonance energy transfer (FRET) assay. 

      We appreciate this valuable advice. Since none of the tagged Fz/Dvl/Vangl proteins we had were suitable for FRET, we made proteins tagged with mClover and mRuby2, which were reported as an optimized FRET pair. But in our hands mRuby2 seems to require a very long time (~2 days) to mature and become detectable at room temperature, and is not suitable for our Xenopus experiments. We are in the process of establishing a luciferase-based NanoBiT system to detect Fz-Dvl and Dvl-Vangl interactions in live cells and cell lysates, and will use it in future studies to investigate their interaction dynamics.

      For the current manuscript, we reason that a substantial reduction of Fz7-Dvl2 clusters with Vangl2/ Pk co-injection would still support our idea that Vangl2 and Pk act synergistically to sequester Dvl from Fz to prevent their clustering in response to non-canonical Wnt ligands.

      (3) The IP data is less clear and evident. A couple of examples are: a) Fig 2g where the authors report that the Vangl2 R177H variant reduced Vangl2 interaction with Pk and recruitment of Pk to the plasma membrane, but it appears that the variant interacts slightly better than WT Vangl2 with Pk. In Fig. S7a, the authors state that Pk overexpression can indeed significantly reduce Wnt11-induced dissociation of EGFP-Vangl2 and Flag-Dvl2 in the DMZ. However, there is a minimal impact when compared to the Wnt11 absent control. Based on the results presented in Fig S12a the authors indicate that Wnt11 reduces the association between Vangl2 and Dvl2, which can be discerned, but loss of Ror2 does not change this in any obvious way - but the authors indicate it does. In S12b, the authors have suggested that Ror and Dvl do not form a direct binding interaction. However, the interpretation of Figure S12b is not entirely convincing due to several issues. Notably, the expression levels of each protein appear inconsistent, the bands are not sufficiently clear, and there is the detection of three different tag proteins on a single blot. To strengthen the validity of these findings, it is advisable to repeat this experiment with improved quality. 

      We repeated all the co-IP and western blot analyses pointed out by the reviewer, and performed quantification and statistical analyses.

      Fig 2g had a mistake in the labeling and is replaced with new Figure 2g;

      Fig. S7a is replaced by new data in Supplementary Figure 8a and b;

      Fig. S12a and 12b are replaced by new data in Supplementary Figure 15a, a’ and b, respectively. In 15a and a’, we noticed a consistent decrease of Dvl2-Vangl2 co-IP in Xror2 morphant. The reason for this is not yet clear and will need further study in the future.

      Minor points: (1) In all the whole embryo injection assays examining morphology, no Western analysis is performed to show roughly equivalent and appropriate levels of the various proteins are being expressed. Differences will affect the data. 

      Although we did not do western analyses to examine the protein levels in the various functional interaction assays, we did examine how co-expression of Vangl2, mPk2 or Dvl2 may impact each other’s protein levels in Supplementary Fig. 2, which did not reveal any significant change when they were co-injected in different combinations.

      (2) The author's prior publication (Bimodal regulation of Dishevelled function by Vangl2 during morphogenesis, Hum Mol Genet. 2017) presented clear evidence of Vangl2 overexpression inducing Dvl2 membrane localization. However, Figure S4 in the current manuscript did not provide clear evidence of membrane localization. To strengthen the hypothesis that Vangl2-RH mutant also induces Dvl2 membrane localization, further comprehensive imaging analysis is needed. 

      We re-analyzed the imaging data and replaced old Figure S4 with a new Supplementary Fig. 5.

      (3) In Supplementary Figure 9, the authors propose that the overexpression of Vangl2/Pk induces Fz7 endocytosis, as indicated by its co-localization with FM4-64. However, it raises a question: how does the Fz7-GFP protein internalize into the cells without endocytosis, as seen in Figures S9a-c'? To enhance readers' understanding, a discussion addressing this point should be included. 

      We think that this might be a technical issue. As detailed in the Methods section, we only incubated the embryos transiently with FM4-64 for 30 minutes, and the embryos were subsequently washed and dissected in 0.1X MMR without the dye. Therefore, only the Fz7-GFP protein endocytosed during the 30-minute incubation would be labeled by FM4-64, whereas that endocytosed before or after the incubation would not. Alternatively, the very few Fz7-GFP puncta occasionally observed in the absence of Vangl2/Pk overexpression could be vesicles trafficking to the plasma membrane.

      (4) Statistical analyses are absent for several results, including those in Figure 2f, Figure S4d, and Figure S7b. 

      We repeated these experiments and included statistical analyses. The new data are in Figure 2f, Supplementary Fig. 5d and Supplementary Fig. 8b.

      (5) This manuscript lacks any results regarding Ck1. Therefore, it is advisable to consider removing the discussion or mention of CK1. 

      We agree, and have toned down the discussion of CK1 and removed CK1 from our model in Fig. 9.

      Reviewer #2 (Recommendations For The Authors):

      (1) In all the convergence and extension assays, the authors should report n numbers (i.e. number of animals), what statistical test is used, and what the error bars show. Ideally dot-plots would be used instead of bar charts as they give a better insight into the data distribution. It might be useful to give a section on the statistical analyses used in the M&M, including e.g. any power calculations carried out, as now required by many journals. 

      We have followed the advice to use dot-plots for all the quantification analyses in the manuscript. We include in the figure legends the statistical test used and what the error bars show. The number of embryos analyzed is included in each panel in the figures. We also provided more details in the Methods section on how the LWR quantification was carried out.

      (2) I think Figure 2g is wrongly labelled? FLAG bands are in all three lanes in the western blot, but not labelled as such in the schematic. 

      We corrected the schematic labeling in Figure 2g, and thank the reviewer for catching this mistake.

      (3) In Figure S7, the authors show that co-IP of Dvl and Vangl2 is reduced by Wnt11 and the effects of Wnt are blocked by Pk. Does Pk have any effect in the absence of Wnt? 

      We examined the effect of Pk over-expression on Dvl2-Vangl2 co-IP as advised, and did not see a significant impact in the absence of Wnt11 co-injection. The data is included in the new Supplementary Figure 8a. We interpret the data to suggest that “at least under the condition of our co-IP experiment, Pk may not directly impact the steady-state binding between Vangl and Dvl”.

      (4) In Figure 3, the authors show (as published previously) that Wnt11 induces patches of Dvl at the plasma membrane. It would be useful to see Dvl in the absence of Wnt and Vangl2/Dvl in the absence of Wnt. 

      Dvl is widely known as a cytoplasmic protein and its localization has been published by many labs over the past 20-30 years. In our recent publication (10.1038/s41467-025-57658-0 ), we also re-examined Dvl localization when injected at various dosages. So we did not feel it was necessary to show its localization in the absence of Wnt11 again, but included a reference to our prior publication. In regards to Vangl/Dvl distribution in the absence of Wnt11, the readers can see Suppl. Fig. 5b as an example, in addition to our previous publications referenced in the manuscript.

      (5) In the review figures, the difference in Fz7-GFP patch formation in d' and e' (vs e.g. a') is not very clear. Could the images be improved or (better) quantified in some way? 

      We assume that “review figures” refer to Figure 3 or 4? If so, we felt that Fz7-GFP patch formation was clear in Fig. 3d’, e’ or Fig. 4d’, e’. Nevertheless, we repeated these experiments in DMZ explants as advised by Reviewer 1, and additional examples of Fz7-EGFP patch formation can be seen in the new Suppl. Fig. 9d-f’ and Suppl. Fig. 11d-f’.

      (6) In Figure 6d, I'm concerned that the loss of flag-Dvl2 might occur via dephosphorylation in the IP reaction. Also the M&M don't include methodological details about buffers and whether phosphatase inhibitors were used. A compelling control would be anti-FLAG pulldown showing retention of phosphorylation. Also Figure 6f shows a reduced ratio of fast-to-slow migrating bands of Dvl with Vangl2/Pk - unless I have misunderstood, is this ratio the wrong way round? 

      We added co-IP buffer and protease inhibitor information in Methods.

      We agree that the concern about dephosphorylation during the IP reaction is valid, and that direct pull-down of Dvl to show the phosphorylated form is a compelling control. We therefore note that in Suppl. Fig. 8a and 15b, direct pull-down of Flag-Dvl or Myc-Dvl (with anti-Flag or anti-Myc) did show the slower migrating, phosphorylated form. Additional examples in which Vangl only co-IPs the faster migrating, unphosphorylated Dvl include Suppl. Fig. 15a and a related paper we published recently (Fig. 3R and R’ in 10.1038/s41467-025-57658-0).

      Finally, we did wrongly label Figure 6f in the last submission, and the ratio should have been “slow/fast”. We have made the correction, and thank the reviewer for their meticulous reading of our manuscript.

      (7) In Figure 7, what does Ror2 look like in the absence of Wnt11? 

      We included new Figure 7a-c to show that without Wnt11 co-injection, Ror2 is uniformly distributed on the plasma membrane.

      (8) Also in Figure 7, Ror2 patches are said to be slightly wider than Dvl2 patches "reminiscent of Vangl2" - I wouldn't describe them as being similar. Vangl2 shows a distinct dip in the center of the Dvl patches, Ror2 does not show a dip, and is only (at best) in a slightly wider patch, and I would want to see further examples to be convinced that the localization domain is reproducibly wider. The merge of many samples in 7d may actually be making the distribution harder to see and if the Xror2 and Dvl2 intensities were normalized I'm not sure how different the curves would appear. (i.e. the Xror2 curve looks like a flattened version of the Dvl2 curve). 

      We have added a panel in the new Figure 7j that compares the Ror2/Dvl2 intensity ratio along the patches; this analysis reveals a more than two-fold increase in the ratio at the border region. This quantification may make a more convincing argument that, at the patch border region, Dvl is diminished whereas Ror2 accumulates with Vangl2.

      (9) In Figure S12a, the authors suggest Wnt11 induced dissociation of Dvl from Vangl2 (by co-IP), and this is reduced after Ror2 MO. This would be more convincing with replicates and quantitation. 

      We have repeated this experiment with Vangl2 pull down and added quantification. The data is in the new Suppl. Fig. 15a.

      (10) In Figure S12b, the authors suggest Ror2 can co-IP Vangl2 but not Dvl. This is not very convincing, as the Dvl input band is very weak, and the Vangl2 co-IP band is very weak. 

      We repeated the co-IP experiment with Myc-tagged Vangl or Dvl. Using the same anti-Myc antibody and experimental condition (including the expression level of Vangl, Dvl and Ror2), we still found that Ror2 could be pulled down by Vangl but not Dvl (Suppl. Fig. 15b).

      (11) "Prickle" spelled "Prickel" in the abstract (and abbreviated to "PK" not "Pk" at one place in the abstract and several places in text) 

      We have corrected these typos.

      (12) Quite a lot of interesting observations are in supplemental figures. Normally it might be expected that extra data supporting a conclusion would be in supplemental, but here some of the supplemental data feels like it is more than simply additional evidence. For instance supplemental Figures 2 and 3 feel more than just supplemental (and Supplemental Figure 3 if merged with Figure 2 would make it easier for the reader). Moreover, for example, the description of the results in Figure 2 is punctuated by references to supplemental Figures 4 and 5 that contain key data to support the conclusions, which means the reader has to flick backwards and forwards from place to place in the manuscript to follow the argument. It is of course up to the authors, but in some cases putting supplemental data back into the main figures (for which there is no size or number limit) would increase clarity. 

      These are excellent points; in the resubmitted manuscript we have a total of 24 data figures, and we used 8 as main figures since we felt that they provide the most relevant and conclusive evidence to our model. We will consult the copy editors at eLife on how to arrange the rest as main vs. supporting figures when requesting publication as version of record.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Summary:

      This study investigates plasticity effects in brain function and structure from training in navigation and verbal memory.

      The authors used a longitudinal design with a total of 75 participants across two sites. Participants were randomised to one of three conditions: verbal memory training, navigation training, or a video control condition. The results show behavioural effects in relevant tasks following the training interventions. The central claim of the paper is that network-based measures of task-based activation are affected by the training interventions, but structural brain metrics (T2w-derived volume and diffusion-weighted imaging microstructure) are not impacted by any of the training protocols tested.

      Strengths:

      (1) This is a well-designed study which uses two training conditions, an active control, and randomisation, as appropriate. It is also notable that the authors combined data acquisition across two sites to reach the needed sample size and accounted for it in their statistical analyses quite thoroughly. In addition, I commend the authors on using pre-registration of the analysis to enhance the reproducibility of their work.

      (2) Some analyses in the paper are exhaustive and compelling in showcasing the presence of longitudinal behavioural effects, functional activation changes, and lack of hippocampal volume changes. The breadth of analysis on hippocampal volume (including hippocampal subfields) is convincing in supporting the claim regarding a lack of volumetric effect in the hippocampus.

      Weaknesses:

      (1) The rationale for the study and its relationship with previous literature is not fully clear from the paper. In particular, there is a very large literature that has already explored the longitudinal effects of different types of training on functional and structural neuroimaging. However, this literature is barely acknowledged in the Introduction, which focuses on cross-sectional studies. Studies like the one by Draganski et al. 2004 are cited but not discussed, and are clumped together with cross-sectional studies, which is confusing. As a reader, it is difficult to understand whether the study was meant to be confirmatory based on previous literature, or whether it fills a specific gap in the literature on longitudinal neuroimaging effects of training interventions.

      We thank the reviewer for these comments and feedback. 

      We want to clarify that, because we followed a pre-registered analysis plan, our approach was confirmatory rather than exploratory or justified post hoc. This confirmatory approach allowed us to critically evaluate theoretically novel and important hypotheses that no previous longitudinal intervention study had proposed or tested. We have now clarified this in the introduction.

      This allowed us to address the following novel theoretical questions: 1) what neural changes, if any, result from an intensive within-participant intervention that improves memory or navigation skills in healthy young adults 2) if such changes occur, what is the degree of neural overlap between the acquisition of these cognitive skills.”

      “We pre-registered three novel and specific hypotheses, which are described in more detail here (https://osf.io/etxvj) ”

      We have also attempted to better separate cross-sectional and longitudinal studies. Due to space limitations, we have focused on interventional studies that involved gray matter changes that could be relevant to navigation, episodic memory, or the hypothesized time frame we chose for the training. We also note that some of these relevant studies are discussed in more depth in the Discussion.

      “Successful cognitive interventions suggest that targeted within-participant cognitive training, even for as little as 1-2 weeks, can result in improvements to specific cognitive functions, including changes in focal gray matter [4,23-27]; but see[28].”

      We have also added citations to relevant cognitive intervention work, although we agree that this is an extensive literature, only a subset of which we are able to capture here:

      “In some instances, interventions may even generalize to areas not explicitly trained but closely related to the training (termed “near transfer”)[29-33].”

      (2.1) The main claim regarding the lack of changes in brain structure seems only partially supported by the analyses provided. The limited whole-brain evidence from structural neuroimaging makes it difficult to confirm whether there is indeed no effect of training. Beyond hippocampal analyses, many whole-brain analyses of both volumetric and diffusion-weighted imaging metrics are only based on coarse ROIs (for example, 34 cortical parcellations for grey matter analyses).

      Although vertex-wise analyses in FreeSurfer are reported, it is unclear what metrics were examined (cortical thickness? area? volume?). 

      We appreciate the reviewer’s thoughtful feedback. We apologize for the lack of clarity in the original manuscript regarding the type of metric used in the vertex-wise analysis. We confirm that these analyses were based on cortical volume, not thickness or area. To clarify this, we have explicitly stated in the revised Methods that the vertex-wise analyses were conducted on cortical volume using FreeSurfer’s mri_glmfit.

      In addition, in response to the concern regarding the coarse nature of the ROI-based analyses, we have re-analyzed the volumetric data using the more fine-grained Destrieux atlas, which contains 148 cortical ROIs (74 per hemisphere), instead of the original, coarser 34-region atlas. These more detailed analyses still revealed no significant volume changes from pre- to post-training in any of the three groups. We believe this provides stronger support for the lack of training-induced volumetric changes outside the medial temporal lobe.

      Relevant revisions have been made to the Results and Methods sections. Below is the updated content added to the manuscript:

      In Results:

      “We also analyzed gray matter volume changes outside of the medial temporal lobe using FreeSurfer (see Methods) to determine if any cortical or other relevant brain areas might have been affected by the training. We applied a vertex-wise analysis of cortical volume, again finding no significant differences across the entire cortex (see Methods). This finding was further validated using the Destrieux atlas, which includes 74 cortical parcellations per hemisphere (148 ROIs in total). Paired-sample t-tests revealed that none of the ROIs exhibited significant volume changes from pre- to post-test in any of the three groups (all ps > 0.542, FDR-corrected). These findings suggest that training did not result in any measurable cortical volumetric changes.”

      In Methods:

      “Whole-brain structural analyses were conducted using FreeSurfer (version 7.4.1; https://surfer.nmr.mgh.harvard.edu). T1-weighted anatomical images were processed using the longitudinal processing pipeline. Vertex-wise analyses of cortical volume were performed using FreeSurfer’s general linear modeling tool, mri_glmfit. Group-level comparisons were corrected for multiple comparisons using mri_glmfit-sim, which implements cluster-wise correction based on Monte Carlo simulations. A vertex-wise threshold of Z > 3.0 (corresponding to p < 0.001, two-sided) was applied to detect both positive and negative effects. Clusters were retained if they survived a cluster-wise corrected p < 0.05.

      In addition to vertex-wise analysis, cortical parcellation was performed using the Destrieux atlas (aparc.a2009s), which includes 74 cortical regions per hemisphere, yielding 148 ROIs in total. To account for variability in brain size, each ROI volume was normalized by estimated intracranial volume (ICV) and scaled by a factor of 100. Longitudinal comparisons were conducted using paired-sample t-tests. To correct for multiple comparisons, we applied FDR correction (q < 0.05).”
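      The ROI-level steps quoted above (ICV normalization scaled by 100, then FDR correction across 148 ROIs) can be sketched as follows. This is a minimal illustration, not the authors' code: the quoted Methods do not say which FDR procedure was used, so the Benjamini-Hochberg procedure is assumed here, and the function names are hypothetical.

```python
def normalize_by_icv(roi_volume, icv, scale=100.0):
    """Express an ROI volume relative to estimated intracranial volume (ICV).

    Both volumes must be in the same unit; the default scale of 100
    mirrors the scaling factor mentioned in the quoted Methods.
    """
    return roi_volume / icv * scale


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is one common FDR procedure; the source text does not
    name the procedure that was actually applied.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_rois(p_values, q=0.05):
    """Indices of ROIs whose adjusted p-value falls below q."""
    adjusted = benjamini_hochberg(p_values)
    return [i for i, p_adj in enumerate(adjusted) if p_adj < q]
```

      In use, one raw p-value per ROI (for example from a paired-sample t-test on the normalized pre- and post-training volumes) would be passed to `significant_rois`; an empty list corresponds to the reported outcome of no ROI surviving correction.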

      (2.2) Diffusion-weighted imaging seems to focus on whole-tract atlas ROIs, which can be less accurate/sensitive than tractography-defined ROIs or voxel-wise approaches.

      We appreciate the reviewer’s important point regarding diffusion-weighted imaging (DWI) analysis. We focused primarily on atlas-defined tract-level ROIs derived from a standard white matter tract atlas as we did not feel that we had the resolution for more fine-grained analyses with our sequences. While this approach has the advantage of robust anatomical correspondence and improved interpretability, we agree that it may be less sensitive than tractography-defined or voxel-wise methods for detecting more subtle, localized training-related changes. Because of limitations in our DWI sequence, which was optimized to be shorter and identical between different scanners, we are not able to provide more fine-grained analysis of the DWI data.

      (3) Quality control of images is only mentioned for FA images in subject space. Given that most analyses are based on atlas ROIs, visual checks following registration are fundamental and should be described in further detail.

      Thank you for your thoughtful comment. We agree that visual quality control is critical when using atlas-based ROI analyses. In our study, we implemented comprehensive quality control procedures across all structural and functional imaging analyses.

      For hippocampal segmentation using ASHS, we performed manual visual inspections of each participant's subfield segmentation to verify the accuracy of the automated outputs. This is now clearly described in the revised Methods section:

      “Each participant's subfield segmentations were manually inspected to ensure the accuracy and reliability of the segmentation protocol.”

      For FreeSurfer-based hippocampal and cortical segmentation, we also conducted detailed visual inspections and manual edits following the standard FreeSurfer longitudinal pipeline. We have added the following description to the Methods section to clarify this process:

      “Visual quality control was conducted by three trained raters who systematically inspected skull stripping, surface reconstruction, and segmentation accuracy at both the within-subject template and individual timepoints. Manual edits were primarily applied to the within-subject template to correct segmentation errors—particularly in challenging regions such as the hippocampus—since corrections to the template automatically propagate to all timepoints. Raters followed standardized FreeSurfer longitudinal editing guidelines to ensure consistent and reproducible corrections across subjects. Discrepancies were resolved via consensus discussion. This quality control approach enhanced the accuracy and consistency of segmentation across longitudinal scans, thereby improving the reliability of morphometric analyses and atlas-based ROI extractions.”

      For functional MRI preprocessing, all registration steps—including transformations from individual functional runs to MNI space—were visually checked for each participant to ensure accurate alignment with the Schaefer atlas. We have clarified this point in the revised Methods section with the following statement:

      “Prior to ROI extraction, all registration steps—from individual functional space to MNI space—were visually inspected for each participant to confirm accurate alignment between the functional images and the atlas parcellation.”

      These additions now more clearly reflect the robust quality control procedures that were employed throughout our pipeline to ensure the validity of atlas-based analyses.

      Recommendations for the authors:

      (1) As a reader, I would have appreciated a short section in the methods regarding the preregistration and power analysis. Currently, it is not too straightforward to understand which analyses were included in the preregistration, and at what point in the project the pre-registration was written. Finding all the relevant information from OSF is feasible, but it would be more accessible if a summary of the information were available inside the text.

      We thank the reviewer for this valuable suggestion. We agree that providing a concise summary within the manuscript's methods section will significantly improve accessibility for readers. 

      The full preregistration is now explicitly referenced in the Methods:

      “Preregistration and Power Analysis

      This study was preregistered on the Open Science Framework (OSF; https://osf.io/etxvj). The preregistration was completed on October 30, 2023, after approximately 80% of data collection had been completed, but prior to any analysis of the primary outcome variables. The preregistration outlines the study hypotheses, design, target sample size, and planned behavioral and neuroimaging analyses, including longitudinal ROI comparisons and statistical correction procedures.

      A priori power analysis was conducted using G*Power 3.1 to estimate the required sample size for detecting a Group × Time interaction in a mixed-design ANOVA. Assuming a small-to-medium effect size (f = 0.35), we determined that 24 participants per group would provide 80% power to detect a significant effect at α = 0.05. To allow for potential attrition and data exclusion (e.g., due to excessive motion or incomplete datasets), we targeted recruitment of 30 participants per group across two study sites.

      All primary hypotheses, analytic plans, and inference criteria are documented in the preregistration. Exploratory analyses are clearly delineated in both the preregistration and the present manuscript.”

      (2) The relevance of the study for "disease" is mentioned in the Abstract but is absent in the Introduction. This may be worth removing?

      Thank you for pointing this out. We agree that the reference to "disease" in the Abstract was not well-supported in the Introduction. To maintain consistency and avoid overstatement, we have removed the mention of "disease" from the Abstract in the revised manuscript.

      In Abstract:

      “Training cognitive skills, such as remembering a list of words or navigating a new city, has important implications for everyday life.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1(Public Review):

      The correlation between rebound excitation and song structure (e.g., harmonic stack duration) may depend on outliers, such as birds with harmonic stacks >150ms.

      If the distribution of the durations of the longest harmonic stacks has a long tail, whether in wild zebra finches or in domesticated zebra finches (including our birds and the birds from other labs that we evaluated), it is not apparent that birds with long-duration harmonic stacks are properly considered outliers. Examining the distribution of motif durations (a less derived statistic) in 33 birds (Fig. 2C) does not support the idea that birds with longer-duration songs are outliers. We therefore read the reviewer’s question as asking whether different mechanisms operate in birds with long harmonic stacks than in other birds. Unfortunately, the number of birds with long-duration harmonic stacks is too small to give confidence in any statistical analysis of that group. We therefore limited our re-analysis to the data excluding birds with harmonic stacks >150 ms (an arbitrary cutoff), to examine how these birds influence our conclusions. We conclude that the influence of the excluded birds on the overall result is modest. The updated results are presented in Supplemental Figure 6, and the Results section has been revised to state:

      “We found that while some of the p values increased above 0.05 (p = 0.058 for rebound area vs. longest harmonic stack and p = 0.082 for sag ratio and longest harmonic stack), it remained significant for firing frequency and longest stack (Pearson’s R, p = 0.0017) and for sag ratio and motif duration (p = 0.024). However, when sag ratio was compared against the duration of the motif excluding the longest harmonic stack, there was no relationship (p = 0.85).”
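      The kind of sensitivity check described above (recomputing a correlation after dropping birds whose longest harmonic stack exceeds 150 ms) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and any numbers passed in would need to be the real per-bird measurements, which are not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def r_with_and_without_long_stacks(stack_ms, physiology, cutoff_ms=150.0):
    """Return (r on all birds, r on birds with stack duration <= cutoff).

    stack_ms:   longest harmonic stack duration per bird, in milliseconds
    physiology: one per-bird physiological measure (e.g. mean sag ratio)
    cutoff_ms:  exclusion threshold; 150 ms follows the response text,
                which itself describes the cutoff as arbitrary
    """
    kept = [(s, p) for s, p in zip(stack_ms, physiology) if s <= cutoff_ms]
    kept_stack = [s for s, _ in kept]
    kept_phys = [p for _, p in kept]
    return pearson_r(stack_ms, physiology), pearson_r(kept_stack, kept_phys)
```

      Comparing the two coefficients shows how much of an association rests on the long-stack birds; a significance test on each would still be needed to reproduce the p-values quoted above.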

      There is a disconnect between the physiological measurements and the HH model presented.

      We acknowledge that addressing this limitation would involve additional experimental and modeling assumptions. Rather than overextending our interpretations, we have clarified the limitations of the current study in the Discussion:

      “While this HH model provides a plausible framework for linking intrinsic properties to sequence propagation, it does not fully account for the observed relationship between IPs and song structure. A principal limitation constraining the current model is the absence of information for the same neurons combining characterization of both IPs and network activity during singing (or song playback), when HVC<sub>X</sub> express activity related to song features. Addressing this gap would require additional and challenging experiments and is beyond the scope of this study.”

      Although disynaptic inhibition between HVC<sub>X</sub> neurons and between HVC<sub>RA</sub> and HVC<sub>X</sub> neurons is well established, I am not aware of any data indicating direct synaptic connections between HVC<sub>X</sub> neurons.

      This is an important theoretical point about the reliance of the interval-detecting network model on HVC<sub>X</sub> neurons and about how the model would change if many of the HVC<sub>X</sub> neurons were swapped for HVC<sub>RA</sub> neurons. Connections from HVC<sub>RA</sub> neurons to HVC<sub>X</sub> neurons are established, whereas there is a relative paucity of evidence for HVC<sub>X</sub> to HVC<sub>X</sub> connectivity. This is based on work from Prather and Mooney, 2005 (among others), which used paired sharp electrode recordings to characterize connections in HVC. This work found very few HVC<sub>X</sub> - HVC<sub>X</sub> connections. However, if connected HVC<sub>X</sub> neurons are physically more distant from each other than are connected HVC<sub>RA</sub> – HVC<sub>X</sub> neurons, they would more likely be missed in blind paired recordings. Using different approaches, recent results from the Roberts lab (Trusel et al., eLife, 2025) support the existence of robust HVC<sub>X</sub> - HVC<sub>X</sub> connections.

      Reviewer #2(Public Review):

      The interpretation of p-values is rigid, and near-significant results (e.g., p = 0.06) are dismissed without discussion.

      We revised the text to reflect a more nuanced and consistent interpretation of p-values and updated the reporting to include exact values. For example, the Results section now states:

      "Nonetheless, the longest syllable duration was not significantly correlated with the average sag ratio for each bird (Pearson’s R: R<sup>2</sup> = 0.12, p = 0.065, Supplemental Fig. 2, top left panel), though it is trending toward significance (see Discussion)”

      The conclusion that harmonic stacks influence intrinsic properties lacks necessary controls.

      We have attempted to further clarify that harmonic stacks were used as a representative feature of temporal song structure rather than a unique determinant of intrinsic properties. The Discussion now states:

      “Although harmonic stacks provide a useful test case for studying temporal integration, our findings suggest that IPs are broadly linked to song duration and structure, rather than specific syllable types. This is also consistent with prior results that found all HVC<sub>X</sub> ion currents that were modeled were influenced by song learning[31].”

      The relationship between rebound area and experimentally tutored birds was not fully explored.

      We expanded the analysis to include rebound area in instrumentally tutored birds, which has now been incorporated into Figure 4C. These additional analyses also robustly support our hypotheses. The Results section has been updated to state:

      “We then evaluated the IPs of HVC<sub>X</sub> in the birds from the two groups. HVC<sub>X</sub> neurons from birds who sang unmodified songs (N = 5 birds, 31 neurons), which had shorter harmonic stacks and shorter overall duration, had lower sag ratios (Mann-Whitney: p = 0.025), firing frequency (Mann-Whitney, p = 0.0051) and rebound area (Mann-Whitney: p = 0.0003)”
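      For readers unfamiliar with the Mann-Whitney test used in the group comparisons above, the underlying U statistic can be sketched as follows. This is a minimal illustration with a hypothetical function name, not the authors' analysis code; it computes only the statistic, and a p-value would come from exact tables or a statistics library such as SciPy's `scipy.stats.mannwhitneyu`.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two independent samples.

    U_a counts the pairs (a, b) in which the value from group_a exceeds
    the value from group_b; tied pairs contribute 0.5. U_a + U_b always
    equals len(group_a) * len(group_b).
    """
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b
```

      A U value near zero for one group means almost every measurement in that group is lower than every measurement in the other, which is the pattern a small p-value reflects.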

      Reviewer #3 (Public Review):

      Limited data supports the claim that intrinsic properties influence temporal integration windows.

      We agree that further data could strengthen this claim. We show that this can happen in principle (Figure 5), but we believe that testing it properly requires further in vivo experiments. We emphasize in the Discussion:

      “Our findings suggest that post-inhibitory rebound excitation in HVC<sub>X</sub> could expand temporal integration. Ultimately, experiments combining in vitro with in vivo recordings can directly quantify this effect. We hope our results motivate such experiments.”

      Technical Corrections

      (1) Fixed typographical errors (e.g., Line 177: corrected "r2 = 4" to "r2 = 0.4").

      (2) Revised figure legends for clarity (e.g., Figure 4E now includes tutoring design details).

      (3) Updated methods to specify how motifs were defined and measured.

      Revised Figures

      Figure 4: Updated to include analysis of rebound area in instrumentally tutored birds, reflecting the relationship between experimental tutoring and intrinsic properties.

      Supplemental Figure 6: Correlation analysis excluding outliers

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This is a manuscript describing outbreaks of Pseudomonas aeruginosa ST 621 in a facility in the US using genomic data. The authors identified and analysed 254 P. aeruginosa ST 621 isolates collected from a facility from 2011 to 2020. The authors described the relatedness of the isolates across different locations, specimen types (sources), and sampling years. Two concurrently emerged subclones were identified from the 254 isolates. The authors predicted that the most recent common ancestor for the isolates can be dated back to approximately 1999 after the opening of the main building of the facility in 1996. Then the authors grouped the 254 isolates into two categories: 1) patient-to-patient; or 2) environment-to-patient using SNP thresholds and known epidemiological links. Finally, the authors described the changes in resistance gene profiles, virulence genes, cell wall biogenesis, and signaling pathway genes of the isolates over the sampling years.

      Strengths:

      The major strength of this study is the utilisation of genomic data to comprehensively describe the characteristics of a long-term Pseudomonas aeruginosa ST 621 outbreak in a facility. This fills the data gap of a clone that could be clinically important but easily missed from microbiology data alone.

      Weaknesses:

      The work would further benefit from a more detailed discussion on the limitations due to the lack of data on patient clinical information, ward movement, and swabs collected from healthcare workers to verify the transmission of Pseudomonas aeruginosa ST 621, including potential healthcare worker to patient transmission, patient-to-patient transmission, patient-to-environment transmission, and environment-to-patient transmission. For instance, the definition given in the manuscript for patient-to-patient transmission could not rule out the possibility of the existence of a shared contaminated environment. Equally, as patients were not routinely swabbed, unobserved carriers of Pseudomonas aeruginosa ST 621 could not be identified and the possibility of misclassifying the environment-to-patient transmissions could not be ruled out. Moreover, reporting of changes in rates of resistance to imipenem and cefepime could be improved by showing the exact p-values (perhaps with three decimal places) rather than dichotomising the value at 0.05. By doing so, readers could interpret the strength of the evidence of changes.

      Impact of the work:

      First, the work adds to the growing evidence implicating sinks as long-term reservoirs for important MDR pathogens, with direct infection control implications. Moreover, the work could potentially motivate investments in generating and integrating genomic data into routine surveillance. The comprehensive descriptions of the Pseudomonas aeruginosa ST 621 clones outbreak is a great example to demonstrate how genomic data can provide additional information about long-term outbreaks that otherwise could not be detected using microbiology data alone. Moreover, identifying the changes in resistance genes and virulence genes over time would not be possible without genomic data. Finally, this work provided additional evidence for the existence of long-term persistence of Pseudomonas aeruginosa ST 621 clones, which likely occur in other similar settings.

      We thank the reviewer for their thorough evaluation of our work, and for the suggested improvements. A main goal of this study was to show that integrating routine WGS in the clinic was a game changer for infection control efforts. We appreciate that this aspect was highlighted as a strength by this reviewer. While some of the weaknesses identified are inherent to the data (or lack thereof) available for this study, we have revised the manuscript to include a detailed discussion of limitations (sampling, thresholds of genetic relatedness, definitions and categories, etc.) that could influence the genomic inferences. We also provided exact p-values for the changes in rates of resistance, as requested. Finally, we have positively answered all the specific recommendations suggested by the reviewer and modified the manuscript accordingly.

      Reviewer #2 (Public Review):

      Summary:

      The authors present a report of a large Pseudomonas aeruginosa hospital outbreak affecting more than 80 patients with first sampling dates in 2011 that stretched over more than 10 years and was only identified through genomic surveillance in 2020. The outbreak strain was assigned to the sequence type 621, an ST that has been associated with carbapenem resistance across the globe. Ongoing transmission coincided with both increasing resistance without acquisition of carbapenemase genes as well as the convergence of mutations towards a host-adapted lifestyle.

      Strengths:

      The convincing genomic analyses indicate spread throughout the hospital since the beginning of the century and provide important benchmark findings for future comparison.

      The sampling was based on all organisms sent to the Multidrug-resistant Organism Repository and Surveillance Network across the U.S. Military Health System.

      Using sequencing data from patient and environmental samples for phylogenetic and transmission analyses as well as determining recurring mutations in outbreak isolates allows for insights into the evolution of potentially harmful pathogens with the ultimate aim of reducing their spread in hospitals.

      Weaknesses:

      The epidemiological information was limited and the sampling methodology was inconsistent, thus complicating the inference of exact transmission routes. Epidemiological data relevant to this analysis include information on the reason for sampling, patient admission and discharge data, and underlying frequency of sampling and sampling results in relation to patient turnover.

      We thank the reviewer for their thoughtful feedback on our manuscript and for highlighting the quality of the genomic analyses. We agree that the lack of patient epi data (e.g. date of admission and discharge) and the inconsistent sampling through the years are limitations of this study. We have revised the manuscript to acknowledge these limitations and discuss how not having this data complicates the inference of exact transmission routes. Finally, we have positively answered all the specific recommendations suggested by the reviewer and modified the manuscript accordingly.

      Reviewer #3 (Public Review):

      Summary:

      This paper by Stribling and colleagues sheds light on a decade-long P. aeruginosa outbreak of the high-risk lineage ST-621 in a US Military hospital. The origins of the outbreak date back to the late 90s and it was mainly caused by two distinct subclones SC1 and SC2. The data of this outbreak showed the emergence of antibiotic resistance to cephalosporin, carbapenems, and colistin over time highlighting the emerging risk of extensively resistant infections due to P. aeruginosa and the need for ongoing surveillance.

      Strengths:

      This study overall is well constructed and clearly written. Since detailed information on floor plans of the building and transfers between facilities was available, the authors were able to show that these two subclones emerged in two separate buildings of the hospital. The authors support their conclusions with prospective environmental sampling in 2021 and 2022 and link the role of persistent environmental contamination to sustaining nosocomial transmission. Information on resistance genes in repeat isolates for the same patients allowed the authors to detect the emergence of resistance within patients. The conclusions have broader implications for infection control at other facilities. In particular, the paper highlights the value of real-time surveillance and environmental sampling in slowing nosocomial transmission of P. aeruginosa.

      Weaknesses:

      My major concern is that the authors used fixed thresholds and definitions to classify the origin of an infection. As such, they were not able to give uncertainty measures around transmission routes nor quantify the relative contribution of persistent environmental contamination vs patient-to-patient transmission. The latter would allow the authors to quantify the impact of certain interventions. In addition, these results represent a specific US military facility and the transmission patterns might be specific to that facility. The study also lacked any data on antibiotic use that could have been used to relate to and discuss the temporal trends of antimicrobial resistance.

      We thank the reviewer for their evaluation of our work and for highlighting the broad implications of our findings regarding the application of real-time surveillance to suppress nosocomial transmission. We agree with the reviewer that fixed thresholds and definitions are imperfect to classify the origin of an infection. The design of this study (e.g. inconsistent sampling through time) was not conducive to providing a comprehensive/quantitative measurement of transmission routes. Thus, we decided to apply conservative thresholds of genetic relatedness and strict conditions (e.g. time between isolate collection, shared hospital location etc.) to favor specificity as our goal was simply to establish that cases of environment-to-patient transmission did happen. In the absence of a truth set, we have not performed sensitivity analysis, but we are conducting a follow-up study to compare inferences from MCMC models to our original fixed-thresholds predictions. This limitation is now discussed in the revised manuscript. Finally, we have positively answered all the specific recommendations suggested by the reviewer and modified the manuscript accordingly including the addition of Figure S3.

      Reviewer #1 (Recommendations For The Authors):

      The definitions used on lines 391-396 are necessarily somewhat arbitrary, but it would be helpful to have a little bit more justification for the choices made, particularly for the definition of environmental involving the "3x the number of years they were separated". It seems a little hard to square this with the more relaxed 10 SNP cutoff for a patient-to-patient designation. Are there reasons for thinking SNP differences associated with environmental transmission should be smaller than for patient-to-patient, or is the aim here just to set the bar higher for assuming an environmental source? Because these definitions are quite arbitrary, there could also be some value in exploring the sensitivity of the results to these assumptions.

      Thank you. We agree with the reviewers that SNP thresholds, albeit necessary, are arbitrary and that more discussion/justification was needed to put the genomic inferences in context. We have revised the manuscript to indicate that: 1/ the 10 SNP cutoff for a patient-to-patient designation was set to account for the known evolution rate of P. aeruginosa (inferred by BEAST at 2.987E-7 subs/site/year in this study and similar to previous estimates PMID: 24039595) and the observed within host variability (now displayed in revised Fig. 1E). We note that this SNP distance was not sufficient and that an epi link (patients on the same ward at the same time) needed to be established. 2/ the environment-to-patient definition was indeed set to be most conservative (nearly identical isolates in two patients from the same ward with no known temporal overlap for > 365 days). This was indeed done to favor high specificity as this inference relied solely on clinical isolates (i.e. the identical environmental strain in the patient-environment-patient chain was not sampled). For these clinical isolates to have acquired no or very few mutations in that much time, no/low replication is expected and, although unsampled, we propose this most likely happened on hospital surfaces.
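
      For illustration only, the fixed-threshold logic described above can be sketched as a small rule-based classifier. This is not the code used in the study. The names `Isolate` and `classify_pair` are hypothetical, the 30-day window standing in for "same ward at the same time" is an assumption, and the rule "SNPs no greater than 3x the years separating the isolates" is one reading of the environmental definition quoted by the reviewer.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Isolate:
    patient: str
    ward: str
    collected: date  # sampling date (admission/discharge dates were not available)


def classify_pair(a: Isolate, b: Isolate, snps: int,
                  p2p_max_snps: int = 10,          # cutoff stated in the response
                  overlap_days: int = 30,          # ASSUMPTION: proxy for "same time"
                  env_min_gap_days: int = 365,     # "> 365 days" stated in the response
                  env_snps_per_year: float = 3.0   # ASSUMPTION: reading of "3x the years"
                  ) -> str:
    """Label a pair of isolates taken from two different patients."""
    if a.patient == b.patient:
        raise ValueError("pair must come from two different patients")
    gap_days = abs((a.collected - b.collected).days)
    same_ward = a.ward == b.ward
    # Patient-to-patient: close genomes AND an epidemiological link.
    if same_ward and gap_days <= overlap_days and snps <= p2p_max_snps:
        return "patient-to-patient"
    # Environment-to-patient: near-identical genomes, same ward, long gap.
    if same_ward and gap_days > env_min_gap_days:
        years_apart = gap_days / 365.0
        if snps <= env_snps_per_year * years_apart:
            return "environment-to-patient"
    # Everything else is deliberately left unresolved.
    return "undetermined"
```

      Under these assumed parameters, most pairs fall into "undetermined", which is the intended behavior of a rule set tuned for specificity rather than sensitivity.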

      While the term "core genome" should be familiar to most readers, "shell genome" and "cloud genome" are less widely known, and an explanation of what these terms mean here would be helpful.

      Thank you. We have revised the manuscript to define the core, shell, and cloud genomes as gene sets found in ≥ 99%, ≥ 95% and ≥ 15% of isolates, respectively.
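
      As a minimal sketch of how these cutoffs could be applied, the function below bins a gene by the fraction of isolates carrying it. It assumes the three thresholds are read as successive, non-overlapping bins, and the name `classify_gene` and the "unassigned" label are hypothetical.

```python
def classify_gene(n_with_gene: int, n_isolates: int) -> str:
    """Assign a gene to a pan-genome bin from its prevalence among isolates."""
    if n_isolates <= 0:
        raise ValueError("n_isolates must be positive")
    frac = n_with_gene / n_isolates
    if frac >= 0.99:   # core: found in >= 99% of isolates
        return "core"
    if frac >= 0.95:   # shell: >= 95% but below the core cutoff
        return "shell"
    if frac >= 0.15:   # cloud: >= 15% but below the shell cutoff
        return "cloud"
    return "unassigned"  # ASSUMPTION: rarer genes fall outside the three sets
```

      For example, with 253 isolates a gene present in 245 of them (about 97%) would be labelled "shell" under this reading.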

      In the first paragraph of the discussion, it could be added that in many cases for clinically important Gram negatives short read sequencing alone will fail to detect transmission events as outbreaks can be driven by plasmid spread with only very limited clonal spread (see, for example, https://www.nature.com/articles/s41564-021-00879-y )

      Thank you. We agree this is an important/emerging aspect of surveillance. However, the goal of this discussion point was to explain why such a large outbreak was missed prior to implementing WGS (short read) surveillance. We feel that discussing “plasmid outbreaks” (which are not at play here, and are relatively rare in P. aeruginosa compared to the Enterobacteriaceae) and the need for long-read sequencing will distract from the narrative.

      line 599 What does "Mock" mean here? Would it be more accurate to say it is a simplified floor plan?

      Thank you. “Mock” was changed to “simplified”

      IPAC abbreviation is only used once - spelling it out in full would increase readability.

      Revised manuscript was edited as suggested.

      MHS is only used twice.

      Revised manuscript was edited to spell out Military Health System

      Line 364: full stop missing.

      Revised manuscript was edited as suggested.

      Line 401: Bayesian rather than bayesian.

      Revised manuscript was edited as suggested.

      Reviewer #2 (Recommendations For The Authors):

      Thank you for giving me the opportunity to review this interesting manuscript.

      The conclusions of this paper are mostly well supported by the data presented, but epidemiological information was limited and the sampling methodology was inconsistent, thus complicating inference of exact transmission routes.

      Major issues:

      What was the baseline frequency of clinical and/or screening samples of Pseudomonas aeruginosa at the hospital? Neither Figure 1D nor Table S1 allows for differentiating between clinical and screening samples. Most isolates were cultured from clinical materials, and there is no information about the patients' length of stay and their respective sampling dates. Is there any possibility of finding out whether the samples were collected for clinical or screening purposes? Would it be possible to include the patients' admission data to determine whether the strains were imported into the hospital or related to a previous stay, e.g. among known carriers? Also, the issue of sampling dates vs. patient stay on the ward should be addressed, as there may be an overlap in patients' stay on the ward but no overlap in terms of sampling dates or even missing samples (missing links).

      We have revised the manuscript to address this important point: i) 16 isolates were from surveillance swabs and are labelled “Surveillance” in Table S1. The remaining 237 were clinical isolates; ii) unfortunately, because the sampling was done under a public health surveillance framework, we do not have access to historical patient data (admission/discharge date, wards, rooms, etc.) and we cannot calculate length of stay or better identify patient overlap. These limitations are now acknowledged in the discussion of the revised manuscript.

      In order to evaluate the extent of the outbreak, more epidemiological data would be useful What is the size of the hospital, what is the average patient turnover, and what is the average length of stay in ICU and non-ICU? Is there any specialization besides the military label?

      We have revised the manuscript to indicate that Facility A is a 425-bed medical center and is the only Level 1 trauma center in the Military Health System. Unfortunately, the data to calculate length of stay, throughout the years, in ICU and non-ICU, was not available to us. This limitation is now also acknowledged in the discussion.

      Perhaps the authors could attempt to discuss the extent to which large outbreaks like these may be considered as part of unavoidable evolutionary processes within the hospital microbiome as opposed to accumulation and transmission of potentially harmful genes/clones, and differentiate between the putative community spread without any epidemiological links on the one hand, and hospital outbreaks that could be targeted by local infection prevention activities on the other hand.

      We respectfully disagree with the suggestion that this large outbreak “may be considered as part of unavoidable evolutionary processes within the hospital microbiome” and should be opposed to “transmission of potentially harmful genes/clones”. As a matter of fact, our data showed that infection control staff at Facility A responded with multiple interventions, including closing sinks, replacing tubing, and using foaming detergents. This resulted in slowing the spread of the ST621 outbreak with just 3 cases identified in 2022, 0 cases in 2023 and 1 case in 2024. This is now discussed in the revised manuscript.

      Page 5, lines 88-92 lines 101-104. It seems as if the outbreak was identified only by the means of genomic surveillance. This raises questions as to the rationale for sampling and sequencing, especially prior to 2020. Considering 11 cases per year between 2011 and 2016, one could assume such an outbreak would have been noticed without sequencing data.

      The MRSN was created in 2010, in response to the outbreak of MDR Acinetobacter baumannii in US military personnel returning from Iraq and Afghanistan. Between 2011 and 2017, the MRSN collected MDR isolates (mandate for all MDR ESKAPE but compliance varied between years and facilities) from across the Military Health System and, for select isolates (e.g. high-risk isolates carrying ESBLs or carbapenemases), performed molecular typing by PFGE. In 2017 the MRSN started to perform whole genome sequencing of its entire repository. In 2020, a routine prospective sequencing service was started and first detected the ST621 outbreak. A retrospective analysis of historical isolate genomes (2011-2019) identified additional cases. The first paragraph of the discussion lists possible factors to explain why the ST621 outbreak escaped detection by traditional approaches. We believe 11 cases per year is not a strong signal when stratified by month, wards, or both, especially for a clone lacking a carbapenemase and without a remarkable antibiotic susceptibility profile.

      Did the infection control personnel suspect transmission? If yes, was the sampling and submission of samples to the MRSN adapted based on the epidemiologic findings?

      The ST621 outbreak was unsuspected before the initial genomic detection in 2020. Until that point, MDR isolates only (Magiorakos et al PMID: 21793988) were collected but compliance was variable through time. Quickly thereafter (starting in 2021), complete sampling of all clinical P. aeruginosa (MDR or not) from Facility A was started. The manuscript was revised to clarify those details of the sampling strategy.

      Is there any information about how many environmental sites were sampled without evidence of ST621 / screening samples were cultured without evidence of Pseudomonas aeruginosa?

      For patient isolates, only 16 isolates were from surveillance swabs. The remaining 237 were clinical isolates. No denominator data was available to calculate P. aeruginosa and ST-621 positivity rate in surveillance swabs throughout the time period. For environmental isolates, a total of 159 swabs were taken from 55 distinct locations in 8 wards/units including the ER. This data is now included in the revised manuscript. However, a complete analysis of these swabs (positivity rate for ESKAPE pathogens, P. aeruginosa, per ward/floor/room, per swab type (sink drain, bed rail etc.) etc.) is beyond the scope of this study and is being performed as a follow up investigation.

      Page 5 lines 89 and 39 Figure S1B. Please describe how the allelic distance for the cluster threshold was selected.

      As indicated in the legend of Figure S1B, no thresholds were applied. All ST621 isolates ever sequenced by the MRSN were included. All except 3 isolates shared between 0-23 cgMLST allelic differences. The remaining 3 were distant by 88-89 allelic differences. The text was revised to clarify this point.

      Page 5 lines 99-100. Could the authors please provide some distribution measures (e.g. IQR).

      Done as requested. The revised manuscript now reads “…of just 38 single nucleotide polymorphisms (SNPs), and an IQR of 19 (Fig. 1A, Table S1).”

      Page 5 line 102. Could the authors please provide some distribution measures (e.g. IQR).

      Please see above. A chart was created and is now included as Fig. S2.

      Page 6 line 107 and page 34 figure 1c. In the text it is stated that isolates were collected in 27 wards, the figure 1C depicts 26 wards and n/a.

      Thank you for spotting this inconsistency. This has been fixed in the revised manuscript.

      Page 6 lines 117-118. Samples collected in the emergency room would imply samples collected on admission, already addressed previously. Did the authors investigate a potential import into the hospital from community reservoirs or were all these isolates collected among patients who had been previously admitted to the hospital and/or tested positive for the outbreak strain?

      We agree that samples collected in the ER imply samples collected on admission. Of the 29 ER isolates only 9 (31%) were primary isolates (first detection in a new patient) which suggests a majority were from returning patients at Facility A. Because the sampling was done under a public health surveillance framework, we do not have access to historical patient data (admission/discharge date, wards, rooms, etc.) to investigate/confirm that these 9 patients had previous visits at Facility A. This point is now discussed in the revised manuscript.

      Page 6 line 128. This could also represent increased selective pressure. However, according to Table S1, the 28 isolates collected in 2011 (the number does not match with Figure 1D) were from many different wards, thus indicating earlier spread throughout the hospital.

      Yes, we agree. Please note that Table S1 lists all isolates for 2011 whereas Figure 1D focuses on primary isolates (first isolate from each patient) only.

      Page 7 line 133. Both Figure 2 and the discussion section, page 13 line 296 suggest the year 2005 instead of 2004?

      Thank you for catching this typographical error. This was corrected to 2004 in the revised manuscript.

      Figure 1E. The figure should also depict intra-patient diversity for comparison.

      Thank you for this great suggestion. We have revised Figure 1E accordingly.

      Page 7, lines 146-147 Could the authors attempt explaining the upper part of the bimodal peaks?

      This is an all-vs-all SNP analysis for all inter-patient isolates. For each isolate, all distances to other isolates are reported, not only the smallest. The upper peaks represent comparisons to isolates from a different outbreak subclone (SC1 vs SC2).

      Page 7, line 150 This is a very small number considering the extent of the outbreak and suggests a large number of missing links. Or does this rather imply continuous import and evolution over time that does not necessarily represent transmission within the hospital?

      We believe all cases were due to transmission happening within the hospital. Based on conservative thresholds (genetic relatedness and epi link, or lack thereof) the precise origin from another patient (n=10) or a contaminated surface (n=12) can be inferred. For the remaining 60 patients, with the available sampling, the conditions we chose are not met and we simply do not conclude whether a direct patient-to-patient or an environmental origin was more likely.

      Page 8 line 155. What does the temporal overlap refer to - sampling date versus patient's stay on the ward? Please specify.

      The temporal overlap was investigated from sampling dates, as dates of patient admission/discharge were not available.

      Page 8, line 157: What does primary/serial isolate mean - first and follow-up samples of ST621 per patient?

      Yes. Primary isolate is used to designate the first isolate from a patient. Serial isolates designate follow-up samples of ST621.

      Page 8 line 165: Table S3 and Figure 3 only refer to environmental samples from three wards. Ward 20 rooms 2 and 18 as well as ward 1 rooms 1 and 6 were hotspots - is there any information on the specific infection control/disinfection measures? Addressed in discussion page 12, lines 273-275, but no information on what was actually done.

      The manuscript was revised to indicate the precise disinfection measures that were taken. A follow-up study is ongoing to assess long-term efficacy and monitor possible retrograde growth from previously contaminated sinks.

      Page 8 line 175: Evaluation of change in resistance fraction over time - There may have been a selection bias with an inconsistent number of strains sequenced per year.

      Yes, incomplete sampling and possible selection bias are now listed with other limitations of this study in the discussion of the revised manuscript.

      Page 9 line 183: The referral to Table S1 is unclear, I could not find the number and the specific isolates selected for long-read sequencing.

      Thank you. This has been added to the revised Table S1.

      Page 10 lines 217-225 and Figure 4C: Perhaps it is possible to better align what is written in the text and the caption of the figure. The caption does not clarify that only one patient develops colistin resistance (what was the reason to include the other patients?).

      Thank you. We have revised the text and the caption of the figure to clarify that only isolates from one patient developed colistin resistance. The isolates from the other patients on Fig. 4C are shown to provide context and accurately map the emergence of the PhoQE77fs mutation.  

      Page 10, lines 228-229 and Table S5: How is it possible to identify those 64 genes in Table S5?

      We have revised Table S5 to facilitate the identification of the 64 genes with ≥ 2 independently acquired mutations (excluding SYN). Specifically, we have added column E labeled “Counts independent mutations per locus (excluding SYN)”. A total of 205 rows (in this table each row is a variant) have a value ≥ 2 and these represent 64 genes (upon deduplication of locus tags).  
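
      The filtering and deduplication step described above can be sketched as follows. This is an illustrative example on toy rows, not the actual Table S5 processing; the field names `locus_tag` and `independent_count` and the locus tags themselves are hypothetical placeholders.

```python
def genes_with_recurrent_mutations(rows, min_count=2):
    """Return the unique locus tags whose per-locus count meets the cutoff.

    Each row is one variant; several rows can share a locus tag, so the
    result is built as a set to deduplicate them.
    """
    return {row["locus_tag"] for row in rows
            if row["independent_count"] >= min_count}


# Toy input: five variant rows spread over three hypothetical loci.
toy_rows = [
    {"locus_tag": "locus_A", "independent_count": 3},
    {"locus_tag": "locus_A", "independent_count": 3},
    {"locus_tag": "locus_A", "independent_count": 3},
    {"locus_tag": "locus_B", "independent_count": 1},
    {"locus_tag": "locus_C", "independent_count": 2},
]
recurrent = genes_with_recurrent_mutations(toy_rows)
```

      In this toy input, four rows pass the cutoff but they collapse to two genes, which mirrors how many variant rows can map to a smaller number of loci.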

      Page 13, lines 280-281: Where is the information on chronic infection presented? Serial cultures would not necessarily mean chronic infection.

      Yes, we agree this was not the appropriate characterization and this was revised to ‘long-term’ infections.

      Page 14 line 306: Emergence of colistin resistance in a single patient, correct?

      Yes. This was further clarified in the text.

      Page 14 lines 315-320: This should go to the results section. In particular disinfection, closing, and replacing of tubing should be mentioned in the results section in reference to the results presented in Table S3.

      Thank you. We have considered this suggestion and have decided to leave this discussion as the closing paragraph of this publication. A follow-up study is ongoing to assess long-term efficacy of these interventions on the ST-621 but also other outbreak clones at Facility A.

      Methods

      Page 15 lines 330-333: Perhaps it is possible to avoid redundancy.

      Thank you. We have revised the text accordingly.

      Page 15 lines 341: Information on which isolates were subjected to long-read sequencing is missing.

      Thank you. This has been added to the revised Table S1.

      Page 16 line 345: Was there a particular reason why Newbler was chosen?

      No. At the time Newbler was the default assembler built in the MRSN bacterial genome analysis pipeline and QC processes.

      Page 16, line 357-358: What was the rationale for selecting this isolate as reference genome?

      This isolate was chosen because it was collected early in the outbreak and phylogenetic analysis revealed it had low root to tip divergence.

      Page 16 line 361: Why 310 isolates, if only 253 were assigned to the outbreak clone and only a subset of those were collected in facility A?

      This was a typographical error that has been corrected (it now reads “…set of 253 isolates.”) in the revised manuscript.

      Page 17 lines 387-395: What is the reason that intra-patient diversity was not included in the set of criteria for SNP distances?

      The observed within host variability (now displayed in revised Fig. 1E) was taken into consideration when setting SNP thresholds for categorizing patient-to-patient transmission or environment-to-patient event. This is now clarified in the revised manuscript.

      Page 17 line 392: How was the threshold of <=10 SNPs determined?

      The 10 SNP cutoff to infer a patient-to-patient transmission event was set to account for the known evolution rate of P. aeruginosa (inferred by BEAST at 2.987E-7 subs/site/year in this study, and similar to previous estimates PMID: 24039595) and the observed within host variability (now displayed in revised Fig. 1E). We note that this SNP distance was not sufficient and that an epi link (patients on the same ward within the same month) needed to be established.

      Page 17 line 395 and Figure 2: What was the assumed average mutation rate per genome per year?

      Thank you. The mean substitution rate inferred by BEAST was 2.987E-7 subs/site/year, similar to estimates from previous studies on P. aeruginosa outbreaks (e.g. PMID: 24039595).
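
      For readers who prefer a per-genome figure, the per-site rate can be converted with a short calculation. The sketch below is illustrative only: the alignment length of 6,500,000 sites is an assumption based on the typical size of a P. aeruginosa genome, not a value reported in this response, and the function names are hypothetical.

```python
RATE_PER_SITE_PER_YEAR = 2.987e-7  # BEAST estimate quoted in the response
ASSUMED_SITES = 6_500_000          # ASSUMPTION: approximate genome length in bp


def snps_per_genome_per_year(rate_per_site: float, n_sites: int) -> float:
    """Expected substitutions accumulated by one lineage in one year."""
    return rate_per_site * n_sites


def expected_pairwise_snps(rate_per_site: float, n_sites: int,
                           years_since_common_ancestor: float) -> float:
    """Expected SNP distance between two isolates sampled at the same time.

    Both lineages accumulate substitutions independently after they split,
    hence the factor of 2.
    """
    return 2 * rate_per_site * n_sites * years_since_common_ancestor


per_genome = snps_per_genome_per_year(RATE_PER_SITE_PER_YEAR, ASSUMED_SITES)
```

      Under this assumed genome length the rate corresponds to roughly 2 substitutions per genome per year. A shorter core alignment would give a proportionally lower figure.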

      Reviewer #3 (Recommendations For The Authors):

      Please find (line-by-line comments) on each section of the manuscript below:

      Introduction

      Line 86: I am wondering why the authors state ">28 facilities" instead of the exact number of facilities from which these lineages were recovered.

      Thank you. Manuscript was revised to provide the exact number of facilities. It now reads “…recovered from 37 and 28 facilities, respectively.”

      Methods

      It's not clear to me which criteria were used for collecting these isolates (both prospective and retrospective). I understand that some of the data are described in more detail in Lebreton et al but I did not find the specific criteria for the collection of the isolates and I imagine that these might differ in different facilities. Would it be possible to comment on that and add a short paragraph in the Methods section?

      Thank you. This lack of clarity was also raised by other reviewers, and we have revised the manuscript to indicate that: 1/ MDR isolates only (Magiorakos et al PMID: 21793988) were collected from 2011-2020 with the same criteria for all facilities although compliance was variable through time and between facilities; and 2/ starting in 2021 all P. aeruginosa isolates, irrespective of their susceptibility profile, were collected from Facility A.

      The data comes from a US Military hospital. Is this related to the US Veterans Affairs Healthcare system? Is there more detailed information about the demographics of the patient population?

      Facility A is part of the Military Health System (MHS) which provides care for active service members and their families. This is distinct from the US Veterans Affairs Healthcare system. Only limited patient data was accessible to us as this study was done as part of our public health surveillance activities. Patient age (avg. 57.2 +/- 21.0) and gender (ratio male/female 1.7) are provided in the revised manuscript. 

      Line 384ff: The origin of infection was inferred based on the SNP threshold and epidemiological links. However, recombination events can complicate the interpretation of SNP data. Have the authors attempted to account for this?

      Thank you. We agree that recombination events can complicate the interpretation of SNP data. We used Gubbins v2.3.1 to filter out recombination from the core SNP alignment, as indicated in the revised manuscript.

      The authors' definition of environment-to-patient transmission seems conservative (nearly identical strain and no known temporal overlap for > 365 days). Have the authors changed the threshold, performed sensitivity analyses, and tested how this would affect their results?

      Indeed, acknowledging that fixed thresholds have limitations in their ability to accurately predict the origin of infections, we took a conservative approach to favor specificity as our goal was simply to establish that cases of environment-to-patient transmission did happen. In the absence of a truth set, we have not performed sensitivity analysis, but we are conducting a follow-up study to compare inferences from MCMC models to our original predictions. This limitation is now discussed in the revised manuscript.

      The authors don't seem to incorporate the role of healthcare workers in the transmission process. Could they comment on this? I am assuming that environment-to-patient transmission could either be directly from the environment to the patient or via a healthcare worker. I think it's fine to make simplifying assumptions here but it would be great if this was explicitly described.

      Thank you for this suggestion. We have not sampled the hands of healthcare workers in this study. As a result, the reviewer is correct to say that we made the simplifying assumption that healthcare workers would be possible intermediates in either environment-to-patient or patient-to-patient transmissions, as previously described by others (PMID: 8452949). This limitation is now discussed in the revised manuscript.

      Page 5, line 100: What does "all vs all" mean? Based on the supplement, I assume it's the pairwise distance and then averaged across all of those. It would improve the readability of the manuscript if the authors could briefly define this term and then maybe refer to Table S1.

      Thank you. We have created Fig. S2 and revised the manuscript to state that ST-621 isolates from Facility A belonged to the same outbreak clone with a distance (averaged all vs all pairwise comparison) of just 38 single nucleotide polymorphisms (SNPs), and an IQR of 19 (Fig. S2, Table S1).
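
      To make the "all vs all" summary concrete, the sketch below computes the mean and IQR of every pairwise distance in a toy alignment. It is illustrative only and is not the pipeline used in the study. The sequences are invented, the function names are hypothetical, and the quartiles use Python's "inclusive" method, which is one of several IQR conventions.

```python
from itertools import combinations
from statistics import mean, quantiles


def snp_distance(a: str, b: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def all_vs_all_summary(seqs: dict) -> tuple:
    """Mean and IQR over every unordered pair of isolates."""
    dists = [snp_distance(seqs[i], seqs[j])
             for i, j in combinations(sorted(seqs), 2)]
    q1, _, q3 = quantiles(dists, n=4, method="inclusive")
    return mean(dists), q3 - q1


# Toy alignment of four hypothetical isolates (six pairwise comparisons).
toy = {
    "iso1": "ACGTACGT",
    "iso2": "ACGTACGA",
    "iso3": "ACGAACGA",
    "iso4": "TCGAACGA",
}
avg, iqr = all_vs_all_summary(toy)
```

      Because every pair contributes once, comparisons across distant subclones raise the mean and widen the IQR, which is consistent with the bimodal distribution discussed for Figure 1E.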

      Figure 1D: It would be interesting to see additional figures in the supplement on the percentage of sequenced isolates per year and whether it varies across the different sources/sites. Is there any information on which isolates were chosen for sequencing?

      Lack of clarity in the sampling/sequencing scheme was raised by multiple reviewers and we have provided a thorough response to earlier comments. We also have revised the material and methods section accordingly. Finally, we have created Fig. S3 to show the percentage of sequenced isolates per year across different sources/sites, as suggested by the reviewer. No noticeable patterns were observed. 

      It seems like only a subset of all clinical isolates were sequenced. Would it be possible that SC2 was present already earlier but not picked up until a certain date?

      Although all isolates received by the MRSN were sequenced, compliance varied through time so it is true that not all clinical isolates were sequenced between 2011-2019. As such, we fully agree with this hypothesis and discuss this possibility as BEAST analysis placed the origin of SC2 in 2004 while the first detection of an SC2 isolate was in December 2012. This limitation is now discussed in the revised manuscript.

      Could the authors elaborate on whether the isolates resulted from single-colony picks? Is it possible that the different absence of a subclone is due to the fact that they picked only a colony?

      Yes, the isolates resulted from single-colony picks except when the presence of different colony morphologies was noted. In the latter case, representative isolates for each colony morphology were processed. We have revised the methods to make that clear.

      Figure 2: It is difficult to see which nodes belong to which patient due to the small font size. I wonder if it was possible to color the nodes for each patient, to make it more readable.

      We tried coloring the nodes but with > 60 distinct patients/colors we decided it did not improve clarity. We have revised figure 2 to increase the font size.  

      Page 7-8, lines 154-155: Did the authors check whether there were isolates of the same strain (that were found in the environment) present in other patients elsewhere in the ward?

      Yes. In rare cases, we observed virtually genetically identical isolates from two patients collected in different wards. Because we only have access to clinical isolate data (collected from patient X in ward Y) and do not have access to patient data (admission/discharge date, wards, rooms, etc.), we cannot exclude that these patients overlapped in a room prior to the sampling of their P. aeruginosa isolates. We designed our fixed thresholds to be conservative. As a result, in this analysis, these cases are labelled as “undetermined”.

      Page 8: Do the authors have any information on antibiotic use during this timeframe? From the discussion, it seems like there is no patient-level prescription data. Is there any data on overall trends? How were trends in antibiotic use correlated with trends in antibiotic resistance?

      Unfortunately, patient-level prescription data (or any other data not linked to the bacterial specimens) was not accessible to us as this study was done as part of our public health surveillance activities.

      To infer the origin of infection, the authors used a static method with fixed thresholds and definitions. This study does not provide any uncertainty with their estimates. Maybe the authors could add a sentence in the discussion section that MCMC methods to infer transmission trees incorporating WGS could provide these estimates. These methods have not been applied much to PA, but here are two examples where MCMC methods have been used without WGS (though the definition of environmental contamination may differ between these studies and this study).

      https://doi.org/10.1186/s13756-022-01095-x

      https://doi.org/10.1371/journal.pcbi.1006697

      Thank you for this great suggestion. We have revised the manuscript to include a discussion on the limitations of fixed thresholds to infer transmission chains/origins, and to discuss existing alternatives including MCMC methods. 

      Line 322-323: This sentence is a bit vague since not all of these HAI are due to P. aeruginosa. I would suggest citing a number that is specific to PA.

      Thank you. While our paper shows a particular example of protracted P. aeruginosa outbreak, the roll-out of routine WGS surveillance in the clinic will help prevent hospital-associated drug-resistant infections for more than this species. We believe that broadening the scope in the last sentence of the manuscript is important and we decline to revise as suggested.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This report addresses a compelling topic. However, I have significant concerns, which necessitate a reassessment of the report's overall value.

      Anatomical Specificity and Stimulation Site:

      While the authors clarify that the ventral MGB (MGv) was the intended stimulation target, the electrode track (Fig. 1A) and viral spread (Fig. 2E) suggest possible involvement of the dorsal MGB (MGd) and broader area. Given that MGv-AI and MGd-AC pathways have distinct (and sometimes opposing) effects on plasticity, the reported LTP values (with unusually small standard deviations) raise concerns about the specificity of the findings. Additional anatomical verification would help resolve this issue.

      We thank the reviewer for highlighting the importance of anatomical specificity in MGv targeting. In the revised manuscript, we have taken several steps to address these issues:

      (1) Higher-magnification histology has been added to Figure 1A, clearly identifying the electrode tip localized within the MGv.

      (2) Figure 2E has been replaced with a new image showing viral expression largely confined to MGB, with minimal spread to surrounding structures.

      (3) In the Discussion, we explicitly acknowledge that although targeting was guided by stereotaxic coordinates and histological confirmation, some viral spread throughout the MGB occurred. We also discuss the possibility that both MGv-A1 and MGd-AC pathways may contribute to the recorded responses, which could influence the observed plasticity, as previously suggested by the reviewer.

      These additions and acknowledgments are now incorporated to ensure the reader can interpret the data with full consideration of anatomical targeting limitations.

      Results section:

      “Higher-magnification histology confirmed accurate MGv targeting (Figure 1A, lower-middle panel).”

      Discussion section:

      “Although our experiment targeting the MGv was guided by stereotaxic coordinates and verified post hoc, we acknowledge potential contributions from non-lemniscal medial geniculate nucleus dorsal (MGd) projections. Anatomical and physiological evidence indicates that MGv-AC projections provide rapid, frequency‑specific, tonotopically organized excitation, whereas MGd pathways target higher‑order auditory cortex with broader tuning, less precise tonotopy, longer response latencies, and greater context‑dependence, features that can differentially shape cortical sensory integration and plasticity (Lee and Sherman, 2010; Smith et al., 2012; Ohga et al., 2018; Lee, 2015; Hu, 2003). While the co-recruitment of lemniscal and non-lemniscal inputs may enhance the generality of our CCK-dependent mechanism, the differing response characteristics of these pathways suggest subtle differences in their relative engagement in the observed plasticity. Future pathway-specific manipulations will help clarify their respective contributions”

      Lee, C.C., and Sherman, S.M. (2010). Topography and physiology of ascending streams in the auditory tectothalamic pathway. Proceedings of the National Academy of Sciences 107, 372-377. doi:10.1073/pnas.0907873107.

      Smith, P.H., Uhlrich, D.J., Manning, K.A., and Banks, M.I. (2012). Thalamocortical projections to rat auditory cortex from the ventral and dorsal divisions of the medial geniculate nucleus. Journal of Comparative Neurology 520, 34-51.

      Ohga, S., Tsukano, H., Horie, M., Terashima, H., Nishio, N., Kubota, Y., Takahashi, K., Hishida, R., Takebayashi, H., and Shibuki, K. (2018). Direct Relay Pathways from Lemniscal Auditory Thalamus to Secondary Auditory Field in Mice. Cerebral Cortex 28, 4424-4439. 10.1093/cercor/bhy234.

      Lee, C.C. (2015). Exploring functions for the non-lemniscal auditory thalamus. Frontiers in Neural Circuits 9, 69.

      Hu, B. (2003). Functional organization of lemniscal and nonlemniscal auditory thalamus. Experimental Brain Research 153, 543-549. 10.1007/s00221-003-1611-5.

      Figure legend section:

      “Post-hoc histology at higher magnification (lower-middle) shows the electrode tip confined within the MGv. White lines delineate the MGv/MGd border based on cytoarchitectonic landmarks.”

      Statistical Rigor and Data Variability:

      The remarkably low standard deviations in LTP measurements are unexpected based on established variability in thalamocortical plasticity. The authors' response confirms these values are accurate, but further justification, such as methodological controls or replication, would bolster confidence in these results. Additionally, the comparison of in vivo vs. in vitro LTP variability requires more substantive support.

      We appreciate the reviewer's concern regarding the unusually small variability. We would like to clarify that the error bars in our figures represent Standard Error of the Mean (SEM) rather than Standard Deviations (SD). As SEM is derived from the SD while incorporating sample size, it is inherently smaller than SD, which may have led to the impression of unrealistically low variability. This has now been explicitly clarified in the figure legends and Methods.
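      The relationship between SD and SEM described above can be illustrated with a minimal sketch. The slope values below are hypothetical and are not taken from the manuscript; the only assumption about method is the standard definition SEM = SD / sqrt(n).

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample SD, SEM) for a list of measurements."""
    n = len(values)
    sd = statistics.stdev(values)   # sample SD, n - 1 in the denominator
    sem = sd / math.sqrt(n)         # SEM shrinks with the square root of n
    return sd, sem


# Hypothetical normalized fEPSP slopes (% of baseline) from 9 recording sites.
slopes = [120.0, 135.0, 150.0, 110.0, 160.0, 145.0, 125.0, 155.0, 140.0]
sd, sem = sd_and_sem(slopes)
print(f"SD = {sd:.2f}, SEM = {sem:.2f}")
```

      With nine sites the SEM is one third of the SD, so error bars drawn as SEM will look considerably tighter than the spread of the underlying data points.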

      To illustrate the raw variability, we have added Supplementary Figure S1E showing unaveraged fEPSP slopes compared with the mean ± SEM, corresponding to Figure S1C. This addition ensures transparency and allows readers to directly assess the quality and consistency of our recordings.

      Regarding the comparison between in vivo and in vitro LTP variability:

      We agree that clarifying the basis of our in vivo vs. in vitro variability comparison is important. For example, in Chen et al., 2019, using identical LTP induction protocols (Fig. J), the SED of in vitro slice measurements (Fig. K) was substantially larger than that of in vivo recordings (Fig. L).

      This difference likely reflects:

      (1) In vitro: neighboring data points within a single experiment are highly correlated; variability across experiments is large due to heterogeneous sensitivity to LTP induction (10–200% increase).

      (2) In vivo: lower correlation between neighboring data points, but each is averaged from 12 recordings over 2 min, reducing cross-trial variability; sensitivity to LTP induction is less variable across experiments (5–60% changes).
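      The effect of averaging 12 trials per time point, followed by normalization to each site's own baseline, can be sketched as follows. This is an illustration under stated assumptions, not the authors' MATLAB script: the simulated trial values, the noise level, and the choice of 8 baseline points are all hypothetical.

```python
import random
import statistics

TRIALS_PER_POINT = 12  # 12 trials per 2-min time point, as stated in the response


def block_average(trials, block_size=TRIALS_PER_POINT):
    """Average consecutive trials in non-overlapping blocks."""
    n_blocks = len(trials) // block_size
    return [
        statistics.fmean(trials[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]


def normalize_to_baseline(points, n_baseline):
    """Express each time point as a percentage of the mean baseline value."""
    baseline = statistics.fmean(points[:n_baseline])
    return [100.0 * p / baseline for p in points]


# Hypothetical single-trial slopes: stable mean, independent trial-to-trial noise.
rng = random.Random(0)
trials = [rng.gauss(1.0, 0.2) for _ in range(TRIALS_PER_POINT * 200)]

points = block_average(trials)
normalized = normalize_to_baseline(points, n_baseline=8)  # 8 points is an assumed baseline length

print("SD of single trials:   ", round(statistics.stdev(trials), 3))
print("SD of 12-trial averages:", round(statistics.stdev(points), 3))
```

      For independent trials, averaging 12 of them reduces the spread of each plotted point by roughly a factor of sqrt(12), which is one reason averaged in vivo time courses can look smoother than single-sweep data.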

      We hope that these clarifications and additional data address the reviewer’s concerns regarding statistical rigor and data variability.

      Methods section:

      “The slopes of the evoked fEPSPs were calculated and normalized using a customized MATLAB script, and the group data were plotted as mean ± Standard Error of the Mean (SEM).”

      “All data are presented as mean ± SEM. Error bars and shaded areas represent SEM. Here, n represents the number of stimulation-recording sites and N represents the number of animals in each experiment. At each time point, fEPSPs were averaged across 12 consecutive trials (2 min) to reduce within-experiment fluctuation. Normalized time courses were then used for repeated-measures analyses.”

      Figure legend section:

      “Data are mean ± SEM; error bars indicate SEM.”

      “(E) Unaveraged fEPSP slopes are shown for each time point, with individual data points corresponding to all sites included in Fig. 1C; mean ± SEM overlays are shown in black. Note that all individual data points are displayed in this figure, whereas in Figure S1C, only the averaged values are shown.”

      Viral Targeting and Specificity:

      The manuscript does not clearly address whether cortical neurons were inadvertently infected by AAV9. Given the potential for off-target effects, explicit confirmation (e.g., microphotograph of stimulation site) would strengthen the study's conclusions.

      We appreciate the request for quantitative confirmation of off-target cortical infection. We clarify that our histological verification was conducted by systematic sampling rather than exhaustive quantification. Under the same sampling procedure, we did not detect tdTomato-positive cortical somata after AAV9‑Syn‑ChrimsonR‑tdTomato injections into the MGB, whereas we observed rare EYFP-positive cortical somata after AAV9‑EF1a‑DIO‑ChETA‑EYFP (median < 1 cell per 0.4 × 0.4 mm² section, Supplementary Figure S1F). Although these observations do not constitute a formal statistical estimate, they were consistent across sampled sections and are in line with the low-level trans-synaptic transfer reported for AAV9. We have discussed their potential implications for data interpretation in the Discussion.

      We hope these clarifications and the newly presented histological evidence address the reviewer’s concerns and further strengthen the rigor of our study.

      Discussion section:

      “Another potential limitation of our study is the trans-synaptic transfer property of AAV9 (Figure S1F). To mitigate this risk, we carefully controlled the injection volume, rate, and viral expression time, while also verifying expression post hoc. Histological analysis by systematic sampling detected no tdTomato-positive cortical somata in the ACx (Figure 2E, lower panel), whereas rare EYFP-positive cortical somata were observed after AAV9-EF1a-DIO-ChETA-EYFP injections (median < 1 cell per 0.4 × 0.4 mm² section; Figure S1F, corresponding to Figure 2A, upper-middle panel). These construct‑dependent observations align with occasional low‑level trans‑synaptic transfer reported for AAV9 (Zingg et al., 2017) and indicate that off‑target cortical infection was negligible for ChrimsonR and exceedingly rare for ChETA under our experimental conditions.”

      Zingg, B., Chou, X.L., Zhang, Z.G., Mesik, L., Liang, F., Tao, H.W., and Zhang, L.I. (2017). AAV-Mediated Anterograde Transsynaptic Tagging: Mapping Corticocollicular Input-Defined Neural Pathways for Defense Behaviors. Neuron 93, 33-47. 10.1016/j.neuron.2016.11.045.

      Figure legend:

      “Representative histological images demonstrating low-level transsynaptic spread following AAV9-EF1a-DIO-ChETA-EYFP injection into the MGv. Rare EYFP-positive cortical neurons were observed (median < 1 cell per 0.4 × 0.4 mm² section). Scale bar: 100 µm.”

      Integration of Prior Literature:

      The discussion of existing work is adequate but could be more comprehensive. A deeper engagement with contrasting findings would provide better context for the study's contributions.

      We appreciate the reviewer’s suggestion to engage more deeply with contrasting findings. In the revised Introduction and Discussion, we have:

      (1) Refocused the historical context toward adult auditory thalamocortical plasticity and explicitly contrasted it with visual and somatosensory cortices, noting that adult ACx exhibits weaker and more gated NMDAR dependence.

      (2) Positioned CCK–CCKBR signaling as a permissive/gating mechanism that can complement or partially compensate for postsynaptic NMDAR signaling, potentially reconciling variability across cortical areas and life stages.

      (3) Clarified the potential differential contributions of lemniscal (MGv) and non‑lemniscal (MGd) streams to plasticity expression and variability, acknowledging pathway-specific response properties.

      These additions are now integrated in the Introduction (paragraphs 2–3) and Discussion (sections “CCK Dependence of Thalamocortical Neuroplasticity in the ACx” and “Developmental and Age‑Dependent CCK‑Mediated Plasticity”), providing a more comprehensive and balanced context for our findings.

      Introduction section:

      “However, converging evidence shows that thalamocortical inputs retain a capacity for experience-dependent modification in adulthood. Sensory enrichment or deprivation can gate or reinstate thalamocortical plasticity. In the adult ACx, pairing sounds with neuromodulatory drive can reshape cortical representations. In vivo high-frequency stimulation (HFS) of dorsal lateral geniculate nucleus (LGN) or medial geniculate body (MGB) induces LTP in sensory cortices and has been linked to perceptual learning beyond the critical period. Notably, auditory thalamocortical plasticity appears less dependent on NMDA receptors compared to other cortical regions. The mechanisms underlying thalamocortical plasticity in the mature brain remain poorly understood.

      Cholecystokinin (CCK) and its receptor CCK-B receptor (CCKBR) are well positioned to influence thalamocortical transmission: Cck mRNA is abundant in MGB neurons and CCKBR is enriched in layer IV of ACx, the principal thalamorecipient layer.”

      Discussion section:

      “These findings suggest a potential involvement of CCK in thalamocortical plasticity. Our data extend this framework by identifying CCK–CCKBR signaling as a permissive modulator of adult thalamocortical LTP.”

      “We propose that CCKBR activation may trigger intracellular calcium release and AMPAR recruitment in parallel to, or partially compensating for, postsynaptic NMDAR signaling, while the complementarity of CCKBR and NMDARs may contribute to robust thalamocortical plasticity. This complementary arrangement may reconcile differences across developmental stages and cortical areas, and highlights neuropeptidergic signaling as a lever to re-enable adult thalamocortical plasticity.

      Notably, exogenous CCK alone failed to induce LTP in the absence of accompanying stimulation (Figure S2A and S2B), emphasizing that CCK functions as a modulator rather than a direct initiator of LTP. Activation of the thalamocortical pathway is also essential for LTP induction. Although our experiment targeting the MGv was guided by stereotaxic coordinates and verified post hoc, we acknowledge potential contributions from non-lemniscal medial geniculate nucleus dorsal (MGd) projections. Anatomical and physiological evidence indicates that MGv-AC projections provide rapid, frequency‑specific, tonotopically organized excitation, whereas MGd pathways target higher‑order auditory cortex with broader tuning, less precise tonotopy, longer response latencies, and greater context‑dependence, features that can differentially shape cortical sensory integration and plasticity. While the co-recruitment of lemniscal and non-lemniscal inputs may enhance the generality of our CCK-dependent mechanism, the differing response characteristics of these pathways suggest subtle differences in their relative engagement in the observed plasticity. Future pathway-specific manipulations will help clarify their respective contributions. Another potential limitation of our study is the trans-synaptic transfer property of AAV9 (Figure S1F). To mitigate this, we carefully controlled the injection volume, rate, and viral expression time, and conducted post-hoc histological analyses to minimize off-target effects, thereby reducing the likelihood of trans-synaptic transfer confounding the interpretation of our findings.”

      Therapeutic Implications:

      The authors' discussion of therapeutic potential is now appropriately cautious and well-reasoned.

      Conclusion:

      While the study presents intriguing findings, the concerns outlined above must be addressed to fully establish the validity and impact of the results. I appreciate the authors' efforts thus far and hope they can provide additional data or clarification to resolve these issues. With these revisions, the manuscript could make a valuable contribution to the field.

      Reviewer #2 (Public review):

      Summary:

      This work used multiple approaches to show that CCK is critical for long-term potentiation (LTP) in the auditory thalamocortical pathway. They also showed that the CCK mediation of LTP is age-dependent and supports frequency discrimination. This work is important because it opens up a new avenue of investigation of the roles of neuropeptides in sensory plasticity.

      Strengths:

      The main strength is the multiple approaches used to comprehensively examine the role of CCK in auditory thalamocortical LTP. Thus, the authors do provide a compelling set of data that CCK mediates thalamocortical LTP in an age-dependent manner.

      Weaknesses:

      There are some details that should be addressed, primarily regarding potential baseline differences in comparison groups. The behavioral assessment is relatively limited, but may be fleshed out in future work.

      We appreciate the reviewer’s suggestion regarding potential baseline differences. In our study, all groups underwent harmonized procedures, including identical exposure, timing, and acquisition parameters. Group allocation and data collection were performed under standardized conditions. For electrophysiology, baseline fEPSP measures and stimulation intensities were calibrated per site using consistent input-output procedures, with analyses based on normalized slopes relative to each site’s own baseline. For behavior, animals from the same litter served as both experimental and control groups, matched for handling conditions; startle/PPI data were acquired using identical hardware and timing settings. While no additional post hoc re-processing was performed, we have clarified these controls in the Methods to enhance transparency.

      We agree that the behavioral assessment is intentionally focused and does not encompass broader auditory perceptual functions (e.g., temporal processing). We now explicitly state this limitation and propose future studies to examine temporal acuity and cell-type-specific manipulations. These experiments will clarify how CCK-dependent thalamocortical plasticity generalizes to other perceptual domains.

      Reviewer #3 (Public review):

      Summary:

      Cholecystokinin (CCK) is highly expressed in auditory thalamocortical (MGB) neurons and CCK has been found to shape cortical plasticity dynamics. In order to understand how CCK shapes synaptic plasticity in the auditory thalamocortical pathway, the authors assessed the role of CCK signaling across multiple mechanisms of LTP induction in the auditory thalamocortical (MGB - layer IV Auditory Cortex) circuit in mice. In these physiology experiments that leverage multiple mechanisms of LTP induction and a rigorous manipulation of CCK and CCK-dependent signaling, they establish that auditory thalamocortical LTP depends on the co-release of CCK from auditory thalamic neurons. By carefully assessing the development of this plasticity over time and CCK expression, they go on to identify a window of time during which CCK is produced throughout early and middle adulthood in auditory thalamocortical neurons, establishing a window for plasticity from 3 weeks to 1.5 years in mice, with limited LTP occurring outside of this window. The authors go on to show that CCK signaling and its effect on LTP in the auditory cortex is also capable of modifying frequency discrimination accuracy in an auditory PPI task. In evaluating the impact of CCK on modulating PPI task performance, it also seems that in mice <1.5 years old CCK-dependent effects on cortical plasticity are almost saturated. While exogenous CCK can modestly improve discrimination of only very similar tones, exogenous focal delivery of CCK in older mice can significantly improve learning in a PPI task to bring their discrimination ability in line with those from young adult mice.

      Strengths:

      (1) The clarity of the results, along with the rigorous multi-angled approach, provides significant support for the claim that CCK is essential for auditory thalamocortical synaptic LTP. This approach uses a combination of electrical, acoustic, and optogenetic pathway stimulation alongside conditional expression approaches, germline knockout, viral RNA downregulation and pharmacological blockade. Through the combination of these experimental configurations the authors demonstrate that high-frequency stimulation-induced LTP is reliant on co-release of CCK from glutamatergic MGB terminals projecting to the auditory cortex.

      (2) The careful analysis of the CCK, CCKB receptor, and LTP expression is also a strength that puts the finding into the context of mechanistic causes and potential therapies for age-dependent sensory/auditory processing changes. Similarly, not only do these data identify a fundamental biological mechanism, but they also provide support for the idea that exogenous asynchronous stimulation of the CCKBR is capable of restoring an age-dependent loss in plasticity.

      (3) Although experiments to simultaneously relate LTP and behavioral change or identify a causal relationship between LTP and frequency discrimination are not made, there is still convincing evidence that CCK signaling in the auditory cortex (known to determine synaptic LTP) is important for auditory processing/frequency discrimination. These experiments are key for establishing the relevance of this mechanism.

      Weaknesses:

      (1) Given the magnitude of the evoked responses, one expects that pyramidal neurons in layer IV are primarily those that undergo CCK-dependent plasticity, but the degree to which PV-interneurons and pyramidal neurons participate in this process differently is unclear.

      We agree with the reviewer that the relative contributions of pyramidal neurons and PV-interneurons to CCK-dependent thalamocortical plasticity remain to be determined. Our recordings primarily reflected excitatory postsynaptic activity from layer IV pyramidal neurons, given the fEPSP metrics used. As PV-interneurons are essential in shaping cortical inhibition and temporal precision, they may also be modulated by CCK release from thalamocortical inputs. We have explicitly acknowledged this limitation in the Discussion section of the manuscript and propose that future studies should employ cell-type-specific recording or manipulation approaches to dissect the respective roles of inhibitory and excitatory neuronal populations in CCK-dependent thalamocortical plasticity. We appreciate the reviewer’s suggestion and believe this is a valuable direction for ongoing research.

      (2) While these data support an important role for CCK in synaptic LTP in the auditory thalamocortical pathway, perhaps temporal processing of acoustic stimuli is as or more important than frequency discrimination. Given the enhanced responsivity of the system, it is unclear whether this mechanism would improve or reduce the fidelity of temporal processing in this circuit. Understanding this dynamic may also require consideration of cell type as raised in weakness #1.

      We acknowledge that the current study primarily examined frequency discrimination and did not directly assess temporal processing. Enhanced network responsivity could have variable effects on temporal precision, depending on the balance between excitation and inhibition. PV-interneurons, in particular, are known to support temporal fidelity in auditory processing (Nocon et al., 2023; Cai et al., 2018). We now discuss that future work should investigate how CCK modulation influences temporal coding at both the circuit and single-cell level, and whether such changes align with or diverge from the mechanisms underlying frequency discrimination improvements.

      (3) In Figure 1, an example of increased spontaneous and evoked firing activity of single neurons after HFS is provided. Yet it is surprising that the group data are analyzed only for the fEPSP. It seems that single neuron data would also be useful at this point to provide insight into how CCK and HFS affect temporal processing and spontaneous activity/excitability, especially given the example in 1F.

      We appreciate the reviewer’s suggestion. While we recorded single-unit activity during HFS protocols, long-term stability over >1.5 hours was less consistent compared to fEPSP measurements, leading to higher variability in spike-based metrics. We therefore used fEPSPs as our primary quantitative measure for robustness. We agree, however, that single-neuron data could yield valuable complementary insights. Future experiments combining stable single-unit recording with synaptic measurements will be conducted to better link cellular excitability and network plasticity.

      (4) The circuitry that determines PPI requires multiple brain areas, including the auditory cortex. Given the complicated dynamics of this process, it may be helpful to consider what, if anything, is known specifically about how layer IV synaptic plasticity in the auditory cortex may shape this behavior.

      We agree that PPI involves multiple cortical and subcortical nodes. In our paradigm, layer IV neurons receive segregated MGv inputs, and high-frequency activation of thalamocortical projections induces robust synaptic plasticity in layer IV. The potentiation at these synapses could amplify the cortical representation of weak prepulses, facilitating their detection and enhancing PPI performance. This interpretation is consistent with prior work showing that local CCK infusion combined with auditory stimuli can augment cortical responses (Li et al., 2014). We have expanded the Discussion to highlight that in aged animals, where baseline PPI performance is often reduced due to degraded auditory inputs (Ouagazzal et al., 2006; Young et al., 2010), restoring thalamocortical plasticity via CCK may partially compensate for sensory gating deficits. We further note that the exact contribution of layer IV to PPI circuitry warrants future investigation using pathway-specific perturbations.

      Comments on revisions:

      The manuscript is much improved and many of the issues or questions have been addressed. Ideally, evidence for the degree of transsynaptic spread for AAV9-Syn-ChrimsonR-tdTomato would also be provided in some form since in the authors' response it sounds like some was observed, as expected.

      We thank the reviewer for this important point and for the opportunity to clarify. As requested, we have carefully examined the possibility of transsynaptic spread in our experiments:

      We clarify that our histological verification was conducted by systematic sampling rather than exhaustive quantification. Under the same sampling procedure, we did not detect tdTomato-positive cortical somata after AAV9‑Syn‑ChrimsonR‑tdTomato injections into the MGB, whereas we observed rare EYFP-positive cortical somata after AAV9‑EF1a‑DIO‑ChETA‑EYFP (median < 1 cell per 0.4 × 0.4 mm² section, see Figure 2A and Figure S1F), consistent with occasional low-level transsynaptic spread reported in the literature.

      We have updated the Discussion sections to clearly report these findings, and to emphasize the potential for vector- and construct-dependent variability in transsynaptic spread. We also explicitly acknowledge this technical limitation and discuss its implications for data interpretation.

      We hope these clarifications and additions address the reviewer’s concern regarding viral specificity and transsynaptic spread.

      Discussion section:

      “Another potential limitation of our study is the trans-synaptic transfer property of AAV9 (Figure S1F). To mitigate this risk, we carefully controlled the injection volume, rate, and viral expression time, while also verifying expression post hoc. Histological analysis by systematic sampling detected no tdTomato-positive cortical somata in the ACx (Figure 2E, lower panel), whereas rare EYFP-positive cortical somata were observed after AAV9-EF1a-DIO-ChETA-EYFP injections (median < 1 cell per 0.4 × 0.4 mm² section; Figure S1F, corresponding to Figure 2A, upper-middle panel). These construct‑dependent observations align with occasional low‑level trans‑synaptic transfer reported for AAV9 (Zingg et al., 2017) and indicate that off‑target cortical infection was negligible for ChrimsonR and exceedingly rare for ChETA under our experimental conditions.”

      Zingg, B., Chou, X.L., Zhang, Z.G., Mesik, L., Liang, F., Tao, H.W., and Zhang, L.I. (2017). AAV-Mediated Anterograde Transsynaptic Tagging: Mapping Corticocollicular Input-Defined Neural Pathways for Defense Behaviors. Neuron 93, 33-47. 10.1016/j.neuron.2016.11.045.

      Figure legend:

      “Representative histological images demonstrating low-level transsynaptic spread following AAV9-EF1a-DIO-ChETA-EYFP injection into the MGv. Rare EYFP-positive cortical neurons were observed (median < 1 cell per 0.4 × 0.4 mm² section). Scale bar: 100 µm.”

      Reviewer #1 (Recommendations for the authors):

      Thank you for your efforts in revising the manuscript. While progress has been made, I have a few remaining concerns that I hope you can address to further strengthen the study.

      Focus of the Introduction:

      Auditory thalamocortical plasticity is known to be NMDA-dependent, albeit with weaker dependence during early development. Given that this work examines thalamocortical LTP in young adult and aged mice, I recommend refining the Introduction to place greater emphasis on auditory thalamocortical plasticity in the adult brain. The current discussion of somatosensory plasticity during early development, while interesting, seems less directly relevant to the present study. A sharper focus on the auditory system would better frame your research questions.

      We thank the reviewer for this constructive suggestion. We have revised the Introduction to emphasize adult auditory thalamocortical plasticity and to streamline content less directly related to our study. Specifically:

      (1) We now foreground evidence that thalamocortical inputs retain experience-dependent plasticity beyond the critical period in adult ACx, including neuromodulatory pairing, HFS-induced LTP, and experience-dependent reinstatement.

      (2) We explicitly note that adult auditory thalamocortical plasticity is more weakly NMDAR-dependent than in other cortices, thereby motivating our focus on CCK–CCKBR signaling as a permissive mechanism for adult LTP.

      (3) We have condensed the discussion of somatosensory plasticity during early development to a brief background and shifted the focus to adult auditory mechanisms and knowledge gaps that directly frame our research questions.

      These changes appear in the revised Introduction (paragraphs 2–3), which now provide a sharper rationale for investigating CCK‑dependent thalamocortical LTP in young adult and aged mice.

      Introduction section:

      “However, converging evidence shows that thalamocortical inputs retain a capacity for experience-dependent modification in adulthood. Sensory enrichment or deprivation can gate or reinstate thalamocortical plasticity. In the adult ACx, pairing sounds with neuromodulatory drive can reshape cortical representations. In vivo high-frequency stimulation (HFS) of dorsal lateral geniculate nucleus (LGN) or medial geniculate body (MGB) induces LTP in sensory cortices and has been linked to perceptual learning beyond the critical period. Notably, auditory thalamocortical plasticity appears less dependent on NMDA receptors compared to other cortical regions. The mechanisms underlying thalamocortical plasticity in the mature brain remain poorly understood.

      Cholecystokinin (CCK) and its receptor CCK-B receptor (CCKBR) are well positioned to influence thalamocortical transmission: Cck mRNA is abundant in MGB neurons and CCKBR is enriched in layer IV of ACx, the principal thalamorecipient layer.”

      Anatomical Specificity of MGv Targeting:

      The mouse MGv is a small and deep structure, and precise targeting is critical given the functional differences between MGv and MGd pathways. In the current figures:

      Fig. 1A suggests the electrode track may have approached the MGd.

      Fig. 2E indicates some viral spread beyond the MGB.

      Since MGv-AI and MGd-AC pathways exhibit distinct (and sometimes opposing) effects on plasticity, I encourage you to provide additional clarification or verification of the stimulated/infected regions. This would greatly enhance the interpretability of your LTP data.

      Please see above.

      Data Variability and Transparency:

      The reported thalamocortical LTP values exhibit remarkably small standard deviations, which is somewhat unexpected given typical experimental variability in such measurements. To address this concern, it would be helpful to include example raw traces of the recorded LTP (e.g., in a supplementary figure). This would allow readers to better evaluate the data quality and consistency.

      Please see above.

      Reviewer #2 (Recommendations for the authors):

      Overall, the authors did an excellent job of responding to our critiques, both in their direct responses and in the modified text. The modified text is also more readable than before. Two issues that the authors should consider addressing:

      (1) Unless I missed it, there is no commentary stated about the impact of using aged C57 mice, which lose their hearing, such that the effects seen in the older mice could be related to hearing loss rather than aging alone. Some discussion of this point should be made.

      We thank the reviewer for raising this important point. C57BL/6 mice are known to develop age-related hearing loss, which could potentially affect PPI performance in older animals. In our internal screening, we observed markedly reduced startle amplitudes and frequent negative PPI values in many mice older than 20 months, indicating severe auditory impairment. To minimize this confound a priori, we excluded mice older than 20 months and restricted the aged cohort to 17–19 months, an age range that consistently exhibited robust startle responses and reliable PPI. While some degree of presbycusis may still be present in this age range in C57BL/6 mice, the improvement of PPI following CCK administration combined with acoustic exposure indicates that the auditory pathways remained sufficiently functional to support sensorimotor gating. In fact, the presence of partial hearing loss in these aged mice may have allowed us to better detect the beneficial effects of CCK, further highlighting its therapeutic potential for age-related deficits. The greater improvement in PPI observed in older mice, as compared to younger mice (whose PPI in the control group is already high), likely reflects the combined effects of age-related hearing loss and CCK deficiency, with CCK-induced restoration of thalamocortical plasticity being the primary focus of our study. We have now added a discussion of this point in the revised manuscript.

      Discussion section:

      “In aged mice, PPI deficits are commonly observed due to impaired auditory processing. Notably, C57BL/6 mice exhibit age-related hearing loss (Johnson et al., 1997). Both age-associated changes in auditory function and CCK deficiency contribute to impaired sensory gating. The presence of partial hearing loss in aged mice may have facilitated the detection of CCK’s beneficial effects, further highlighting its therapeutic potential for age-related deficits. Our results suggest that enhanced thalamocortical plasticity mediated by CCK might partially compensate for these deficits by amplifying residual auditory signals in aged mice.”

      Johnson, K.R., Erway, L.C., Cook, S.A., Willott, J.F., and Zheng, Q.Y. (1997). A major gene affecting age-related hearing loss in C57BL/6J mice. Hearing Research 114, 83-92. https://doi.org/10.1016/S0378-5955(97)00155-X.

      (2) Minor point - I do not agree with the use of the term "ventral to bregma" to describe where the craniotomies were placed (e.g., line 599). The direction being described is more typically referred to as "lateral." If the authors prefer to use the term "ventral," perhaps additional clarification can be added.

      We thank the reviewer for pointing out this issue and apologize for any confusion. We agree that “ventral to bregma” is not the standard terminology and have revised the Methods section to use “below the temporal ridge”. We have also clarified that the craniotomy for accessing the auditory cortex was performed on the lateral aspect of the skull in rodents, just below the temporal ridge. We hope this revision resolves the ambiguity.

      Methods section:

      “A craniotomy was performed over the temporal bone, as the auditory cortex is located on the lateral surface of the brain (coordinates: 1.5 to 3.0 mm below the temporal ridge and 2.0 to 4.0 mm posterior to bregma for mice; 2.5 to 6.5 mm below the temporal ridge and 3.0 to 5.0 mm posterior to bregma for rats) to access the auditory cortex.”

      “Six weeks after CCK-sensor virus injection, a craniotomy was performed to access the auditory cortex at the temporal bone (1.5 to 3.0 mm below the temporal ridge and 2.0 to 4.0 mm posterior to bregma), and the dura mater was opened.”

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The temporal regulation of neuronal specification and its molecular mechanisms are important problems in developmental neurobiology. This study focuses on Kenyon cells (KCs), which form the mushroom body in Drosophila melanogaster, in order to address this issue. Building on previous findings, the authors examine the role of the transcription factor Eip93F in the development of late-born KCs. The authors revealed that Eip93F controls the activity of flies at night through the expression of the calcium channel Ca-α1T. Thus, the study clarifies the molecular machinery that controls temporal neuronal specification and animal behavior.

      Strengths:

      The convincing results are based on state-of-the-art molecular genetics, imaging, and behavioral analysis.

      Weaknesses:

      Temporal mechanisms of neuronal specification are found in many nervous systems. However, the relationship between the temporal mechanisms identified in this study and those in other systems remains unclear.

      We will expand the Discussion section to compare the temporal mechanisms identified here with those reported in other nervous systems.

      Reviewer #2 (Public review):

      Summary:

      Understanding the mechanisms of neural specification is a central question in neurobiology. In Drosophila, the mushroom body (MB), which is the associative learning region in the brain, consists of three major cell types: γ, α'/β', and α/β Kenyon cells. These classes can be further subdivided into seven subtypes, together comprising ~2000 KCs per hemi-brain. Remarkably, all of these neurons are derived from just four neuroblasts in each hemisphere. Therefore, considerable effort has gone into understanding how neurons are specified in the fly MB.

      Over the past decade, studies have revealed that MB neuroblasts employ a temporal patterning mechanism, producing distinct neuronal types at different developmental stages. Temporal identity is conveyed through transcription factor expression in KCs. High levels of Chinmo, a BTB-zinc finger transcription factor, promote γ-cell fate (Zhu et al., Cell, 2006). Reduced Chinmo levels trigger expression of mamo, a zinc finger transcription factor that specifies α'/β' identity (Liu et al., eLife, 2019). However, the specification of α/β neurons remains poorly understood. Some evidence suggests that microRNAs regulate the transition from α'/β' to α/β fate (Wu et al., Dev Cell, 2012; Kucherenko et al., EMBO J, 2012). One hypothesis even proposes that α/β represents a "default" state of MB neurons, which could explain the difficulty in identifying dedicated regulators.

      The study by Chung et al. challenges this hypothesis. By leveraging previously published RNA-seq datasets (Shih et al., G3, 2019), they systematically screened BAC transgenic lines to selectively label MB subtypes. Using these tools, they analyzed the consequences of manipulating E93 expression and found that E93 is required for α/β specification. Furthermore, loss of E93 impairs MB-dependent behaviors, highlighting its functional importance.

      Strengths:

      The authors conducted a thorough analysis of E93 manipulation phenotypes using LexA tools generated from the Janelia Farm and Bloomington collections. They demonstrated that E93 knockdown reduces expression of Ca-α1T, a calcium channel gene identified as an α/β marker. Supporting this conclusion, one LexA line driven by a DNA fragment near EcR (R44E04) showed consistent results. Conversely, overexpression of E93 in γ and α'/β' Kenyon cells led to downregulation of their respective subtype markers.

      Another notable strength is the authors' effort to dissect the genetic epistasis between E93 and previously known regulators. Through MARCM and reporter analyses, they showed that Chinmo and Mamo suppress E93, while E93 itself suppresses Mamo. This work establishes a compelling molecular model for the regulatory network underlying MB cell-type specification.

      Weaknesses:

      The interpretation of E93's role in neuronal specification requires caution. Typically, two criteria are used to establish whether a gene directs neuronal identity:

      (1) gene manipulation shifts the neuronal transcriptome from one subtype to another, and

      (2) gene manipulation alters axonal projection patterns.

      The results presented here only partially satisfy the first criterion. Although markers are affected, it remains possible that the reporter lines and subtype markers used are direct transcriptional targets of E93 in α/β neurons, rather than reflecting broader fate changes. Future studies using single-cell transcriptomics would provide a more comprehensive assessment of neuronal identity following E93 perturbation.

      We do plan to conduct multi-omics experiments to provide a more comprehensive assessment of neuronal identity upon loss of function of E93. However, these omics results will be reported in a separate manuscript rather than in the revised manuscript.

      With respect to the second criterion, the evidence is also incomplete. While reporter patterns were altered, the overall morphology of the α/β lobes appeared largely intact after E93 knockdown. Overexpression of E93 in γ neurons produced a small subset of cells with α/β-like projections, but this effect warrants deeper characterization before firm conclusions can be drawn. While the results might reflect an intrinsic property of KC types in flies, readers should interpret the data with care, and the authors should also mention this in their main text.

      We will describe and interpret this part of the results more carefully in the main text.

    1. Author response:

      Reviewer 1:

      We appreciate the reviewer’s positive assessment and in revision will expand the Discussion to clarify some of the mechanistic insights of this work, as well as to include expanded treatment of related studies in other model systems.

      Reviewer 2:

      We are grateful for the reviewer’s thorough and supportive comments. We will carefully revise assertions and conclusions for objectivity. Additional analysis of the Zelda experiments will be performed and experimental data tables will be updated to report these results. For the point about providing “insight into models explaining why H3K27me3 is absent prior to NC14,” we have recently submitted a related preprint that addresses this issue directly (Degen, Gonzaga-Saavedra, and Blythe, bioRxiv 2025). In summary, we find evidence that a maternal PcG imprint is indeed maintained through cleavage divisions, albeit through lower-order methylation states (maximally H3K27me2). We chose not to include these additional results in this manuscript to maintain the focus of this study on ZGA. Our revision of this manuscript will include a section in the Discussion that synthesizes the conclusions of the two studies.

      Reviewer 3:

      We thank the reviewer for recognizing the strength of our data and conclusions, and we agree that our results help settle conflicting claims in the field. We will emphasize Zelda’s context-dependent effects more clearly in the revised manuscript.

      References:

      Degen EA, Gonzaga-Saavedra N, Blythe SA. Lower-order methylation states underlie the maintenance and re-establishment of Polycomb modifications in Drosophila embryogenesis. bioRxiv [Preprint]. 2025 Jul 29:2025.07.25.666882. doi: 10.1101/2025.07.25.666882. PMID: 40766521; PMCID: PMC12324246.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Taber et al report the biochemical characterization of 7 mutations in PHD2 that induce erythrocytosis. Their goal is to provide a mechanism for how these mutations cause the disease. PHD2 hydroxylates HIF1a in the presence of oxygen at two distinct proline residues (P564 and P402) in the "oxygen degradation domain" (ODD). This leads to the ubiquitylation of HIF1a by the VHL E3 ligase and its subsequent degradation. Multiple mutations have been reported in the EGLN1 gene (coding for PHD2), which are associated with pseudohypoxic diseases that include erythrocytosis. Furthermore, 3 mutations in PHD2 also cause pheochromocytoma and paraganglioma (PPGL), a neuroendocrine tumour. These mutations likely cause elevated levels of HIF1a, but their mechanisms are unclear. Here, the authors analyze mutations from 152 case reports and map them on the crystal structure. They then focus on 7 mutations, which they clone in a plasmid and transfect into PHD2-KO to monitor HIF1a transcriptional activity via a luciferase assay. All mutants show impaired activation. Some mutants also impaired stability in pulse chase turnover assays (except A228S, P317R, and F366L). In vitro purified PHD2 mutants display a minor loss in thermal stability and some propensity to aggregate. Using MST technology, they show that P317R is strongly impaired in binding to HIF1a and HIF2a, whereas other mutants are only slightly affected. Using NMR, they show that the PHD2 P317R mutation greatly reduces hydroxylation of P402 (HIF1a NODD), as well as P564 (HIF1a CODD), but to a lesser extent. Finally, BLI shows that the P317R mutation reduces affinity for CODD by 3-fold, but not NODD.

      Strengths: 

      (1) Simple, easy-to-follow manuscript. Generally well-written. 

      (2) Disease-relevant mutations are studied in PHD2 that provide insights into its mechanism of action. 

      (3) Good, well-researched background section. 

      Weaknesses: 

      (1) Poor use of existing structural data on the complexes of PHD2 with HIF1a peptides and various metals and substrates. A quick survey of the impact of these mutations (as well as analysis by Chowdhury et al, 2016) on the structure and interactions between PHD2 peptides of HIF1a shows that the P317R mutation interferes with peptide binding. By contrast, F366L will affect the hydrophobic core, and A228S is on the surface, and it's not obvious how it would interfere with the stability of the protein. 

      Thank you for the comment.  We have further analyzed the mutations on the available PHD2 crystal structures in complex with HIFα to discern how these substitution mutations may impact PHD2 structure and function.  This analysis has been added into the discussion.

      (2) To determine aggregation and monodispersity of the PHD2 mutants using size-exclusion chromatography (SEC), equal quantities of the protein must be loaded on the column. This is not what was done. As an aside, the colors used for the SEC are very similar and nearly indistinguishable. 

      Agreed. We have performed an additional experiment as suggested by the reviewer to further assess aggregation and hydrodynamic size.  The colors used in the graph were changed for clearer differentiation between samples.

      (3) The interpretation of some mutants remains incomplete. For A228S, what is the explanation for its reduced activity? It is not substantially less stable than WT and does not seem to affect peptide hydroxylation. 

      We agree with the reviewer that the causal mechanism for some of the tested disease-causing mutants remains unclear. The negative findings also raise the notion, perhaps considered controversial, that there may be other substrates of PHD2 that are impacted by certain mutations and that contribute to disease pathogenesis. A brief paragraph discussing this has been included in the discussion.

      (4) The interpretation of the NMR prolyl hydroxylation is tainted by the high concentrations used here. First of all, there is likely a typo in the method section; the final concentration of ODD is likely 0.18 mM, and not 0.18 uM (PNAS paper by the same group in 2024 reports using a final concentration of 230 uM). Here, I will assume the concentration is 180 uM. Flashman et al (JBC 2008) showed that the affinity of the NODD site (P402; around 10 uM) for PHD2 is 10-fold weaker than CODD (P564, around 1 uM). This likely explains the much faster kinetics of hydroxylation towards the latter. Now, using the MST data, let's say the P317R mutation reduces the affinity by 40-fold; the affinity becomes 400 uM for NODD (above the protein concentration) and 40 uM for CODD (below the protein concentration). Thus, CODD would still be hydroxylated by the P317R mutant, but not NODD.

      The stated HIF1α concentration was indeed an error and will be corrected to 0.18 mM. The finding by Flashman et al.[1] that PHD2 has a lower affinity for NODD than for CODD likely contributes to the differential hydroxylation rates via PHD2 WT. We showed here via MST that PHD2 P317R had a Kd of 320 ± 20 uM for HIF1αCODD, which should have led to a severe enzymatic defect, even at the high concentrations used for NMR (180 uM). However, we observed only a subtle reduction in hydroxylation efficiency in comparison to PHD2 WT. Thus, we applied a second binding method, BLI, which showed a mild binding defect on CODD by PHD2 P317R, consistent with the NMR data. The perplexing result is the WT-like binding to the NODD by PHD2 P317R, which appears inconsistent with the severe defect in NODD hydroxylation via PHD2 P317R as measured via NMR. These results suggest that there are supporting residues within the PHD2/NODD interface that help maintain binding to NODD but compromise the efficiency of NODD hydroxylation upon PHD2 P317R mutation.
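      The occupancy argument in this exchange can be made concrete with a minimal sketch. It assumes simple 1:1 equilibrium binding with the ODD substrate in large excess over enzyme, so that the fraction of enzyme bound is [L] / ([L] + Kd); neither assumption is stated in the manuscript, and the enzyme concentration is not given here. The function name and dictionary labels are illustrative only. The Kd values are those quoted in the review and response above.

```python
def fraction_bound(ligand_um: float, kd_um: float) -> float:
    """Equilibrium occupancy for 1:1 binding with ligand in excess.

    Assumption: free ligand is approximately equal to total ligand,
    so occupancy = [L] / ([L] + Kd). Both arguments are in micromolar.
    """
    if ligand_um < 0 or kd_um <= 0:
        raise ValueError("ligand_um must be >= 0 and kd_um must be > 0")
    return ligand_um / (ligand_um + kd_um)


# ODD concentration in the NMR assay after the correction (0.18 mM).
ODD_UM = 180.0

# Kd values quoted in the review and response (micromolar).
kd_values_um = {
    "WT, CODD (approx. 1 uM, Flashman et al.)": 1.0,
    "WT, NODD (approx. 10 uM, Flashman et al.)": 10.0,
    "P317R, CODD (reviewer's 40-fold scenario)": 40.0,
    "P317R, NODD (reviewer's 40-fold scenario)": 400.0,
    "P317R, CODD (MST, 320 uM)": 320.0,
}

for label, kd in kd_values_um.items():
    print(f"{label}: fraction bound = {fraction_bound(ODD_UM, kd):.2f}")
```

      Under these assumptions, a Kd of 320 uM gives an occupancy of about 0.36 at 180 uM substrate, compared with about 0.82 for a Kd of 40 uM and about 0.31 for a Kd of 400 uM. Occupancy alone does not predict turnover, so this sketch only frames the binding side of the discussion.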

      (5) The discrepancy between the MST and BLI results does not make sense, especially regarding the P317R mutant. Based on the crystal structures of PHD2 in complex with the ODD peptides, the P317R mutation should have a major impact on the affinity, which is what is reported by MST. This suggests that the MST is more likely to be valid than BLI, and the latter is subject to some kind of artefact. Furthermore, the BLI results are inconsistent with previous results showing that PHD2 has a 10-fold lower affinity for NODD compared to CODD. 

      The reviewer’s structural prediction that the P317R mutation should cause a major binding defect, while in agreement with our MST data, is incongruent with our NMR data and with the data from Chowdhury et al.[2], which showed efficient hydroxylation of CODD via PHD2 P317R. Moreover, we have attempted to model NODD and CODD on the apo PHD2 P317R structure and found that the mutation had no major impact on CODD, while the mutated residue could clash with NODD, shifting the position of the peptide on the protein. However, these modeling predictions, like any in silico projections, would need experimental validation. As mentioned in our preceding response, we also performed BLI, which showed that PHD2 P317R had a minor binding defect for CODD, consistent with the NMR results and the findings of Chowdhury et al.[2]. Because purified NODD peptides were not amenable to the solution-based MST assay, NODD binding was also measured with BLI, which showed similar Kd values for PHD2 WT and P317R. Considering the absence of NODD hydroxylation via PHD2 P317R as measured by NMR, together with the modeling on apo PHD2 P317R, we posit that P317R causes NODD to deviate from its original orientation in a way that may not affect binding, owing to other interactions from the surrounding elements, but that prevents NODD turnover. Further study would be required to validate such a notion, which we feel is beyond the scope of this manuscript.

      (6) Overall, the study provides some insights into mutants inducing erythrocytosis, but the impact is limited. Most insights are provided on the P317R mutant, but this mutant had already been characterized by Chowdhury et al (2016). Some mutants affect the stability of the protein in cells, but then no mechanism is provided for A228S or F366L, which have stabilities similar to WT, yet have impaired HIF1a activation. 

      We thank the reviewer for raising these and other limitations.  We have expanded on the shortcomings of the present study but would like to underscore that the current work using the recently described NMR assay along with other biophysical analyses suggests a previously under-appreciated role of NODD hydroxylation in the normal oxygen-sensing pathway.  

      Reviewer #2 (Public review): 

      Summary: 

      Mutations in the prolyl hydroxylase, PHD2, cause erythrocytosis and, in some cases, can result in tumorigenesis. Taber and colleagues test the structural and functional consequences of seven patient-derived missense mutations in PHD2 using cell-based reporter and stability assays, and multiple biophysical assays, and find that most mutations are destabilizing. Interestingly, they discover a PHD2 mutant that can hydroxylate the C-terminal ODD, but not the N-terminal ODD, which suggests the importance of N-terminal ODD for biology. A major strength of the manuscript is the multidisciplinary approach used by the authors to characterize the functional and structural consequences of the mutations. However, the manuscript had several major weaknesses, such as an incomplete description of how the NMR was performed, and the lack of a justification for using neighboring residues as a surrogate for looking at prolyl hydroxylation directly or of a reference to the clinical case studies describing the phenotypes of patient mutations. Additionally, the experimental descriptions for several experiments are missing descriptions of controls or validation, which limits their strength in supporting the claims of the authors.

      Strengths: 

      (1) This manuscript is well-written and clear. 

      (2) The authors use multiple assays to look at the effects of several disease-associated mutations, which support the claims. 

      (3) The identification of P317R as a mutant that loses activity specifically against NODD, which could be a useful tool for further studies in cells. 

      Weaknesses: 

      Major: 

      (1) The source data for the patient mutations (Figure 1) in PHD2 is not referenced, and it's not clear where this data came from or if it's publicly available. There is no section describing this in the methods. 

      Clinical and patient information on disease-causing PHD2 mutants was compiled from various case reports and summarized in an Excel sheet found in the Supplementary Information. The case reports are cited in this Excel file. A reference to the supplementary data has been added to the Figure 1 legend and to the introduction.

      (2) The NMR hydroxylation assay. 

      A. The description of these experiments is really confusing. The authors have published a recent paper describing a method using 13C-NMR to directly detect prolyl-hydroxylation over time, and they refer to this manuscript multiple times as the method used for the studies under review. However, it appears the current study is using 15N-HSQC-based experiments to track the CSP of neighboring residues to the target prolines, so not the target prolines themselves. The authors should make this clear in the text, especially on page 9, 5th line, where they describe proline cross-peaks and refer to the 15N-HSQC data in Figure 5B.

      As the reviewer mentioned, the assay that we developed directly measures the target proline residues.  This assay is ideal when mutations near the prolines are studied, such as A403, Y565 (He et al[3]).  In this previous work, we observed that the shifting of the target proline cross-peaks due to change in electronegativity on the pyrrolidine ring of proline in turn impacted the neighboring residues[3], which meant that the neighboring residues can be used as reporter residues for certain purposes.  In this study, we focused on investigating the mutations on PHD2 while leaving the sequence of the HIF-1α unchanged by using solely 15N-HSQC-based experiments without the need for double-labeled samples.  Nonetheless, we thank the reviewer for pointing out the confusion in the text and we have corrected and clarified our description of this assay.

      B. The authors are using neighboring residues as reporters for proline hydroxylation, without validating this approach. How well do CSPs of A403 and I566 track with proline hydroxylation? Have the authors confirmed this using their 13C-NMR data or mass spec? 

      For previous studies, we performed intercalated 15N-HSQC and 13C-CON experiments for the kinetic measurements of wild-type HIF-1α and mutants.  We observed that the shifting pattern of A403 and I566 in the 15N-HSQC spectra aligned well with the ones of P402 and P564, respectively, in the 13C-CON spectra.  Representative data has been added to Supplemental Data.

      C. Peak intensities. In some cases, the peak intensities of the end point residue look weaker than the peak intensities of the starting residue (5B, PHD2 WT I566, 6 ct lines vs. 4 ct lines). Is this because of sample dilution (i.e., should happen globally)? Can the authors comment on this? 

      This is an astute observation by the reviewer. We checked and confirmed that, for all kinetic datasets, the peak intensities of the end point residue are always slightly lower than those of the starting residue. This includes PHD2 A228S and P317R in 5B, although the difference is less obvious than for PHD2 WT. We agree with the reviewer that sample dilution is a factor, as a total volume of 16 microliters of reaction components was added to the solution to trigger the reaction after the first spectrum was acquired. It is also likely that the rate of prolyl hydroxylation becomes extremely slow when only a small amount of substrate remains in the system. Therefore, the reaction would not reach 100% completion, which the sensitive NMR experiment detected.
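      As a rough illustration of the dilution effect, the sketch below computes the global scaling expected when 16 microliters is added to the sample. The starting sample volume is not stated in this response, so the 500 microliter value is purely hypothetical, as are the function and variable names.

```python
def dilution_factor(initial_ul: float, added_ul: float) -> float:
    """Factor by which all concentrations (and, to a first approximation,
    all peak intensities) are scaled after adding volume to the sample."""
    if initial_ul <= 0 or added_ul < 0:
        raise ValueError("initial_ul must be > 0 and added_ul must be >= 0")
    return initial_ul / (initial_ul + added_ul)


ADDED_UL = 16.0  # volume of reaction components, taken from the response above
HYPOTHETICAL_SAMPLE_UL = 500.0  # assumption: starting volume is not stated

factor = dilution_factor(HYPOTHETICAL_SAMPLE_UL, ADDED_UL)
print(f"Expected global intensity scaling: {factor:.3f}")
```

      With this hypothetical starting volume, dilution accounts for a loss of roughly 3% across all peaks. A larger drop in the end point peak would therefore point to an additional contribution such as incomplete conversion.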

      (3) Data validating the CRISPR KO HEK293A cells is missing. 

      We thank the reviewer for noting this oversight.  Western blots validating PHD2 KO in HEK293A cells have been added to the Supplementary Data file.

      (4) The interpretation of the SEC data for the PHD2 mutants is a little problematic. Subtle alterations in the elution profiles may hint at different hydrodynamic radii, but as the samples were not loaded at equal concentrations or volumes, these data seem more anecdotal, rather than definitive. Repeating this multiple times, using matched samples, followed by comparison with standards loaded under identical buffer conditions, would significantly strengthen the conclusions one could make from the data. 

      Agreed.  We have performed an additional experiment as suggested with equal volume and concentration of each PHD2 construct loaded onto the SEC column for better assessment of aggregation.  Notably, our conclusion remained unchanged.

      Minor: 

      (1) Justification for picking the seven residues is not clearly articulated. The authors say they picked 7 mutants with "distinct residue changes", but no further rationale is provided. 

      Additional justification for the selection of the mutants has been added to the ‘Mutations across the PHD2 enzyme induce erythrocytosis’ section. Briefly, some mutants were chosen based on their frequency in the clinical data and their presence in potential mutational hot spots. Various mutations were noted at W334 and R371, while F366L was identified in multiple individuals. Additionally, 9 cases of PHD2-driven disease were reported to be caused by mutations located between residues 200 and 210, while 13 cases were reported between residues 369 and 379, so G206C and R371H were chosen to represent potential hot spots. To examine a potential genotype-phenotype relationship, two of the mutants responsible for neuroendocrine tumor development, A228S and H374R, were also selected. Finally, mutations located close to or on catalytic core residues (P317R, R371H, and H374R) were chosen to test for suspected defects.

      (2) A major finding of the paper is that a disease-associated mutation, P317R, can differentially affect HIF1 prolyl hydroxylation; however, additional follow-up studies have not been performed to test this in cells or to validate the mutant in another method. Is it the position of the proline within the catalytic core, or the identity of the mutation that accounts for the selectivity?

      This is the very question that we are currently addressing but as a part of a follow-up study.  Indeed, one thought is that the preferential defect observed could be the result of the loss of proline, an exceptionally rigid amino acid that makes contact with the backbone twice, or the addition of a specific amino acid, namely arginine, a flexible amino acid with an added charge at this site.  Although beyond the scope of this manuscript, we will investigate whether such and other characteristics in this region of PHD2/HIF1α interface contribute to the differential hydroxylation. 

      Reviewer #3 (Public review): 

      Summary: 

      This is an interesting and clinically relevant in vitro study by Taber et al., exploring how mutations in PHD2 contribute to erythrocytosis and/or neuroendocrine tumors. PHD2 regulates HIFα degradation through prolyl-hydroxylation, a key step in the cellular oxygen-sensing pathway. 

      Using a time-resolved NMR-based assay, the authors systematically analyze seven patient-derived PHD2 mutants and demonstrate that all exhibit structural and/or catalytic defects. Strikingly, the P317R variant retains normal activity toward the C-terminal proline but fails to hydroxylate the N-terminal site. This provides the first direct evidence that N-terminal prolyl-hydroxylation is not dispensable, as previously thought. 

      The findings offer valuable mechanistic insight into PHD2-driven effects and refine our understanding of HIF regulation in hypoxia-related diseases. 

      Strengths: 

      The manuscript has several notable strengths. By applying a novel time-resolved NMR approach, the authors directly assess hydroxylation at both HIF1α ODD sites, offering a clear functional readout. This method allows them to identify the P317R variant as uniquely defective in NODD hydroxylation, despite retaining normal activity toward CODD, thereby challenging the long-held view that the N-terminal proline is biologically dispensable. The work significantly advances our understanding of PHD2 function and its role in oxygen sensing, and might help in the future interpretation and clinical management of associated erythrocytosis. 

      Weaknesses: 

      (1) There is a lack of in vivo/ex vivo validation. This is actually required to confirm whether the observed defects in hydroxylation, especially the selective NODD impairment in P317R, are sufficient to drive disease phenotypes such as erythrocytosis.

      We thank the reviewer for this comment, and while we agree with this statement, the objective of this study per se was to elucidate the structural and/or functional defects caused by the various disease-associated mutations in PHD2. The subsequent study would be to validate whether the identified defects, in particular the selective NODD impairment, would lead to erythrocytosis in vivo. However, we feel that such a study would be beyond the scope of this manuscript.

      (2) The reliance on HRE-luciferase reporter assays may not reliably reflect the PHD2 function and highlights a limitation in the assessment of downstream hypoxic signaling. 

      Agreed. All experimental assays and systems have limitations. The HRE-luciferase assay used in the present manuscript also has limitations, such as the continuous expression of exogenous PHD2 mutants driven by the CMV promoter. Thus, we applied several additional biophysical methodologies to interrogate the disease-causing PHD2 mutants. The discussion of the limitations of the luciferase assay has been expanded in the revised manuscript.

      (3) The study clearly documents the selective defect of the P317R mutant, but the structural basis for this selectivity is not addressed through high-resolution structural analysis (e.g., cryo-EM). 

      We thank the reviewer for the comment. While solving the structure of PHD2 P317R in complex with HIFα substrate is beyond the scope of this study, a structure of PHD2 P317R in complex with a clinically used inhibitor has been solved (PDB: 5LAT). In analyzing this structure and that of PHD2 WT in complex with NODD, Chowdhury et al.[2] stated that P317 makes hydrophobic contacts with the LXXLAP motif on HIFα and that R317 is predicted to interact differently with this motif. While this analysis does not directly elucidate the reason for the preferential NODD defect, it supports the possibility that the P317R substitution may be more detrimental to enzymatic activity on NODD than on CODD. We have discussed this notion in the revised manuscript.

      (4) Given the proposed central role of HIF2α in erythrocytosis, direct assessment of HIF2α hydroxylation by the mutants would have strengthened the conclusions. 

      We thank the reviewer for this comment, but we feel that such a study would be beyond the scope of the present study. We observed that the PHD2 binding patterns to HIF1α and HIF2α were similar, and we have previously assigned >95% of the amino acids in the HIF1α ODD for NMR study[3]. Thus, we first focused on elucidating possible defects in disease-associated PHD2 mutants using HIF1α as the substrate, with the supposition that an identified deregulation of HIF1α could be extended to the HIF2α paralog. However, we agree with the reviewer that future studies should examine the impact of PHD2 mutants directly on HIF2α.

      References:

      (1) Flashman, E. et al. Kinetic rationale for selectivity toward N- and C-terminal oxygen-dependent degradation domain substrates mediated by a loop region of hypoxia-inducible factor prolyl hydroxylases. J Biol Chem 283, 3808-3815 (2008).

      (2) Chowdhury, R. et al. Structural basis for oxygen degradation domain selectivity of the HIF prolyl hydroxylases. Nat Commun 7, 12673 (2016).

      (3) He, W., Gasmi-Seabrook, G.M.C., Ikura, M., Lee, J.E. & Ohh, M. Time-resolved NMR detection of prolyl-hydroxylation in intrinsically disordered region of HIF-1alpha. Proc Natl Acad Sci U S A 121, e2408104121 (2024).

      Reviewer #1 (Recommendations for the authors): 

      (1) To increase the impact and significance of this work, I would recommend determining the mechanism by which A228S and F366L impair PHD2. Are these mutations affecting interactions with proteins other than HIF1α? Furthermore, does the F366L mutation affect the hydroxylation rate? This should be measured. The authors should also perform a more in-depth structural analysis of these mutations and perhaps use AlphaFold to identify how these sites may be involved in other interactions. 

      We thank the reviewer for the recommendations. A paragraph discussing the quandary of A228S and F366L has been added to the discussion, as well as an in-depth structural analysis of each selected mutant. While AlphaFold is excellent at predicting protein structures overall, its capability to predict the effect of single point mutations, such as those in this study, is limited. Therefore, it was not utilized for this paper.

      (2) For the aggregation assay, I recommended injecting the same quantity of protein on the SEC. If the aggregation-prone mutants' yields were too low, then reduced amounts of the other mutants should be injected. 

      Agreed. An additional experiment was performed in which similar concentrations of each mutant protein were loaded onto the SEC column and the chromatograms were normalized according to the molecular concentration. Results from this experiment have been added to replace the previously performed aggregation assay. Notably, the data from the revised experiment did not change the outcome or conclusion of the study.

      (3) For the NMR kinetics data, the authors should discuss the impact of affinities and concentrations on the reaction rate and incorporate this analysis framework to interpret their data. 

      Done. As discussed in depth in response to Public Reviewer 1’s fourth comment, we observed only a subtle reduction in hydroxylation efficiency of the HIF1α CODD by PHD2 P317R in comparison to PHD2 WT. Upon performing BLI, we found that PHD2 P317R displays only a mild binding defect on the CODD and NODD. The WT-like binding of PHD2 P317R to the NODD appears to be inconsistent with the severe defect in NODD hydroxylation by PHD2 P317R as measured via NMR. These results suggest that there are supporting residues within the PHD2/NODD interface that help maintain binding to NODD but that the P317R mutation compromises the efficiency of NODD hydroxylation.

      Reviewer #2 (Recommendations for the authors): 

      It is unclear where the source data came from describing the patient mutations, or if it is publicly available. Several minor issues were noted with several of the figures or methods: 

      (1) Figure 2C. It is not clear what data are being compared for significance. The lines don't seem to clearly distinguish this. 

      Done.  The significance lines have been adjusted in the figure to better convey which data are being compared.

      (2) Please incorporate the calculated biophysical constants (KD, TM, etc, average +/- std dev) from the tables into the figures or figure legends that show the data from which they are calculated.  

      Done.  References to the corresponding tables have been added to the appropriate figure legends.

      (3) Figure 3C, the data for F366L do not appear normalized in the same way as the other constructs. 

      CD melt values for F366L were normalized in the same way as other constructs but due to noisier data acquired between 25-37°C, the top value of the sigmoidal curve is slightly higher than the other constructs (F366L: 1.066, WT: 1.007, A228S: 1.000, P317R: 1.015, R371H: 1.005). 

      (4) For Figure 1B, it would be helpful to highlight the mutants characterized in the current study with a different color/symbol to help show the number of cases. 

      Done.  Dots representing the selected mutants have been highlighted in red in Figure 1B.

      (5) A description of the isotopic labeling of PHD2 is missing from the methods.

      Due to the nature of the NMR assay, no isotopic labeling was required for PHD2.

      Reviewer #3 (Recommendations for the authors): 

      (1) To further strengthen the manuscript, the authors could consider exploring the relevance of their in vitro findings in a more physiological context. 

      We thank the reviewer for the suggestion, and we will certainly consider furthering our investigation in a more physiological context for future studies.

      (2) If technically feasible, integrating direct analyses of HIF2α regulation by the PHD2 mutants would better reflect the clinical phenotype, given the known importance of HIF2α in erythrocytosis. 

      We agree that HIF2α is important in the context of erythrocytosis, but through MST we observed no difference between HIF1 and HIF2 in their binding patterns to the selected PHD2 mutants. As we had previously assigned >95% of residues in the HIF1α ODD for the NMR assay, we analyzed HIF1 with the supposition that any defects observed would likely apply to HIF2. However, we agree that future studies on the impact of PHD2 mutants directly on HIF2 would be beneficial to supplement our understanding of pseudohypoxic disease.

      (3) Additionally, although perhaps more suitable for future work or discussion, structural modeling or high-resolution structural studies of the P317R variant could offer valuable insight into the observed NODD selectivity defect. 

      We thank the reviewer for the suggestion. While solving the structure of PHD2 P317R in complex with NODD is beyond the scope of this manuscript, a crystal structure of PHD2 P317R in complex with an inhibitor has been solved and insights from this structure have been added to the discussion. 

      (4) Finally, a brief clarification or discussion of the limitations of the luciferase reporter assay, especially in the context of aggregation-prone mutants, would help readers better interpret the functional data. 

      We thank the reviewer for the suggestion.  The limitations of the luciferase reporter assay in regard to its inability to detect defects with aggregation-prone mutants have been elaborated on in the discussion.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public Review):

      Summary:

      The authors show that a combination of arginine methyltransferase inhibitors synergize with PARP inhibitors to kill ovarian and triple negative cancer cell lines in vitro and in vivo using preclinical mouse models.

      Strengths and weaknesses

      The experiments are well-performed, convincing and have the appropriate controls (using inhibitors and genetic deletions) and use statistics.

      They identify the DNA damage protein ERCC1 to be reduced in expression with PRMT inhibitors. As ERCC1 is known to be synthetic lethal with PARPi, this provides a mechanism for the synergy. They use cell lines only for their study in 2D as well as xenograft models.

      We sincerely thank Reviewer #2 for the thoughtful and constructive feedback during both rounds of review, which has significantly improved the quality of our manuscript, and for the kind recognition of the scientific quality of our work: “The experiments are well-performed, convincing and have the appropriate controls (using inhibitors and genetic deletions) and use statistics.” In response, we have incorporated new results from additional experiments into the figures (Figures 6M and 6N) and made comprehensive revisions throughout the text, figures, and supplementary materials. Following the reviewer’s valuable suggestions, we also revised the Discussion section. In the “Recommendations for the authors” sections, we have provided detailed point-by-point responses to each comment, which were instrumental in guiding our revisions. We believe these updates have substantially strengthened the manuscript and fully addressed all reviewer concerns.

      Reviewer #2 (Recommendations for the authors): 

      Although the authors have addressed each recommendation from the reviewer, further revisions of the manuscript are still necessary, as outlined below.

      Add these additional comments in the text to further enhance the comprehension and clarity of the data.

      (1) If the authors kept the tumors of various sizes in Figure 7I, it would be important to assess the protein and/or mRNA level of ERCC1 to further support their mechanism.

      Question (1): Please add the figures of new experiments (treatment diagram, curves for tumor volume and qRT-PCR data) to Figure 6.

      We thank the reviewers for their constructive suggestions. In response to the reviewers’ comments, we have added the treatment diagram and qPCR results to Figure 6. In this experiment, we shortened the treatment duration to seven days to assess early molecular responses to therapy rather than downstream effects. As expected, such short-term treatment did not result in significant differences in tumor growth among groups. The new results are now presented in Figure 6, panels M and N. The corresponding results and figure legends will also be included in the revised version of the manuscript.

      (2) Figure 2G: please explain why two bands remain for sgPRMT1.

      Question (2): In the answer, the authors stated, "Upon knockdown of the major isoforms by CRISPR/Cas9, expression of this minor isoform may have increased as part of a compensatory feedback mechanism, rendering it detectable by immunoblotting." Please put the statement into the discussion section.

      We sincerely thank the reviewers for their thoughtful and constructive suggestions. In response to these comments, we have carefully revised the manuscript and incorporated the corresponding information into the Discussion section to provide greater clarity and context for our findings.

      (3) (Previously point 5) What is the link with ERCC1 splicing because reduced overall ERCC1 expression is clear?

      Question (5): Please add the explanation you provide of links between ERCC1 splicing and PRMTi into the discussion section.

      "Furthermore, as shown in Figure 4G, we observed a reduction in the total ERCC1 mRNA reads following PRMTi treatment. This decrease may be attributed, at least in part, to the instability of the alternatively spliced ERCC1 transcripts, which could be more prone to degradation. In combination with the transcriptional downregulation of ERCC1 induced by PRMT inhibition, these alternative splicing events may lead to a further reduction in functional ERCC1 protein levels. This dual impact on ERCC1 expression, through both decreased transcription and the generation of unstable or nonfunctional isoforms, likely contributes to the enhanced cellular sensitivity to PARP inhibitors observed in our study."

      We sincerely thank the reviewers for their thoughtful and constructive suggestions. In response to these comments, we have carefully revised the manuscript and incorporated the corresponding information into the Discussion section to provide greater clarity and context for our findings.

      (4) (Previously 6) Figure 7J: From the graph, it seems like Olaparib+G715 and G715+G025 have a similar effect on tumor volume (two curves overlap). Please discuss.

      Question (6): In the answer, the authors stated, "Our in vitro and in vivo findings, together with previously published data, consistently demonstrate that GSK715 is more potent than both GSK025 and Olaparib. Notably, treatment with GSK715 alone led to significantly greater inhibition of tumor growth compared to either GSK025 or Olaparib administered individually. This higher potency of GSK715 also explains the comparable levels of tumor suppression observed in the combination groups, including GSK715 plus Olaparib and GSK715 plus GSK025. These results suggest that GSK715 is likely the primary driver of efficacy in the two drug combination settings." Please put the statement in the corresponding result section for Figure 6J.

      We sincerely thank the reviewers for their thoughtful and constructive suggestions. In response to these comments, we have carefully revised the manuscript and incorporated the corresponding information into the Results section for Figure 6J to provide greater clarity and context for our findings.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this manuscript entitled "Molecular dynamics of the matrisome across sea anemone life history", Bergheim and colleagues report the prediction, using an established sequence analysis pipeline, of the "matrisome" - that is, the compendium of genes encoding constituents of the extracellular matrix - of the starlet sea anemone Nematostella vectensis. Re-analysis of an existing scRNA-Seq dataset allowed the authors to identify the cell types expressing matrisome components at different developmental stages. Last, the authors apply time-resolved proteomics to provide experimental evidence of the presence of the extracellular matrix proteins at three different stages of the life cycle of the sea anemone (larva, primary polyp, adult) and show that different subsets of matrisome components are present in the ECM at different life stages with, for example, basement membrane components accompanying the transition from larva to primary polyp and elastic fiber components and matricellular proteins accompanying the transition from primary polyp to the adult stage. 

      Strengths: 

      The ECM is a structure that has evolved to support the emergence of multicellularity and different transitions that have accompanied the complexification of multicellular organisms. Understanding the molecular makeup of structures that are conserved throughout evolution is thus of paramount importance. 

      The in-silico predicted matrisome of the sea anemone has the potential to become an essential resource for the scientific community to support big data annotation efforts and understand better the evolution of the matrisome and of ECM proteins, an important endeavor to better understand structure/function relationships. This study is also an excellent example of how integrating datasets generated using different -omic modalities can shed light on various aspects of ECM metabolism, from identifying the cell types of origins of matrisome components using scRNA-Seq to studying ECM dynamics using proteomics. 

      We greatly appreciate the positive feedback regarding the design of our study and the evolutionary significance of our findings.

      Weaknesses: 

      My concerns pertain to the three following areas of the manuscript: 

      (1) In-silico definition of the anemone matrisome using sequence analysis: 

      a) While a similar computational pipeline has been applied to predict the matrisome of several model organisms, the authors fail to provide a comprehensive definition of the anemone matrisome: In the text, the authors state the anemone matrisome is composed of "551 proteins, constituting approximately 3% of its proteome" (see page 6, line 14), but Figure 1 lists 829 entries as part of the "curated" matrisome, Supplementary Table S1 lists the same 829 entries and the authors state that "Here, we identified 829 ECM proteins that comprise the matrisome of the sea anemone Nematostella vectensis" (see page 17, line 10). Is the sea anemone matrisome composed of 551 or 829 genes? If we refer to the text, the additional 278 entries should not be considered as part of the matrisome, but what is confusing is that some are listed as glycoproteins and the "new_manual_annotation" proposed by the authors and that refer to the protein domains found in these additional proteins suggest that in fact, some could or should be classified as matrisome proteins. For example, shouldn't the two lectins encoded by NV2.3951 and NV2.3157 be classified as matrisome-affiliated proteins? Based on what has been done for other model organisms, receptors have typically been excluded from the "matrisome" but included as part of the "adhesome" for consistency with previously published matrisome; the reviewer is left wondering whether the components classified as "Other" / "Receptor" should not be excluded from the matrisome and moved to a separate "adhesome" list. 

      In addition to receptors, the authors identify nearly 70 glycoproteins classified as "Other". Here, does other mean "non-matrisome" or "another matrisome division" that is not core or associated? If the latter, could the authors try to propose a unifying term for these proteins? Unfortunately, since the authors do not provide the reasons for excluding these entries from the bona fide matrisome (list of excluding domains present, localization data), the reader is left wondering how to treat these entries. 

      Overall, the study would gain in strength if the authors could be more definitive and, if needed, even propose novel additional matrisome annotations to include the components for now listed as "Other" (as was done, for example, for the Drosophila or C. elegans matrisomes). 

      The reviewer is correct to point out the confusing terminology used throughout our manuscript, where both the total of 829 proteins constituting the curated list of ECM domain proteins and the actual matrisome (excluding "others") were referred to as "matrisomes". In general, we followed the example set by Naba & Hynes in their 2012 paper (Mol Cell Proteomics. 2012 Apr;11(4):M111.014647. doi: 10.1074/mcp.M111.014647), where they define the "matrisome" as encompassing all components of the extracellular matrix ("core matrisome") and those associated with it ("matrisome-associated" proteins). This corresponds to our group of 551 proteins, comprising both core matrisome and matrisome-associated proteins. The Naba & Hynes paper also contains the inclusive and exclusive domain lists for the matrisome that we applied to our dataset. In the revised manuscript, we have now labelled the group of 829 proteins as "curated ECM domain proteins/genes", which includes all proteins positively selected for containing a bona fide ECM domain. After excluding non-matrisomal proteins such as receptors, we arrive at the 551 proteins that constitute the "Nematostella matrisome". We have maintained this terminology throughout the revised manuscript and have revised Figures 1B and 4B accordingly.

      Regarding the category of "other" proteins, which by definition are not part of the matrisome despite containing ECM domains, we have taken the reviewer's advice and classified these in more detail. We categorized all receptors as "adhesome" (202 proteins). The remaining “other” secreted ECM domain proteins were then further subcategorized. Those exhibiting significant matches in the ToxProt database were subclassified as "putative venoms" (15 proteins). This group also includes the two lectins (NV2.3951 and NV2.3157), which had originally been moved to the “other” category due to their classification as venoms. Factors such as coadhesins, whose domain architecture resembles that of bioadhesive proteins described in proteomic studies of other invertebrate species such as corals or sponges (see also https://doi.org/10.1016/j.jprot.2022.104506), were categorized as “adhesive proteins” (28 proteins). Further subcategories are stress/injury response proteins (9 proteins) and ion channels (6 proteins). The remaining 17 proteins were categorized as “uncharacterized ECM domain proteins”. These include highly diverse proteins possessing either single ECM domains or novel domain combinations. We decided to retain these in our dataset as candidates for future functional characterization.
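
      The counts quoted in this response can be tallied with a short sketch. The dictionary keys and variable names are illustrative labels only; every number is taken directly from the text above. As stated, the subcategories add up to 277, one fewer than the 829 - 551 = 278 non-matrisomal entries, so one entry may be unaccounted for in the breakdown as written.

```python
# Counts as quoted in the response; key and variable names are illustrative.
curated_ecm_domain_proteins = 829
matrisome_proteins = 551

non_matrisomal_subcategories = {
    "adhesome": 202,
    "putative venoms": 15,
    "adhesive proteins": 28,
    "stress/injury response proteins": 9,
    "ion channels": 6,
    "uncharacterized ECM domain proteins": 17,
}

# Sum of the subcategories listed in the response.
listed_total = sum(non_matrisomal_subcategories.values())

# Number of non-matrisomal entries implied by the two headline totals.
implied_total = curated_ecm_domain_proteins - matrisome_proteins

print(listed_total, implied_total, implied_total - listed_total)
```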

      b) It is surprising that the authors are not providing the full currently accepted protein names to the entries listed in Supplementary Table S1 and have used instead "new_manual_annotation" that resembles formal protein names. This liberty is misleading. In fact, the "new_manual_annotation" seems biased toward describing the reason the proteins were positively screened for through sequence analysis, but many are misleading because there is, in fact, more known about them, including evidence that they are not ECM proteins. The authors should at least provide the current protein names in addition to their "new_manual_annotations". 

      c) To truly serve as a resource, the Table should provide links to each gene entry in the Stowers Institute for Medical Research genome database used and some sort of versioning (this could be added to columns A, B, or D). Such enhancements would facilitate the assessment of the rigor of the list beyond the manual QC of just a few entries. 

      d) Since UniProt is the reference protein knowledge database, providing the UniProt IDs associated with the predicted matrisome entries would also be helpful, giving easy access to information on protein domains, protein structures, orthology information, etc. 

      e) In conclusion, at present, the study only provides a preliminary draft that should be more rigorously curated and enriched with more comprehensive and authoritative annotations if the authors aspire the list to become the reference anemone matrisome and serve the community. 

      Table S1 has been updated to include links to the respective Stowers Institute IDs (first two columns), as well as SwissProt IDs and current descriptions from both the Stowers Institute (SI) and SwissProt.

      We prioritized our manual annotations over automated ones because of the considerable effort invested in examining each sequence individually. The cnidaria-specific minicollagens and NOWA proteins might serve as an example. According to the SI descriptions, the minicollagens are annotated as “keratin-associated protein, predicted or hypothetical protein, collagen-like protein and pericardin”. We classified these as minicollagens on the basis of overall domain architecture, signature domains, and sequence motifs, such as minicollagen cysteine-rich domains (CRDs) and polyproline stretches (doi: 10.1016/j.tig.2008.07.001). NOWA is a CTLD/CRD-containing protein that is part of nematocyst tubules (doi:10.1016/j.isci.2023.106291). The first two NOWA isoforms, according to the SI descriptions, were annotated as aggrecan and brevican core proteins, which is very misleading. We therefore feel that our manual annotations better serve the cnidarian research community in classifying these proteins.

      Automated annotations of ECM proteins often rely on similarities between individual domains, neglecting overall domain composition. For example, SwissProt descriptions annotate 31 TSP1 domain-containing proteins in our list as "Hemicentin-1", but closer inspection reveals that only one sequence (NV2.24790) qualifies as Hemicentin-1 due to its characteristic vWFA, Ig-like, TSP1, G2 nidogen, and EGF-like domain architecture. Regarding novel protein annotations, NV2.650 might serve as an example. While SI descriptions annotate this protein as "epidermal growth factor" based on the presence of several EGF-like domains, our analysis reveals two integrin alpha N-terminal domains that classify this sequence as integrin-related. We have therefore assigned a description (Secreted integrin-N-related protein) that references this defining domain and avoids misclassification within the EGF family.

      In cases where the automated annotation (including those in GenBank) matched our own findings, we adopted the existing description, as seen with netrin-1 (NV2.7734). We acknowledge that our manual annotations are not flawless and will be refined by future research. Nonetheless, we offer them as an approximation to a more accurate definition of the identified protein list.

      (2) Proteomic analysis of the composition of the mesoglea during the sea anemone life cycle: 

      a) The product of 287 of the 829 genes proposed to encode matrisome components was detected by proteomics. What about the other ~550 matrisome genes? When and where are they expressed? The wording employed by the authors (see line 11, page 13) implies that only these 287 components are "validated" matrisome components. Is that to say that the other ~550 predicted genes do not encode components of the ECM? This should be discussed. 

      Our wording was not sufficiently accurate here. In the revised Fig. 1B, we indicate that 210 of the 551 matrisome (core and associated) proteins were confirmed by mass spectrometry. In total, 287 proteins were identified by mass spectrometry, meaning that 77 of those are non-matrisomal proteins belonging to the “adhesome” (47) and “other” (30) groups. There are two major reasons why the remaining 542 of the 829 proteins predicted by our in silico analysis could not be identified: (1) Our study was focussed on the molecular dynamics of the mesoglea. Therefore, only mesogleas were isolated for the mass spectrometry analysis, and nematocysts were mostly excluded by extensive washing steps. As nematocysts contribute significantly to the predicted matrisome, this group of proteins is underrepresented in the mass spectrometry analysis. (2) A significant fraction of the predicted ECM proteins are soluble factors and transmembrane receptors. These are not necessarily part of the mesoglea isolates. In addition, the isolation and solubilization method we applied might have technical limitations. Although we used harsh conditions for solubilizing the mesoglea samples (90°C and high DTT concentrations), we cannot exclude that we missed proteins that resisted solubilization and thus trypsinization. We confirmed that all genes predicted by the in silico analysis have transcriptomic profiles, as shown in Supplementary Table S4. We have clarified these points in the revised Results section (p. 6) and have also revised the statement in line 16, page 13.
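
      The arithmetic behind these figures can be checked with a minimal sketch. The variable names are illustrative; the counts are those quoted in the response, and the coverage percentage is simply derived from them rather than reported by the authors.

```python
# Counts as quoted in the response; variable names are illustrative.
curated_ecm_domain_proteins = 829   # all in silico predicted ECM domain proteins
matrisome_proteins = 551            # core matrisome plus matrisome-associated

detected_matrisome = 210            # matrisome proteins confirmed by mass spectrometry
detected_adhesome = 47              # non-matrisomal, "adhesome" group
detected_other = 30                 # non-matrisomal, "other" group

# Total number of proteins identified by mass spectrometry.
detected_total = detected_matrisome + detected_adhesome + detected_other

# Predicted proteins that were not identified by mass spectrometry.
not_detected = curated_ecm_domain_proteins - detected_total

# Fraction of the 551-protein matrisome confirmed by mass spectrometry (derived value).
matrisome_coverage = detected_matrisome / matrisome_proteins

print(detected_total, not_detected, f"{matrisome_coverage:.1%}")
```

      Note that the 542 undetected proteins are counted against the full set of 829 curated ECM domain proteins, not against the 551-protein matrisome.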

      b) Can the authors comment on how they have treated zero TMT values or proteins for which a TMT ratio could not be calculated because unique to one life stage, for example? 

      We did not include these proteins in the analysis of the respective statistical comparison. This involved only very few proteins (about 10).  

      c) Could the authors provide a plot showing the distribution of protein abundances for each matrisome category in the main figure 4? In mammals, the bulk of the ECM is composed of collagens, followed by fibrillar ECM glycoproteins, the other matrisome components being more minor. Is a similar distribution observed in the sea anemone mesoglea? 

      We have included such a plot showing protein abundances across life stages and protein categories (Fig. 4A). Collagens and basement membrane proteoglycans (perlecan) are the most abundant protein categories in the core matrisome while secreted factors dominate in the matrisome-associated group.

      d) Prior proteomic studies on the ECM of vertebrate organisms have shown the importance of allowing certain post-translational modifications during database search to ensure maximizing peptide-to-spectrum matching. Such PTMs include the hydroxylation of lysines and prolines that are collagen-specific PTMs. Multiple reports have shown that omitting these PTMs while analyzing LC-MS/MS data would lead to underestimating the abundance of collagens and the misidentification of certain collagens. The authors may want to reanalyze their dataset and include these PTMs as part of their search criteria to ensure capturing all collagen-derived peptides. 

      Thank you for this suggestion. We have re-analyzed our dataset including lysine and proline hydroxylation as PTMs. While we obtained 70 more proteins in total using this approach, this additional group did not contain any large collagen or minicollagen that we had not detected before. We only obtained two additional collagen-like proteins with very short triple helical domains (V2t013973001.1, NV2t024002001.1), one being a fragment. We don’t feel this justifies implementing a re-analysis of the proteome in our study.

      e) The authors should ensure that reviewers are provided with access to the private PRIDE repository so the data deposited can also be evaluated. They should also ensure that sufficient meta-data is provided using the SDRF format to allow the re-use of their LC-MS/MS datasets. 

      We apologize for not providing the reviewer access in our initial submission and have asked the editorial office to forward the PRIDE repository link to all reviewers immediately after receiving the reviews. We did upload a metadata.csv file with the proteomics dataset. This file annotates all TMT labels with the samples, conditions, and replicates used in the manuscript, and thus contains similar information to an SDRF format file. In addition, the search output files at the protein and PSM level have been provided. From our point of view, we have therefore provided all information necessary to reproduce the analysis.

      (3) Supplementary tables: 

      The supplementary tables are very difficult to navigate. They would become more accessible to readers and non-specialists if they were accompanied by brief legends or "README" tabs and if the headers were more detailed (see, for example, Table S2, what does "ctrl.ratio_Larvae_rep2" exactly refer to? Or Table S6 whose column headers using extensive abbreviations are quite obscure). Similarly, what do columns K to BX in Supplementary Table S1 correspond to? Without more substantial explanations, readers have no way of assessing these data points. 

      We have revised the tables and removed any redundant data columns. We also included detailed explanations of the used abbreviations, both in the headers and in a separate README file. Some of the information was apparently lost during the conversion to pdf files. We will therefore upload the original .xls files when submitting the revised manuscript.

      Reviewer #2 (Public review): 

      This work set out to identify all extracellular matrix proteins and associated factors present within the starlet sea anemone Nematostella vectensis at different life stages. Combining existing genomic and transcriptomic datasets, alongside new mass spectrometry data, the authors provide a comprehensive description of the Nematostella matrisome. In addition, immunohistochemistry and electron microscopy were used to image whole mount and decellularized mesoglea from all life stages. This served to validate the de-cellularization methods used for proteomic analyses, but also resulted in a very nice description of mesoglea structure at different life stages. A previously published developmental cell type atlas was used to identify the cell type specificity of the matrisome, indicating that the core matrisome is predominantly expressed in the gastrodermis, as well as cnidocytes. The analyses performed were rigorous and the results were clear, supporting the conclusions made by the authors. 

      Thank you. We greatly appreciate the positive assessment of our study.

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript by Bergheim et al investigates the molecular and developmental dynamics of the matrisome, a set of gene products that comprise the extracellular matrix, in the sea anemone Nematostella vectensis using transcriptomic and proteomic approaches. Previous work has examined the matrisome of the hydra, a medusozoan, but this is the first study to characterize the matrisome in an anthozoan. The major finding of this work is a description of the components of the matrisome in Nematostella, which turns out to be more complex than that previously observed in hydra. The authors also describe the remodeling of the extracellular matrix that occurs in the transition from larva to primary polyp, and from primary polyp to adult. The authors interpret these data to support previously proposed (Steinmetz et al. 2017) homology between the cnidarian endoderm with the bilaterian mesoderm. 

      Strengths: 

      The data described in this work are robust, combining both transcriptome and proteomic interrogation of key stages in the life history of Nematostella, and are of value to the community. 

      Thank you for your positive assessment of our dataset. 

      Weaknesses: 

      The authors offer numerous evolutionary interpretations of their results that I believe are unfounded. The main problem with extending these results, together with previous results from hydra, into an evolutionary synthesis that aims to reconstruct the matrisome of the ancestral cnidarian is that we are considering data from only two species. I agree with the authors' depiction of hydra as "derived" relative to other medusozoans and see it as potentially misleading to consider the hydra matrisome as an exemplar for the medusozoan matrisome. Given the organismal and morphological diversity of the phylum, a more thorough comparative study that compares matrisome components across a selection of anthozoan and medusozoan species using formal comparative methods to examine hypotheses is required. 

      Specifically, I question the author's interpretation of the evolutionary events depicted in this statement: 

      "The observation that in Hydra both germ layers contribute to the synthesis of core matrisome proteins (Epp et al. 1986; Zhang et al. 2007) might be related to a secondary loss of the anthozoan-specific mesenteries, which represent extensions of the mesoglea into the body cavity sandwiched by two endodermal layers." 

      Anthozoans and medusozoans are evolutionary sisters. Therefore, the secondary loss of "anthozoan-like mesenteries" in hydrozoans is at least as likely as the gain of this character state in anthozoans. By extension, there is no reason to prefer the hypothesis that the state observed in Nematostella, where gastroderm is responsible for the synthesis of the core matrisome components, is the ancestral state of the phylum. Moreover, the fossil evidence provided in support of this hypothesis (Ou et al. 2022) is not relevant here because the material described in that work is of a crown group anthozoan, which diversified well after the origin of Anthozoa. The phylogenetic structure of Cnidaria has been extensively studied using phylogenomic approaches and is generally well supported (Kayal et al. 2018; DeBiasse et al. 2024). Based on these analyses, anthozoans are not on a "basal" branch, as the authors suggest. The structure of cnidarian phylogeny bifurcates with Anthozoa forming one clade and Medusozoa forming the other. From the data reported by Bergheim and coworkers, it is not possible to infer the evolutionary events that gave rise to the different matrisome states observed in Nematostella (an anthozoan) and hydra (a medusozoan). Furthermore, I take the observation in Fig 5 that anthozoan matrisomes generally exhibit a higher complexity than other cnidarian species to be more supportive of a lineage-specific expansion of matrisome components in the Anthozoa, rather than those components being representative of an ancestral state for Cnidaria. Whatever the implication, I take strong issue with the statement that "the acquisition of complex life cycles in medusozoa, that are distinguished by the pelagic medusa stage, led to a secondary reduction in the matrisome repertoire." 
      There is no causal link in any of the data or analyses reported by Bergheim and co-workers to support this statement and, as stated above, while we are dealing with limited data, insufficient to address this question, it seems more likely to me that the matrisome expanded in anthozoans, contrasting with the authors' conclusions. While the discussion raises many interesting evolutionary hypotheses related to the origin of the cnidarian matrisome, which is of vital interest if we are to understand the origin of the bilaterian matrisome, a more thorough comparative analysis, inclusive of a much greater cnidarian species diversity, is required if we are to evaluate these hypotheses. 

      DeBiasse MB, Buckenmeyer A, Macrander J, Babonis LS, Bentlage B, Cartwright P, Prada C, Reitzel AM, Stampar SN, Collins A, et al. 2024. A Cnidarian Phylogenomic Tree Fitted With Hundreds of 18S Leaves. Bulletin of the Society of Systematic Biologists [Internet] 3. Available from: https://ssbbulletin.org/index.php/bssb/article/view/9267

      Epp L, Smid I, Tardent P. 1986. Synthesis of the mesoglea by ectoderm and endoderm in reassembled hydra. J Morphol [Internet] 189:271-279. Available from: https://pubmed.ncbi.nlm.nih.gov/29954165/ 

      Kayal E, Bentlage B, Sabrina Pankey M, Ohdera AH, Medina M, Plachetzki DC, Collins AG, Ryan JF. 2018. Phylogenomics provides a robust topology of the major cnidarian lineages and insights on the origins of key organismal traits. BMC Evol Biol [Internet] 18:1-18. Available from: https://bmcecolevol.biomedcentral.com/articles/10.1186/s12862-018-1142-0

      Ou Q, Shu D, Zhang Z, Han J, Van Iten H, Cheng M, Sun J, Yao X, Wang R, Mayer G. 2022. Dawn of complex animal food webs: A new predatory anthozoan (Cnidaria) from Cambrian. The Innovation 3:100195 

      Steinmetz PRH, Aman A, Kraus JEM, Technau U. 2017. Gut-like ectodermal tissue in a sea anemone challenges germ layer homology. Nature Ecology & Evolution 2017 1:10 [Internet] 1:1535-1542. Available from: https://www.nature.com/articles/s41559-017-0285-5

      Zhang X, Boot-Handford RP, Huxley-Jones J, Forse LN, Mould AP, Robertson DL, Li L, Athiyal M, Sarras MP. 2007. The collagens of hydra provide insight into the evolution of metazoan extracellular matrices. J Biol Chem [Internet] 282:6792-6802. Available from: https://pubmed.ncbi.nlm.nih.gov/17204477/ 

      We agree with the reviewer that only the analysis of several additional anthozoan and medusozoan representatives will yield a valid basis for a reconstruction of the ancestral cnidarian matrisome and allow statements about ancestral or novel features within the phylum. We have therefore revised our statements in the discussion part of the manuscript by implementing the cited literature and also findings from medusozoan genome analysis (e.g. Gold et al., 2018) demonstrating that changes in gene content are as common in the anthozoans as in medusozoans, which questioned the previously stated “basal” state of Nematostella or of anthozoans in general.

      Reviewer #1 (Recommendations for the authors): 

      (1) In Figure 2A, an "o" is missing in the labeling of the "developing cnidcytes" population. 

      Thank you, we have corrected the typo.

      (2) It would be helpful to have the different life stages indicated as headers of the heat maps presented in Figure 4. 

      We have included symbolic representations for the different life stages on top of the heat maps in addition to the respective labels at the bottom.

      Reviewer #2 (Recommendations for the authors): 

      Important changes: 

      (1) Figure 2B The x-axis tissue names should be changed to something more easily readable/understandable - some are clear, but others are not. Perhaps abbreviations could be expanded in the legend. 

      We have expanded the legend in Fig. 2B to render it more easily readable. We have also rotated the maps in A to have them aligned with the ones in Fig.3B.

      (2) Figure 3B This figure would be improved by the inclusion of cluster names, to understand better the mapping. 

      We have added relevant cluster names to Fig. 3B and as stated above aligned the orientation of the maps in Fig. 2B and Fig. 3B.

      (3) Figure 3C As with 2B, I find the y-axis cnidocyte cell state names to be unclear at times. Perhaps abbreviations could be expanded in the legend. 

      All abbreviations were expanded in Fig.3C axis labels.

      (4) Many of the supplementary tables are not well exported or easily readable as is (gene names are truncated, headers truncated, etc), which means that they may not be easily usable by researchers in the field interested in following up on this work in other contexts. Indeed, to be more usable, please consider sharing these supplementary data as .csv files, for example, instead of as .pdfs. 

      We are sorry for this inconvenience, which was obviously caused by the conversion to pdf files. We will upload the original csv files when submitting the revised manuscript.

      Smaller nitpicky comments: 

      (5) Page 2 line 4 & page 3 line 7: Please consider a term other than "pre-bilaterian". The drawing/ordering of a phylogeny of extant species is not meaningful in terms of more or less ancestral. e.g. if the tips are flipped in the drawing of the tree, can we say that bilaterians are pre-cnidarians? What does that mean? 

      We have used that term on the basis that cnidarians existed before the appearance of bilaterians according to the fossil record and molecular phylogenies (McFadden et al., 2021; Adoutte et al., 2000; Cavalier-Smith et al., 1996; Collins, 1998; Kim et al., 1999; Medina et al., 2001; Wainright et al., 1993). To acknowledge remaining uncertainties in the timing of origin of animals, we will use the term “early-diverging metazoans” instead, which is widely accepted in the cnidarian community. 

      (6) Page 3 line 9 I was confused by the use of "gastrula-shaped body" to describe cnidarians, which are on the whole very morphologically diverse and don't all resemble gastrulae (that can also be quite diverse). 

      This term is sometimes used to refer to the diploblastic cnidarian body plan (outer ectoderm, inner endoderm) with a mouth that corresponds to the blastopore. To avoid misunderstandings, we changed it in the revised manuscript to “Cnidarians, the sister group to bilaterians, are characterized by a simple body plan with a central body cavity and a mouth opening surrounded by tentacles.”

      Reviewer #3 (Recommendations for the authors): 

      (1) In general, I felt there was a lot of discussion about protein structure and diversity that is difficult to follow without a figure. I think some of the information in Supplementary Figures S5, S9, and S11 should be in the main figures. 

      Following the reviewer’s suggestion, we have integrated Fig. S5 (collagens) into the main Fig. 2 and Fig. S9 (polydoms) into Fig. 4. As metalloproteases are not extensively discussed in the manuscript (and also due to the large size of the figure) we have kept Fig. S11 as a supplementary figure.

      (2) Page 3, Line 7: The use of the term "pre-bilaterian" is inappropriate. Cnidarians and bilaterians are evolutionary sisters. Therefore, each lineage derives from the same split and is the same age. The cnidarian lineage is not older than the bilaterian lineage. 

      Following a similar request by reviewer 2 we have replaced this term by “early diverging metazoans”.

      (3) Page 5, Line 10. How were in silico matrisomes from early-branching metazoan species predicted? 

      We applied the same bioinformatic pipeline as for the Nematostella matrisome. We clarified this in the respective methods part.

      (4) Page 16, Line 8: This should be Thus. 

      Obviously, the wording of this sentence was ambiguous. We changed it to “In contrast, the adult mesoglea is significantly enriched in elastic fiber components, such as fibrillins and fibulin. This compositional shift likely adds to the visco-elastic properties (Gosline 1971a, b) of the growing body column (Fig. 4B,D, supplementary table S7).”

    1. Author response:

      We thank the editors and reviewers for their encouraging comments and constructive feedback. We will revise the text to enhance clarity as suggested. New experiments are planned to address questions raised regarding the time course of responses to the hit compounds. We also intend to examine additional endogenous readouts of the integrated stress response, including effects on translation. The effects of lead compound 20 will be examined in a wider range of cells, including primary cells.

    1. Author response:

      We are going to modify the text following the Reviewer’s comments and perform direct embryo labelling experiments to experimentally address the contraction of the two “belts” proposed in our model. We feel that this aspect is feasible in a reasonable time and important for the model proposed. We appreciate the relevance of using this framework to identify molecular drivers of the regionalized tissue behaviours uncovered and how these might be altered in mutant models, but feel that these aspects demand efforts beyond a reasonable revision period.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The manuscript by Senn and colleagues presents a comprehensive study on developing synthetic gene circuits targeting mutant RAS-expressing cells. This study aims to exploit these RAS-targeting circuits as cancer cell classifiers, enabling the selective expression of an output protein in correlation with RAS activity. The system is based on the bacterial two-component system NarX/NarL. A RAS-binding domain, the RBDCRD domain of the RAS effector protein CRAF, is fused to the histidine kinase domain, which carries an inactivating amino acid exchange either in its ATP-binding site (N509A) or in its phosphorylation site (H399Q). Dimerization or nanocluster formation of RAS-GTP reconstitutes an active histidine kinase sensor dimer that phosphorylates the response regulator NarL. The phosphorylated DNA-binding protein NarL, fused to the transcription activator domain VP48, binds its responsive element and induces the expression of the output protein. In comparison to mutated RAS, the effect of the RAS activator SOS-1 and the RAS inhibitor NF1 on the sensing ability as well as the tunability of the RAS sensor were examined. A RAS targeting circuit with an AND gate was designed by expressing the RAS sensor proteins under the control of defined MAPK response elements, resulting in a large increase in the dynamic range between mutant and wild-type RAS. Finally, the RAS targeting circuits were evaluated in detail in a set of twelve cancer cell lines expressing endogenous levels of mutant or wild-type RAS or oncogenes affecting RAS signaling upstream or downstream. 

      Strengths: 

      This proof-of-concept study convincingly demonstrates the potential of synthetic gene circuits to target oncogenic RAS in tumor cell lines and to function, at least in part, as a RAS mutant cell classifier. 

      Weaknesses: 

      The use of an appropriate "therapeutic gene" might revert the oncogenic properties of RAS mutant cell lines. However, a therapeutic strategy based on this four-plasmid-based system might be difficult to implement in RAS-driven solid cancers. 

      Thank you for the insightful comments. We agree that the delivery of a four-plasmid system represents a major challenge for translating RAS-targeting circuits into therapeutic applications. Reducing the number of plasmids –ideally consolidating all components onto a single vector– will be critical for clinical implementation.

      Viral delivery is generally the most efficient strategy for DNA-based therapies, but viral vectors have limited packaging capacities, which differ by virus type[1]. The RAS_sensor_F.L.T. circuit under the EF1α promoter requires ~7.7 kb for the sensing components alone, excluding the output gene. This exceeds the packaging limit of adeno-associated virus (AAV) and is at the upper boundary for lentiviral vectors but could potentially be accommodated by larger vectors such as γ-retroviruses, poxviruses, or herpesviruses[1]. Co-transduction with dual AAVs[2] or ongoing engineering to expand packaging capacity[3] may also offer future solutions. An additional route to reduce construct size could be alternative splicing, especially given redundancy between the two NarX fusion proteins[4]. 

      An advantage of our current architecture is that synthetic response elements replace constitutive promoters, reducing construct size. For example, the MAPK-driven PY2_NarX&NarL circuits range between 4.9 and 5.2 kb depending on the transactivation domain, bringing them within AAV packaging limits for the sensor module[5], though co-delivery of the output gene would still be necessary. For lentiviruses, this is within the packaging capacity of 8 kb[1] and would allow for inclusion of ~3 kb output genes.
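The size argument above can be restated as a simple budget calculation. The following is a minimal sketch using only the construct sizes and the 8 kb lentiviral capacity quoted in this response; the dictionary labels and the helper `remaining_capacity_kb` are hypothetical names introduced for illustration, and the sketch is not part of the analysis in the manuscript.

```python
# Illustrative packaging-budget check. All sizes (in kb) are the values quoted
# in the response above; names and structure are assumptions for illustration.
LENTIVIRAL_CAPACITY_KB = 8.0  # lentiviral packaging capacity quoted above

# Sizes of the sensing components alone (output gene excluded).
sensor_modules_kb = {
    "EF1a-driven RAS sensor": 7.7,
    "PY2_NarX&NarL (smallest)": 4.9,
    "PY2_NarX&NarL (largest)": 5.2,
}


def remaining_capacity_kb(module_kb, capacity_kb=LENTIVIRAL_CAPACITY_KB):
    """Return the space left for an output gene after packaging the sensor module."""
    return round(capacity_kb - module_kb, 1)


# Remaining space per construct, rounded to 0.1 kb.
budget = {
    name: remaining_capacity_kb(size) for name, size in sensor_modules_kb.items()
}

for name, left in budget.items():
    print(f"{name}: {left} kb left for the output gene")
```

Under these numbers, the MAPK-driven circuits leave roughly 3 kb for an output gene in a lentiviral vector, whereas the EF1α-driven sensor leaves almost none, which matches the reasoning given above.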

      Still, assembling multiple modules onto a single vector introduces new challenges, including possible crosstalk or interference between neighboring promoters [6]. For example, placing the output gene too close to MAPK response elements may trigger unwanted MAPK-dependent expression, potentially bypassing the intended AND-gate logic. Moreover, expressing three genes under separate response elements may shift expression ratios and reduce circuit functionality. Nonetheless, the absence of constitutive promoters and the RAS-dependence of MAPK response elements could provide partial robustness, since even unintended activation would still reflect RAS signaling to some extent. Further, our data (Fig. 1d) show that some deviation in component levels can be tolerated, provided all parts are sufficiently expressed. Even so, assembling the circuit on a single vector will require careful design and rigorous validation to ensure optimal performance. 

      While addressing this is beyond the scope of the current study, we agree that future efforts should focus on vector consolidation and delivery strategies. We now include a paragraph discussing these challenges in the revised manuscript.

      Reviewer #2 (Public review): 

      The manuscript describes an interesting approach towards designing genetic circuits to sense different RAS mutants in the context of cancer therapeutics. The authors created sensors for mutant RAS and incorporated feed-forward control that leverages endogenous RAS/MAPK signaling pathways in order to dramatically increase the circuits' dynamic range. The modularity of the system is explored through the individual screening of several RAS binding domains, transmembrane domains, and MAPK response elements, and the authors further extensively screened different combinations of circuit components. This is an impressive synthetic biology demonstration that took it all the way to cancer cell lines. However, given the sole demonstrated output in the form of fluorescent proteins, the authors' claims related to therapeutic implications require additional empirical evidence or, otherwise, expository revision. 

      Thank you very much for the thoughtful evaluation, precise critique, and constructive suggestions.

      As correctly noted, our study initially focused on developing and optimizing input sensors and processing units for synthetic gene circuits targeting mutated RAS. To address the concern regarding therapeutic relevance, we have now incorporated functional validation using a clinically relevant output protein: herpes simplex virus thymidine kinase (HSV-TK), which converts ganciclovir into a cytotoxic compound. We replaced the mCerulean reporter with HSV-TK and tested the resulting RAS-targeting circuits in both RAS-mutant and wild-type cancer cell lines. The results, now presented in a new chapter (Figure 8 and Supplementary Fig. 14), demonstrate robust killing of RAS-mutant cells and support the potential therapeutic utility of these circuits.

      Major comments: 

      "These therapies are limited to cancers with KRASG12C mutations" is technically accurate. However, in this fast-moving field, there are examples such as MRTX1133 which holds the promise to target the very G12D mutation that is the focus of this paper. There are broader efforts too. It would help the readers better appreciate the background if the authors could update the intro to reflect the most recent landscape of RAS-targeting drugs. 

      Thank you for this helpful suggestion. We have updated the introduction to reflect the rapidly evolving landscape of RAS-targeting therapies, including the development of inhibitors for non-G12C mutations such as KRASG12D (e.g., MRTX1133). Given the pace and breadth of these advances, we also refer readers to a recent comprehensive review that provides an in-depth overview of current RAS-targeting strategies.

      Only KRASG12D was used as a model in the design and optimization work of the genetic circuits. Other mutations should be quite experimentally feasible and comparisons of the circuits' performances across different KRAS mutations would allow for stronger claims on the circuits' generalizability. Particularly, the cancer cell line used for circuit validation harbored a KRASG13D mutation. While the data presented do indeed support the circuit's "generalizability," the model systems would not have been consistent in the current set of data presented. 

      To further support the generalizability of our RAS sensor, we titrated plasmid doses for a panel of oncogenic RAS variants, including multiple KRAS mutants as well as HRAS<sup>G12D</sup> and NRAS<sup>G12D</sup>. Across all tested variants, we observed concentration-dependent activation of the RAS sensor. At 1.67 ng/well, the sensor output for all oncogenic RAS variants was at least as high as that for KRAS<sup>G12D</sup>, suggesting that the behavior observed in our initial design and optimization is representative of a broader set of RAS mutations.

      We also noted that high overexpression of wildtype HRAS and NRAS can lead to substantial activation of the sensor, exceeding that observed with wildtype KRAS. This underscores the importance of considering all RAS isoforms when assessing circuit specificity and avoiding potential off-target activation in healthy cells.

      In Figure 2a, the text claims that "inactivation of endogenous RAS with NF1 resulted in a lower YFP/RBDCRD-NarX expression," but Figure 2a does not show a statistically significant reduction in expression of SYFP (measured by "membrane-to-total signal ratio [RU]). 

      Thank you for pointing this out. We repeated the experiment to reassess the effect of NF1 on RBDCRD-NarX-SYFP2 expression and were able to confirm statistical significance. Accordingly, we have replaced Figure 2a with updated data. To facilitate better visual comparison across conditions, we also standardized the y-axis range across all relevant flow cytometry plots.

      The therapeutic index of the authors' systems would be better characterized by a functional payload, other than fluorescent proteins, that for example induces cell death, immune responses, etc. 

      Thank you for this insightful comment. We agree that fluorescent reporters are limited to approximating expression levels, and that a functional output protein is more appropriate for assessing therapeutic potential. To address this, we replaced mCerulean with the therapeutic suicide-gene, HSV-TK, and tested the circuits in RAS-mutant and wild-type cancer cell lines. These experiments demonstrate that our circuits can express functional proteins and induce cell death in two RAS-mutant cell lines while showing low toxicity in a RAS wild type cell line (new chapter including Fig. 8 and Supplementary Fig.14). 

      Comparing the confluence of cells transfected with the RAS-targeting circuits to that of cells transfected with the non-toxic GFP-output negative control or the constitutively expressed EF1α-HSV-TK positive control allowed us to estimate the killing strength of the circuits in each cell line. In RAS-mutant HCT-116, the confluence curves were similar to the positive control, indicating effective killing (Fig. 8b). At lower DNA dose in HCT-116, or in SW620 with lower transfection efficiency, the killing of transfected RAS-driven cancer cells was less pronounced, falling approximately midway between the controls (Fig. 8g&j). In the RAS wild type cell line, Igrov-1, cells transfected with the RAS circuits showed continued growth similar to the non-toxic negative control (Fig. 8d), suggesting low toxicity. 

      While this may indicate low circuit activation in Igrov-1, an alternative explanation for the low toxicity could also be insufficient transfection efficiency. Testing in SW620 –which had similar transfection efficiency as Igrov-1 (Supplementary Fig. 14a)– showed that this moderate transfection efficiency was sufficient for RAS-circuit-dependent killing (Fig. 8d & 8g), supporting the notion of low activation in Igrov-1 and selective cytotoxicity in RAS-driven cancer cells.

      Nonetheless, it is important to note that comparisons between the cell lines need to be interpreted cautiously because of inter-cell line differences in transfection, growth, and HSV-TK/ganciclovir (GCV)-sensitivity (Supplementary Fig. 14) and further validation will be essential. 

      A conclusive assessment will require more efficient delivery strategies, such as viral vectors (as discussed above). Efficient delivery would allow us to investigate selectivity in a more realistic setting with patient-derived RAS-mutant cancer and healthy cells, as well as to test the circuits in an in vivo model. While beyond the scope of the current study, we view it as a critical direction for future work and have therefore added a paragraph about this to our discussion.

      Regarding data presented in "Mechanism of action" (Figure 2), the observations are interesting and consistent across different fluorescent reporters. However, with regard to interpretations of the underlying molecular mechanisms, it is not clear whether the different output levels in 2b, 2c, and 2d are due to the pathway as described by the authors or simply from varied expression levels of RBDCRD-NarX itself (2a) that is nonlinearly amplified by the rest of the circuit. From a practical standpoint, this caveat is not critical with respect to the signal-to-noise ratios in later parts of the paper. From a mechanistic interpretation standpoint, claims put forth in this section are not clearly substantiated. Some additional controls would be nice. For example, if the authors express NarXs that constitutively dimerize on the membrane, what would the RasG12D responsiveness look like? Does RasG12D alter the input-output curve of NarL-RE? How would Figure 4f compare to a NarX constitutively dimerized control that only relies on transcriptional amplification of the Ras-dependent promoters? 

      This is a great point. We agree that the observed differences in output levels (Fig. 2) could arise from non-linear amplification due to increased expression of RBDCRD-NarX, rather than RAS binding or dimerization alone. To further investigate this possibility, we performed titrations of KRAS<sup>G12D</sup> in combination with the functional RAS sensor and a series of constitutively active and inactive control constructs (Supplementary Fig. 4).

      Inactive controls lacking NarX dimerization showed only a modest increase in output expression, similar to direct mCerulean expression under the EF1α promoter. Transfection of the output plasmid alone, with NarL, or with NarL and non-RAS-binding RBD<sup>R89L</sup> CRD<sup>C168S</sup> -NarX, resulted in minimal RAS-dependent increases (Supplementary Fig. 4a). Importantly, after normalization using the EF1α-driven mCherry transfection control, these effects were fully or even slightly over-compensated (Supplementary Fig. 4b), showing that we don’t include the effect of EF1α-dependent increased leakiness in the data presented throughout the manuscript, but also that –due to the normalization– we potentially underestimate the dynamic range of the RAS-targeting circuits.
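To make the effect of this normalization concrete, the following minimal sketch computes a fold change with and without division by the transfection control. All readings are hypothetical values in arbitrary units chosen for illustration (they are not data from the study), and the function names are assumptions.

```python
def normalized_output(output_signal, control_signal):
    """Divide the circuit output by the co-transfected control signal."""
    return output_signal / control_signal


def fold_change(mutant, wild_type):
    """Ratio of normalized outputs; each argument is an (output, control) pair."""
    return normalized_output(*mutant) / normalized_output(*wild_type)


# Hypothetical (output, control) readings in arbitrary units, NOT measured data.
# The control signal is assumed to rise with mutant RAS, mimicking a
# RAS-dependent increase in EF1a-driven expression.
kras_g12d = (1600.0, 200.0)
kras_wt = (100.0, 100.0)

raw_fold = kras_g12d[0] / kras_wt[0]          # fold change without normalization
norm_fold = fold_change(kras_g12d, kras_wt)   # fold change after normalization

print(f"raw fold change: {raw_fold}")
print(f"normalized fold change: {norm_fold}")
```

In this toy case the normalized fold change is half the raw one, which illustrates how normalizing to a control that itself responds to RAS removes the promoter-level effect but can also understate the dynamic range.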

      In contrast, constitutively dimerizing NarX controls (both membrane-bound and cytosolic dimerized via the FKBP–FRB system) exhibited a more pronounced RAS-dependent increase in output –even after normalization– confirming the presence of non-linear amplification (up to 3–4-fold). However, this effect was still lower than that achieved with the functional RAS-binding sensor (8-fold at 1.67 ng/well KRAS<sup>G12D</sup>; 14-fold at 5–15 ng/well), indicating that increased expression of the sensor parts does not fully explain the effect we see. Instead, RAS binding and dimerization further amplify the response and are necessary for full activation (Supplementary Fig. 4b).

      We also addressed the reviewer’s suggestion by testing the MAPK response elements used in Fig. 4f with constitutively dimerizing NarX. These controls generally showed lower fold changes between KRAS<sup>G12D</sup> and KRAS<sup>WT</sup> than the corresponding RAS-binding circuits (Supplementary Fig. 7), with one exception: the combination of SRE_NarX and PY2_NarL-VP48.

      Together, these data show that non-linear amplification via increased expression and dimerization contributes to output activation. However, RAS binding and induced dimerization of the NarX sensor are required for full functionality and enhanced signal strength. This underscores that integrating the MAPK response elements with the binding-based RAS sensor into RAS-targeting circuits generally improves the distinction between cells with KRAS<sup>G12D</sup> and KRAS<sup>WT</sup>, and that it was this combination that allowed us to reach maximal fold changes.
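
      The effect of the normalization described above can be sketched with a few lines of arithmetic. This is only an illustration under stated assumptions: it assumes the normalization is a simple ratio of output signal to transfection-control signal, the function names are hypothetical, and the numbers are invented for the example rather than taken from the manuscript.

```python
def normalized_output(output_signal, transfection_control):
    """Divide the reporter signal by the co-transfected control signal.

    Assumption: a plain ratio; the exact normalization used in the
    manuscript is not specified here.
    """
    if transfection_control <= 0:
        raise ValueError("transfection control signal must be positive")
    return output_signal / transfection_control


def fold_change(mutant, wild_type):
    """Fold change of normalized outputs; each argument is (output, control)."""
    return normalized_output(*mutant) / normalized_output(*wild_type)


# Hypothetical values: (mCerulean output, mCherry control) in arbitrary units.
kras_g12d = (900.0, 150.0)  # control signal also rises in the mutant condition
kras_wt = (100.0, 100.0)

raw_fold = kras_g12d[0] / kras_wt[0]
norm_fold = fold_change(kras_g12d, kras_wt)
print(f"raw fold change: {raw_fold:.1f}, normalized fold change: {norm_fold:.1f}")
```

      In this toy case the control signal is itself higher in the mutant condition, so the normalized fold change is smaller than the raw one, which is the sense in which normalization may underestimate the dynamic range.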

      It's also possible that Ras could affect protein production at the post-transcriptional or even post-translational levels, which were not adequately considered. 

      Thank you for this comment. We now mention in the manuscript the potential mechanisms by which (over-)activated RAS or MAPK signaling can increase protein synthesis. We cite relevant reports of the mechanisms we found, including upregulation of translational initiation and machinery[10] and ribosomal biogenesis[11].

      The text claims that "in contrast to what we saw in HEK293 overexpressing RAS (Figure 5d), the "AND-gate" RAS-targeting circuits do not generate higher output than the EF1a-driven, binding-triggered RAS sensor in HCT-116. Instead, the improved dynamic range results from decreased leakiness in HCT-116k.o." Comparing the experiment from Figure 5d, which looks at activation in KRASG12D and KRASWT, to the experiments in Figure 6b-d, which looks at activation in HCT-116WT and HCT-116KO is misleading. In Fig 5d., cells are transfected with KRASG12D and KRASWT to emulate high levels of mutant RAS and high levels of wild-type RAS. In Figures 6b-d, HCT-116WT has endogenous levels of mutant RAS, while the HCT-116KO is a knock-out cell line, and does not have mutant or WT RAS. Therefore, the improved dynamic range or "decreased leakiness in HCT-116KO" in comparison to Figure 5d. is more comparable to the NF1 condition from Figure 2, which deactivates endogenous RAS. While this may not be feasible, the most accurate comparison would have been an HCT-116KO line with KRASWT stably integrated. 

      Thank you for this input. We understand that comparing the results from HEK293 cells transfected with KRAS<sup>G12D</sup> or KRAS<sup>WT</sup> (Fig. 5d) to those from HCT-116<sup>WT</sup> and HCT-116<sup>k.o.</sup> cells (Fig. 6b–d) may be misleading if interpreted as a direct comparison of RAS signaling levels. Our intent was not to compare HEK293 with KRAS<sup>WT</sup> directly to HCT-116<sup>k.o.</sup>, but rather to contrast the behavior of the EF1α-driven RAS sensor and the MAPK-responsive RAS-targeting circuits within each cell line context.

      Specifically, we observed that in HEK293 cells expressing KRAS<sup>G12D</sup>, the MAPK-based RAS-targeting circuits produced higher output than the EF1α-expressed RAS sensor. In contrast, in HCT-116<sup>WT</sup> cells, the EF1α-expressed RAS sensor resulted in higher output levels than the RAS-targeting circuits. Despite this, the MAPK-driven circuits showed an improved dynamic range compared to the EF1α-expressed RAS sensor in HCT-116, due to the reduced background expression in the HCT-116<sup>k.o.</sup> cells. We have revised the manuscript text to clarify this distinction.

      We agree that an HCT-116<sup>k.o.</sup> cell line with stable integration of KRAS<sup>WT</sup> would provide a more direct comparison. Nonetheless, HCT-116<sup>k.o.</sup> cells still express endogenous NRAS and HRAS, both of which are capable of activating the RAS sensor (as shown in Fig. 1g). Therefore, we believe that HCT-116<sup>k.o.</sup> cells are more comparable to HEK293 with KRAS<sup>WT</sup> than to the NF1 condition in Fig. 2, in which all endogenous RAS isoforms are inactivated.

      We couldn't locate the citation or discussion of Figure 4d in the text. Conversely, based on the text description, Figure 6g would contain exciting results. But we couldn't find Figure 6g anywhere ... unless it was a typo and the authors meant Figure 6f, in which case the cool results in Figure S8 could use more elaboration in the main text. 

      Thank you for this helpful observation. The figure references were indeed incorrect due to a typo. The results discussed in the text refer to Figure 6f (not 6g), which is now Figure 7a in the revised version. To further highlight these findings, we have added a new Figure 7b that better illustrates how different MAPK response elements enabled us to identify, for each RAS-mutant cell line, a RAS-targeting circuit that showed stronger activation than in all RAS wild-type lines. We have also expanded the corresponding section in the main text to elaborate on these results and their significance.

      Reviewer #3 (Public review): 

      Summary: 

      Mutations that result in consistent RAS activation constitute a major driver of cancer. Therefore, RAS is a favorable target for cancer therapy. However, since normal RAS activity is essential for the function of normal cells, a mechanism that differentiates aberrant RAS activity from normal activity is required to avoid severe adverse effects. To this end, the authors designed and optimized a synthetic gene circuit that is induced by active RAS-GTP. The circuit components, such as RAS-GTP sensors, dimerization domains, and linkers, were optimized. To enhance the circuit selectivity and dynamic range, the authors designed a synthetic promoter comprised of MAPK-responsive elements to regulate the expression of the RAS sensors, thus generating a feed-forward loop regulating the circuit components. Circuit outputs with respect to circuit design modification were characterized in standard model cell lines using basal RAS activity, active RAS mutants, and RAS inactivation. 

      This approach is interesting. The design is novel and could be implemented for other RAS-mediated applications. The data support the claims, and while this circuit may require further optimization for clinical application, it is an interesting proof of concept for targeting aberrant RAS activity. 

      Strengths: 

      Novel circuit design, thorough optimization and characterization of the circuit components, solid data. 

      Weaknesses: 

      This manuscript could significantly benefit from testing the circuit performance in more realistic cell lines, such as patient-derived cells driven by RAS mutations, as well as in corresponding non-cancer cell lines with normal RAS activity. Furthermore, testing with therapeutic output proteins in vitro, and especially in vivo, would significantly strengthen the findings and claims. 

      Thank you very much for the thoughtful and supportive comments. We fully agree with the reviewer’s suggestions for improving the translational potential of the RAS-targeting circuits.

      As a first step toward therapeutic relevance, we replaced the fluorescent reporter with HSV-TK, a clinically validated suicide gene, and demonstrated killing in RAS-mutant cancer cell lines. This is described above and in the new section of the manuscript (Figure 8).

      We also agree that testing in patient-derived cancer cells and especially healthy cells with wild-type RAS activity will be essential. However, testing in primary or patient-derived cells presents delivery challenges: transient transfection of our current four-plasmid system is unlikely to achieve sufficient expression. As discussed in our response to Reviewer #1, development of a more efficient delivery strategy, such as viral vector-based delivery, is a necessary next step.

      Once a delivery system is established, identifying relevant off-target tissues throughout the body with high physiological RAS signaling will be key to assessing selectivity. While comparative data on RAS activation across healthy tissues are scarce[12,13], recent atlases of transcription factor activity[14,15] provide insights to identify off-target cells with high activation of RAS-dependent transcription factors and may even approximate RAS activity across healthy tissue. Alternatively, our single-input sensors for RAS and MAPK pathway activity could be used in vivo to identify off-target cells based on endogenous activity.

      Once relevant target and off-target cells have been identified, patient-derived cancer and healthy cells can help select and adapt cancer-specific RAS-targeting circuits and nominate therapeutic candidates for further safety and efficacy assessment[6,8].

      Reviewer #1 (Recommendations for the authors): 

      For the most part, the data in this study are very convincing and very well presented. The cartoons make it easier to understand the complex experimental setups. 

      (1) Did the authors use wild-type Sos-1 or a constitutively active membrane-bound catalytic domain in their studies? How is Sos-1 activated in case wild-type Sos-1 was used? 

      Thank you for this feedback. We used the constitutively active catalytic domain of Sos-1 (AA 564–1049; PDB ID 2II0). 

      (2) Figure 1f: In case of KRAS-G12D, it looks like the output expression does not really correlate with the RAS-GTP level. Can the authors give an explanation? 

      Thank you for this interesting question. We believe the observed discrepancy arises primarily from differences in the sensitivity and readout dynamics of the two assays. The RAS-GTP pulldown ELISA appears insufficiently sensitive to detect small changes in RAS-GTP levels at lower KRAS<sup>G12D</sup> plasmid doses (0.19, 0.56, or 1.67 ng). Only at 5 ng and 15 ng do we observe clear increases in RAS-GTP signal (25% and 700%, respectively). In contrast, the RAS sensor shows strong activation already in the 0.56–5 ng range but begins to saturate at higher doses (see Figure 1f and Figure 1e).

      Beyond the differing technical sensitivities of the ELISA (plate reader) and flow cytometry, an important conceptual distinction may further explain this behavior: the RAS sensor likely integrates RAS signaling over time. Once NarX binds RAS-GTP and dimerizes, it activates NarL, triggering mCerulean expression. If the rate of mCerulean production exceeds its degradation, signal accumulates throughout the assay duration. Thus, the flow cytometry readout reflects time-integrated signaling, allowing small differences in RAS-GTP to be amplified into measurable differences in output, especially at low input levels. This may explain why flow cytometry detects circuit activation earlier and more steeply than the pulldown assay, which provides a snapshot of RAS-GTP abundance at a single time point and saturates less readily at high input levels.

      Together, these factors likely explain the observed differences in signal dynamics: the RAS sensor exhibits steep activation followed by saturation at high plasmid doses (flow cytometry), while the ELISA shows limited sensitivity at low doses but a broader linear range at higher doses.
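
      The time-integration argument can be made concrete with a minimal sketch. It assumes a deliberately simple model that is not taken from the manuscript: reporter protein P is produced at a constant rate and lost by first-order degradation, dP/dt = production - degradation * P, starting from P(0) = 0. All rate values and names below are hypothetical.

```python
import math


def reporter_level(production_rate, degradation_rate, hours):
    """Closed-form solution of dP/dt = production - degradation * P with P(0) = 0."""
    steady_state = production_rate / degradation_rate
    return steady_state * (1.0 - math.exp(-degradation_rate * hours))


# Hypothetical rates (arbitrary units per hour): two inputs that differ by 20%.
low_input, high_input = 1.0, 1.2
degradation = 0.03  # slow loss, as might apply to a stable fluorescent protein

for hours in (1, 12, 48):
    gap = (reporter_level(high_input, degradation, hours)
           - reporter_level(low_input, degradation, hours))
    print(f"after {hours:>2} h the absolute gap in reporter level is {gap:.2f}")
```

      In this linear model the ratio between the two conditions stays constant, but the absolute gap grows with assay time, which is one way a small difference in input can become easier to measure in an accumulated readout than in a single-time-point snapshot.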

      (3) Figure 2b: It appears that even in the case of KRAS-G12D and Sos-1, only a few cells are positive. Does this result depend on low cell density, low transfection efficiency, or a wide range of the expression level? As a control, nuclear staining could be shown. 

      Thank you for this question. In the experiment shown in Figure 2b, our goal was to assess the membrane localization of the RBDCRD-NarX-SYFP2 construct, which serves as a proxy for the RAS-bound sensor. To enable accurate computational segmentation and separation of membrane signal from adjacent cells, we intentionally reseeded cells at low density in glass-bottom plates for confocal imaging.

      The observed variability in signal likely reflects a combination of transient transfection and heterogeneous expression levels. While the overall transfection efficiency was approximately 70%, expression varied between individual cells. To account for this, we analyzed the membrane-to-total signal ratio per cell, which internally normalizes the membrane signal to the total cellular expression of SYFP2 and controls for differences in transfection efficiency.
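
      As a rough illustration of why a per-cell ratio controls for expression level, the sketch below computes a membrane-to-total ratio from a list of pixel intensities and a membrane mask. The function name, the mask representation, and the numbers are assumptions made for this example and do not describe the actual segmentation pipeline.

```python
def membrane_to_total_ratio(pixel_intensities, membrane_mask):
    """Fraction of a cell's total fluorescence that lies in membrane pixels.

    pixel_intensities: intensities for all pixels assigned to one cell.
    membrane_mask: booleans of the same length, True for membrane pixels.
    """
    if len(pixel_intensities) != len(membrane_mask):
        raise ValueError("intensities and mask must have the same length")
    total = sum(pixel_intensities)
    if total <= 0:
        raise ValueError("total cell signal must be positive")
    membrane = sum(v for v, on_membrane in zip(pixel_intensities, membrane_mask)
                   if on_membrane)
    return membrane / total


# Two hypothetical cells with the same localization but 10-fold different expression.
mask = [True, True, False, False]
dim_cell = [10.0, 10.0, 20.0, 20.0]
bright_cell = [v * 10 for v in dim_cell]

print(membrane_to_total_ratio(dim_cell, mask))
print(membrane_to_total_ratio(bright_cell, mask))
```

      Both hypothetical cells yield the same ratio despite their different brightness, which is the property that makes the ratio insensitive to cell-to-cell differences in transfection and expression.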

      In response to the reviewer’s suggestion, we have updated the figure to include nuclear staining to aid interpretation. We would like to emphasize, however, that the images are intended to illustrate subcellular localization per cell, not expression frequency or intensity across the population.

      Minor points 

      (1) Figure 1b: "The third plasmid expresses NarL, .." should be changed to "The third plasmid expresses NarL-VP48, .." 

      Done

      (2) Figure 1c, right part: The orange arrow should be labeled NarX-H399Q (not N509A). 

      Done

      (3) Supplementary Table 6 and 7: [cells/wells] - should probably be [cells 10*3/well]. 

      Thank you for these points. We updated the manuscript accordingly.

      Reviewer #2 (Recommendations for the authors): 

      Minor comments: 

      (1) N509A seems mislabeled in Figure 1b. 

      (2) It would help the readers if the authors could elaborate a bit on what is known about the RBD and CRD mutations used here. 

      Thank you for the input. We added a paragraph to the paper that expands on the effects of these commonly used mutations.

      (3) The KRASWT&Sos1 condition is not explained within the text for Figure 1f, which is the first figure with the KRASWT&Sos1 condition, but rather later on for Figure 2a. Adding a description of this condition to the discussion of Figure 1f would add clarity to this figure. 

      Thank you, we corrected this.

      (4) Citing AlphaFold2 structural predictions as having "revealed that longer linkers between the sensor's RBDCRD and NarX-derived domains could bring the NarX domains into closer proximity" is probably an overstatement. AlphaFold2 generally has low confidence in the placement of long flexible linkers, and the longer linkers in the illustration could facilitate NarX and NarL being even farther apart than they are in the original design. 

      Thank you for this input. We agree that AlphaFold2 predictions generally have low confidence in the placement of long, flexible linkers, and we did not intend to imply that the structural models were predictive of actual linker conformations. Rather, the models were used heuristically to generate the hypothesis that longer linkers might facilitate better positioning of the NarX domains for dimerization.

      As described in the Methods, we manually rotated the flexible linker regions to explore plausible conformations. These exploratory models showed that with a short (1x GGGGS) linker, it was more challenging to bring the NarX domains into close proximity, whereas longer linkers allowed greater positional flexibility. This modeling exercise provided a structural rationale for experimentally testing longer linkers. We have revised the manuscript text to clarify that the structural predictions were used to motivate linker design, not to validate or predict structural outcomes.

      (5) Figure 3b shows that the fold change (KRASG12D/KRASWT) is higher at shorter linker lengths and lower at longer linker lengths, and that the output expression of mCerulean is lower at shorter linker lengths and higher at longer linker lengths. Having a bar plot with the output expression mCerulean levels comparing KRASG12D and KRASWT next to each other would be a significantly more informative representation of this data. In particular, the readers might be interested in understanding the effect of linker length on off-target activation from the sensor, which is not clear from this figure. 

      Thank you for the suggestion. We adapted Figure 3b to better present this. 

      (6) While it is implied that the sentence "Among the tested binding domains, the Ras association domain (RA) of the natural RAS effector Rassf5, the RAS association domain 2 (RA2) of the phospholipase C epsilon (PLCe)33, and the synthetic RAS binder K5534 showed a slightly higher or similar dynamic range." is comparing these RAS binding domains to RBDCRD, for clarity it should be noted what the point of reference is for this "slightly higher or similar dynamic range." 

      (7) Claims are made throughout the text that require supporting data, and thus require a reference to a figure, but there are a few instances where the reference is several sentences after the discussion of data and findings begins. For example, the discussion of Figure 3c begins with the claim "Among the tested binding domains, the Ras association domain (RA) of the natural RAS effector Rassf5, the RAS association domain 2 (RA2) of the phospholipase C epsilon (PLCe)33, and the synthetic RAS binder K5534 showed a slightly higher or similar dynamic range," but there is no reference to the data or figure being discussed until the end of the discussion of Figure 3c. This formatting is also present in Figure 3d and Figure 6f. 

      Thank you for mentioning these imprecisions and inconsistencies. We addressed them in the manuscript. 

      (8) In Figures 5d and 5e, the formatting of underscores and dashes is occasionally inconsistent within the text. (ex. "PY2_NarX_FLT or PY2_NarL-FLT" on page 13.). 

      Thank you for this precise observation. The formatting differences were intentional and reflect distinct design principles. Specifically:

      An underscore (e.g., PY2_NarX_FLT) denotes that two separate proteins are expressed: here, PY2-driven RBDCRD-NarX and EF1α-driven NarL-F.L.T.

      A dash (e.g., PY2_NarL-F.L.T.) indicates a fusion protein, i.e., PY2-driven NarL-F.L.T. combined with EF1α-driven RBDCRD-NarX.

      This notation is used to distinguish expression sources and fusion constructs while avoiding redundancy with the base circuit (EF1α_NarX + EF1α_NarL-VP48). We hope the schematic diagrams included in each relevant figure help the reader interpret these combinations.

      (9) The text claims that "loss-of-function mutations in RBDCRD decreased activation. However, the dynamic range was only 3-fold" and attributes this claim to Figure 6a. For a claim about specific fold-change activation, one would expect a corresponding figure with quantitative measurements of this fluorescence to be referenced. 

      Thank you for this remark. We made a supplementary figure (Supplementary Fig. 11) to show the quantitative measurement of the 3-fold dynamic range between HCT-116<sup>WT</sup> and HCT-116<sup>k.o.</sup> when using the EF1α-expressed RAS sensor with NarL-VP48.

      (10) The claim of this Figure 2d is that the effect of RAS-GTP levels on mCerulean output is amplified in comparison to Figures 2a, 2b, and 3c, representing expression, RAS binding, and dimerization respectively. While visually this might be true from the figure, the readers might be confused by the lack of significance between the control and the NF1 condition, alongside the variation between the triplicates. Could this experiment be repeated to gain clearer data and to support their claim more effectively? 

      Thank you for this important observation. To address the concern regarding variability and statistical significance in Figure 2d, we repeated the experiment using 24-well plates to increase the number of cells analyzed per condition. This improved the consistency of the data and allowed us to reduce variability across replicates. As a result, we now observe a statistically significant difference between the control and the NF1 condition. The updated results are shown in the revised Figure 2.

      (11) The readers might be less familiar with the concept of "composability" than "modularity" and it would be good to explain it if the authors did intend to use the former. 

      Thank you for this comment. We changed it to modularity to avoid confusion. 

      References

      (1) Shahryari, A., Burtscher, I., Nazari, Z. & Lickert, H. Engineering Gene Therapy: Advances and Barriers. Advanced Therapeutics vol. 4 Preprint at https://doi.org/10.1002/adtp.202100040 (2021).

      (2) McClements, M. E. & MacLaren, R. E. Adeno-Associated Virus (AAV) Dual Vector Strategies for Gene Therapy Encoding Large Transgenes. Yale Journal of Biology and Medicine vol. 90 (2017).

      (3) Wagner, H. J., Weber, W. & Fussenegger, M. Synthetic Biology: Emerging Concepts to Design and Advance Adeno-Associated Viral Vectors for Gene Therapy. Advanced Science vol. 8 Preprint at https://doi.org/10.1002/advs.202004018 (2021).

      (4) Doshi, J., Willis, K., Madurga, A., Stelzer, C. & Benenson, Y. Multiple Alternative Promoters and Alternative Splicing Enable Universal Transcription-Based Logic Computation in Mammalian Cells. Cell Rep 33, 108437 (2020).

      (5) Wu, Z., Yang, H. & Colosi, P. Effect of genome size on AAV vector packaging. Molecular Therapy 18, 80–86 (2010).

      (6) Dastor, M. et al. A Workflow for in Vivo Evaluation of Candidate Inputs and Outputs for Cell Classifier Gene Circuits. ACS Synth Biol 7, 474–489 (2018).

      (7) Preuß, E. et al. TK.007: A novel, codon-optimized HSVtk(A168H) mutant for suicide gene therapy. Hum Gene Ther 21, 929–941 (2010).

      (8) Angelici, B., Shen, L., Schreiber, J., Abraham, A. & Benenson, Y. An AAV gene therapy computes over multiple cellular inputs to enable precise targeting of multifocal hepatocellular carcinoma in mice. Sci Transl Med 13, (2021).

      (9) Mesnil, M. & Yamasaki, H. Bystander Effect in Herpes Simplex Virus-Thymidine Kinase/Ganciclovir Cancer Gene Therapy: Role of Gap-Junctional Intercellular Communication. Cancer Research vol. 60 http://aacrjournals.org/cancerres/articlepdf/60/15/3989/2478218/ch150003989.pdf (2000).

      (10) Proud, C. G. Ras, PI3-kinase and mTOR signaling in cardiac hypertrophy. Cardiovascular Research vol. 63 403–413 Preprint at https://doi.org/10.1016/j.cardiores.2004.02.003 (2004).

      (11) Azman, M. S. et al. An ERK1/2-driven RNA-binding switch in nucleolin drives ribosome biogenesis and pancreatic tumorigenesis downstream of RAS oncogene. EMBO J 42, (2023).

      (12) von Lintig, F. C. et al. Ras activation in normal white blood cells and childhood acute lymphoblastic leukemia. Clin Cancer Res 6, 1804–10 (2000).

      (13) Guha, A., Feldkamp, M. M., Lau, N., Boss, G. & Pawson, A. Proliferation of human malignant astrocytomas is dependent on Ras activation. Oncogene 15, 2755–2765 (1997).

      (14) Pan, L. et al. HTCA: a database with an in-depth characterization of the single-cell human transcriptome. Nucleic Acids Res 51, D1019–D1028 (2023).

      (15) Pan, L. et al. Single Cell Atlas: a single-cell multi-omics human cell encyclopedia. Genome Biol 25, (2024).

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer 1:

      While BAP1 mutant UM cell lines were included for some of the experiments, it seems the in-vivo data mentioned in the response to the reviewer's comment is missing? The authors stated that "MP46 (Supplementary Fig. 3a) is BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor." But the CDX model data shown in Figure 4 is from 92.1 cells. If this data is available, then the manuscript would benefit from its addition.

      We thank the reviewer for bringing this to our attention. As the reviewer mentioned, we show the 92-1 CDX model in our manuscript. Additionally, strong tumor growth inhibition in the MP46 CDX model treated with our BAF ATPase inhibitor can be found in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/).

      Reviewer 3:

      Supplementary Figure 2C

      Is the T910M mutation in the parental MP41 cells heterozygous? If so, the authors should indicate this in the figure legend. If this is a homozygous mutation, the authors should explain how the inhibitors suppress SMARCA4 activity in cells that have a LOF mutation.

      We thank the reviewer for bringing this to our attention. We updated the figure legend accordingly to reflect the genotype of the mutations highlighted in the table.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The presented study by Centore and colleagues investigates the inhibition of BAF chromatin remodeling complexes. The study is well-written, and includes comprehensive datasets, including compound screens, gene expression analysis, epigenetics, as well as animal studies. This is an important piece of work for the uveal melanoma research field, and sheds light on a new inhibitor class, as well as a mechanism that might be exploited to target this deadly cancer for which no good treatment options exist.

      Strengths:

      This is a comprehensive and well-written study.

      Weaknesses:

      There are minimal weaknesses.

      We thank the reviewer for the positive comments.

      Reviewer #2 (Public Review):

      Summary:

      The authors generate an optimized small molecule inhibitor of SMARCA2/4 and test it in a panel of cell lines. All uveal melanoma (UM) cell lines in the panel are growth-inhibited by the inhibitor, making them the focus of the paper. This inhibition is correlated with the loss of promoter occupancy of key melanocyte transcription factors, e.g. SOX10. SOX10 overexpression and a point mutation in SMARCA4 can rescue growth inhibition exerted by the SMARCA2/4 inhibitor. Treatment of a UM xenograft model results in growth inhibition and regression, which correlates with reduced expression of SOX10 but no discernible toxicity in the mice. Collectively the data suggest a novel treatment of uveal melanoma.

      Strengths:

      There are many strengths of the study including the strong challenge of the on-target effect, the assays used, and the mechanistic data. The results are compelling as are the effects of the inhibitor. The in vivo data is dose-dependent and doses are low enough to be meaningful and associated with evidence of target engagement.

      Weaknesses:

      The authors introduce the field stating that SMARCA4 inhibitors are more effective in SMARCA2 deficient cancers and the converse. Since the desirable outcome of cancer therapy would be synthetic lethality it is not clear why a dual inhibitor is desirable. Wouldn't this be associated with more side effects? It is not known how the inhibitor developed here impacts normal cells, in particular T cells which are essential for any durable response to cancer therapies in patients. Another weakness is that the UM cell lines used do not molecularly resemble metastatic UM. These UM most frequently have mutations in the BAP1 tumor suppressor gene. It is not clear if the described SMARCA2/4 inhibitor is efficacious in BAP1 mutant UM cell lines in vitro or BAP1 mutant patient-derived xenografts in vivo.

      We thank the reviewer for their insightful and constructive comments. As we demonstrate in Fig. 1d, uveal melanoma cells are selectively and deeply sensitive to BAF ATPase inhibition, which provides a therapeutic window. This is confirmed in Fig. 4a-c, where we demonstrate robust tumor growth inhibition at a dose that was well tolerated in the xenograft study. FHD-286, a dual BRM/BRG1 inhibitor similar to FHT-1015 with optimized physical properties, has been evaluated in a Phase I trial in patients with metastatic uveal melanoma (NCT04879017), and a manuscript describing the results of this clinical trial is currently in preparation.

      As the reviewer mentioned, BAP1 loss is a signature of metastatic uveal melanoma. MP38 is a BAP1 mutant uveal melanoma cell line, and we demonstrated growth inhibition and robust caspase 3/7 activity in response to FHT-1015 (Supplementary Fig. 3a and 3f). MP46 (Supplementary Fig. 3a) is a BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript reports the discovery of new compounds that selectively inhibit SMARCA4/SMARCA2 ATPase activity and that work through a different mode than previously developed SMARCA4/SMARCA2 inhibitors. They also demonstrate the anti-tumor effects of the compounds on uveal melanoma cell proliferation and tumor growth. The findings indicate that the drugs exert their effects by altering chromatin accessibility at binding sites for lineage-specific transcription factors within gene enhancer regions. In uveal melanoma, altered expression of the transcription factor SOX10 and of SOX10 target genes underlies the anti-proliferative effects of the compounds. This study is significant because the discovery of new SMARCA4/SMARCA2 inhibitory compounds that can abrogate uveal melanoma tumorigenicity has therapeutic value. In addition, the findings provide evidence for the therapeutic use of these compounds in other transcription factor-dependent cancers.

      Strengths:

      The strengths of this manuscript include biochemical evidence that the new compounds are selective for SMARCA4/SMARCA2 over other ATPases and that the mode of action is distinct from a previously developed compound, BRM014, which binds the RecA lobe of SMARCA2. There is also strong evidence that FHT1015 suppresses uveal melanoma proliferation by inducing apoptosis. The in vivo suppression of tumor growth without toxicity validates the potential therapeutic utility of one of the new drugs. The conclusion that FHT1015 primarily inhibits SMARCA4 activity and thereby suppresses chromatin accessibility at lineage-specific enhancers is substantiated by ATAC-seq and ChIP-seq studies.

      Weaknesses:

      The weaknesses include a lack of more precise information on which SMARCA4/SMARCA2 residues the drugs bind. Although the I1173M/I1143M mutations are evidence that the critical residues for binding reside outside the RecA lobe, this site is conserved in CHD4, which is not affected by the compounds. Hence, this site may be necessary but not sufficient for drug binding or specifying selectivity. A more precise evaluation of the region specifying the effect of the new compounds would strengthen the evidence that they work through a novel mode and that they are selective. Another concern is that the mechanisms by which FHT1015 promotes apoptosis rather than simply cell cycle arrest are not clear. Does SOX10 or another lineage-specific transcription factor underlie the apoptotic effects of the compounds?

      We thank the reviewer for the valuable comments.

      We believe that our dual ATPase inhibitor is selective, and additional insights into binding specificity and selectivity for earlier-stage compounds of this series were recently published in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/).

      The reviewer also poses a great question regarding the mechanism of apoptosis. The mechanism of apoptosis is extremely complex, but we observed a decrease in pro-survival BCL-2 protein expression in response to FHT-1015, in the experiment corresponding to Supplementary Fig. 5e. In the experiment described in Fig. 3k, we also monitored caspase 3/7 activity over time, and SOX10 overexpression rescued 92-1 cells from FHT-1015-induced apoptosis. This suggests that SOX10 is an important mediator of the response to BAF ATPase inhibition, including apoptosis induced by FHT-1015.

      Additional Reviews:

      The referees would like to draw the authors' attention to the following issues that would best benefit from additional revision. 

      The clinical relevance of the study would be strengthened by the use of uveal melanoma cell lines with BAP1 mutations that better represent metastatic uveal melanoma. The use of patient-derived xenografts would also be pertinent and would be a useful addition. Similarly, attention to the effects of the inhibitor on non-cancerous proliferative cells such as blood/T/immune cells would also strengthen the manuscript. As the study reports the administration of one of the inhibitors in mice for the xenograft experiments, it would be important to assess any potential effects on blood cell counts and better discuss the eventual toxicity or lack of toxicity and how it was assessed. 

      The authors should better explain how SOX10 overexpression can rescue viability in the presence of the inhibitor. Similarly, given the critical roles of BRG1, SOX10, and MITF in cutaneous melanoma, some specific discussion on the sensitivity of cutaneous melanoma cells to the inhibitor should be considered, and potential differences with uveal melanoma highlighted.

      Aside from these issues, the authors are urged to consider the other points mentioned below. 

      Reviewer #1 (Recommendations For The Authors): 

      Figure 1d, as well as the text in the manuscript referring to this figure, would benefit from indicating specific cell lines used for UM. The same for the sentence in line 153. 

      We thank the reviewer for bringing this to our attention. We have added the cell line names and updated the manuscript accordingly.

      For any of the studies conducted, is there any link with the genetics of UM? E.g. BAP1 wildtype/BAP1 mutant? 

      As addressed above in the public review section, MP38 is a BAP1 mutant uveal melanoma cell line, and we demonstrated growth inhibition and robust caspase 3/7 activity in response to FHT-1015 (Supplementary Fig. 3a and 3f). MP46 (Supplementary Fig. 3a) is a BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor.

      Row 191 - How were peaks classified as enhancer-occupied? 

      We used the annotatePeaks function of the HOMER package to annotate genomic locations, and H3K27ac ChIP-seq to annotate peaks as enhancer-occupied. We thank the reviewer for pointing this out and have updated the manuscript to include this information.

      Row 259, the two cell lines should be named, also in Figure 3i. 

      We have added the cell line names and updated the manuscript accordingly.

      Reviewer #2 (Recommendations For The Authors): 

      As a proof of concept, this study is truly excellent and the authors should be commended. However, it is desirable that new knowledge in cancer is translated to the clinic. To this end there are a few things needed to strengthen the study. 

      I am rephrasing my statements from the public review to say that I would recommend testing the inhibitor in T cells (side effects) and BAP1 mutant cell lines (for clinical relevance). 

      As addressed in the public review section, MP38 is a BAP1 mutant uveal melanoma cell line, and we demonstrated growth inhibition and robust caspase 3/7 activity in response to FHT-1015 (Supplementary Fig. 3a and 3f). MP46 (Supplementary Fig. 3a) is a BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor.

      Regarding concerns about potential side effects on T cells, we observed an increase in both CD4 and CD8 T-cell populations in the peripheral blood and the spleen when naïve, non-tumor-bearing CD-1 mice were dosed with the SMARCA2/4 dual ATPase inhibitor FHD-286 once daily for 14 days. FHD-286 is a compound similar to FHT-1015 and is described in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/). In addition, FHD-286 has been tested in tumor-bearing syngeneic models. When B16F10 tumor-bearing C57BL/6 mice were dosed with FHD-286 for 10 days, we observed an increase in CD69+ activated CD8 T-cell infiltration in the tumor microenvironment (doi:10.1136/jitc-2022-SITC2022.0888).

      Reviewer #3 (Recommendations For The Authors): 

      (1) Determine drug binding by crystal structure or generate additional SMARCA4 or SMARCA2 mutations in the region near I1173/I1143 that are not conserved in CHD4 and test them in an ATPase assay for effects on drug inhibition. For example, Q1166 in SMARCA4 and Q1136 in SMARCA2 could be changed to alanine as in CHD4. Would this abrogate drug inhibition?

      We believe that our dual ATPase inhibitor is selective, and additional insights into binding specificity and selectivity for earlier-stage compounds of this series were recently published in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/).

      (2) The finding that SOX10 can rescue the antiproliferative effects of FHT1015 suggests that SMARCA4 is primarily needed for SOX10 expression. However, the co-occupancy of SMARCA4 and SOX10 at enhancers suggests that they cooperate to promote chromatin accessibility. It is unclear how over-expression of SOX10 can promote chromatin accessibility in drug-inhibited cells since SOX10 does not have chromatin remodeling activity. ATAC-seq in cells over-expressing SOX10 and treated with the drug could identify SOX10-dependent targets that do not require SMARCA4 activity and clarify the mechanism. It would also be informative to determine if SOX10 over-expression abrogates the effects of FHT1015 on both cell cycle and apoptosis, helping to resolve whether it is a partial or complete rescue of proliferation. 

      We agree that running ATAC-seq in cells overexpressing SOX10 would clarify this mechanism. However, shifts in corporate strategy deprioritized any further experiments for this project. One potential mechanism by which SOX10 overexpression could partially rescue the BAF inhibition phenotype is the localization of overexpressed SOX10 to open chromatin regions (mostly promoters) across the genome. We know from our ATAC-seq data (Fig. 2) that BAF inhibition leads to loss of chromatin accessibility at SOX10 enhancer sites, while promoter regions are only partially affected. Therefore, we think that overexpression of SOX10 would allow upregulation of its target genes via binding to the promoter regions. In this model, the enhancer-driven SOX10 target genes are likely to remain silenced.

      (3) Although the in vivo studies indicate that the drugs are well-tolerated, additional in vitro studies to determine the effects of the drug on the proliferation/survival of non-cancerous cells would further validate their therapeutic utility.

      Author Response: The reviewer raises a critical question. FHD-286, a dual BRM/BRG1 inhibitor similar to FHT-1015 with optimized physical properties, has been evaluated in a Phase I trial in patients with metastatic uveal melanoma (NCT04879017), and it was well tolerated at a continuous daily dose of up to 7.5 mg QD and at an intermittent dose of up to 17.5 mg QD. A manuscript describing the results of this clinical trial is currently in preparation.

    1. Author response:

      Reviewer #1 (Public review):

      It appears obvious that with no or a little fitness penalty, it becomes beneficial to have MHC-coding genes specific to each pathogen. A more thorough study that takes into account a realistic (most probably non-linear in gene number) fitness penalty, various numbers of pathogens that could grossly exceed the self-consistent fitness limit on the number of MHC genes, etc, could be more informative.

      The reviewer seems to be referring to the cost of excessively high presentation breadth.  Such a cost is irrelevant to the inferior fitness of a polymorphic population with heterozygote advantage compared to a monomorphic population with merely doubled gene copy number.  It is relevant to the possibility of a fitness valley separating these two states, but this issue is addressed explicitly in the manuscript.

      An addition or removal of one of the pathogens is reported to affect "the maximum condition", a key ecological characteristic of the model, by an enormous factor 10^43, naturally breaking down all the estimates and conclusions made in [RS]. This observation is not substantiated by any formulas, recipes for how to compute this number numerically, or other details, and is presented just as a self-standing number in the text.

      It is encouraging that the reviewer agrees that this observation, if correct, would cast doubt on the conclusions of Siljestam and Rueffler.  I would add that it is not the enormity of this factor per se that invalidates those conclusions, but the fact that the automatic compensatory adjustment of c<sub>max</sub> conceals the true effects of removing a pathogen, which are quite large.

      I am not sure why the reviewer doubts that this observation is correct.  The factor of 2.7∙10<sup>43</sup> was determined in a straightforward manner in the course of simulating the symmetric Gaussian model of Siljestam and Rueffler with the specified parameter values.  A simple way to determine this number is to have the simulation code print the value to which c<sub>max</sub>  is set, or would be set, by the procedure of Siljestam and Rueffler for different parameter values.  In another section of this response I will describe how to do this with the simulation code written and used by Siljestam and Rueffler; doing so confirms the value that I obtained with my own code.  Furthermore, I will now give a theoretical derivation of this factor.

      As specified by Siljestam and Rueffler, the positions of the m pathogens in (m-1)-dimensional antigenic space correspond to the vertices of a regular simplex centered at the origin, with distance between vertices equal to 1.  The squared distance from the origin to each of the m vertices of such a simplex is (m-1)/2m (https://polytope.miraheze.org/wiki/Simplex).  Thus, the sum of the m squared distances is (m-1)/2.  For the (0, 0) homozygote, condition is multiplied by a factor of exp(-(vr)<sup>2</sup>/2) for each pathogen, where r is the distance from the origin.  It follows that, with v=20, all the pathogens together decrease condition by a factor of exp(20<sup>2</sup>∙(m-1)/4) = exp(100∙(m-1)).  Thus, increasing or decreasing m by 1 changes this value by a factor of exp(100) = 2.7∙10<sup>43</sup>.
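
      The arithmetic in this derivation can be checked numerically. The following is a minimal sketch in Python, not the simulation code of either party: it builds a regular simplex with unit edge length centered at the origin (embedded in m dimensions for convenience, which does not change any distances) and sums (vr)^2/2 over the m vertices. The function names are hypothetical and chosen only for illustration.

```python
import math


def simplex_vertices(m):
    """Vertices of a regular simplex with m vertices and unit edge length,
    centered at the origin. The points are the scaled standard basis vectors
    of R^m with the centroid subtracted, so they lie in an (m-1)-dimensional
    hyperplane."""
    s = 1.0 / math.sqrt(2.0)   # scaling that makes every edge length equal to 1
    centroid = s / m           # each coordinate of the centroid
    return [[(s if i == j else 0.0) - centroid for j in range(m)]
            for i in range(m)]


def log_condition_penalty(m, v):
    """Natural log of the factor by which all m pathogens together reduce the
    condition of the (0, 0) homozygote located at the origin, assuming each
    pathogen multiplies condition by exp(-(v*r)^2 / 2)."""
    total = 0.0
    for vertex in simplex_vertices(m):
        r_squared = sum(x * x for x in vertex)
        total += (v ** 2) * r_squared / 2.0
    return total


if __name__ == "__main__":
    for m in (7, 8, 9):
        print(m, log_condition_penalty(m, 20.0))
    # Factor associated with changing m by 1 when v = 20
    print(math.exp(log_condition_penalty(8, 20.0) - log_condition_penalty(7, 20.0)))
```

      Under these assumptions the log penalty equals v^2(m-1)/4, so consecutive values of m differ by 100 in log units when v=20.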

      This begs the conclusion that the branching remains robust to changes in c_max that span 4 decades as well.

      That shows only that the results are not extremely sensitive to c<sub>max</sub> or K.  They are, nonetheless, exquisitely sensitive to m and v.  This difference in sensitivities is the reason that a relatively small change to m leads to such a large compensatory change in c<sub>max</sub>, a change large enough to have a major effect on the results.

      As I wrote above, there is no explanation behind this number, so I can only guess that such a number is created by the removal or addition of a pathogen that is very far away from the other pathogens. Very far in this context means being separated in the x-space by a much greater distance than 1/\nu, the width of the pathogens' gaussians. Once again, I am not totally sure if this was the case, but if it were, some basic notions of how models are set up were broken. It appears very strange that nothing is said in the manuscript about the spatial distribution of the pathogens, which is crucial to their effects on the condition c.

      I did not explicitly describe the distribution of pathogens in antigenic space because it is exactly the same as in Siljestam and Rueffler, Fig. 4: the vertices of a regular simplex, centered at the origin, with unity edge length.

      The number in question (2.7∙10<sup>43</sup>) pertains to the Gaussian model with v=20.  As specified by Siljestam and Rueffler, each pathogen lies at a distance of 1 from every other pathogen, so the distance of any pathogen from the others is indeed much greater than 1/v.  This condition holds, however, for most of the parameter space explored by Siljestam and Rueffler (their Fig. 4), and for all of the parameter space that seemingly supports their conclusions.  Thus, if this condition indicates that “basic notions of how models are set up were broken”, they must have been broken by Siljestam and Rueffler.

      Overall, I strongly suspect that an unfortunately poor setup of the model reported in the manuscript has led to the conclusions that dispute the much better-substantiated claims made in [SD].

      The reviewer seems to be suggesting that my simulations are somehow flawed and my conclusions unreliable.  I will therefore describe how my conclusions about sensitivity to parameter values can be verified using the simulation code provided by Siljestam and Rueffler themselves, with only small, easily understood modifications.  I will consider adding this description as a supplement when I revise the manuscript.

      The starting point is the Matlab file MHC_sim_Dryad.m, available at https://doi.org/10.5061/dryad.69p8cz98j.  First, we can add a line that prints the value of the variable logcmax, which represents the natural logarithm of cmax determined and used by the code.  Below line 116 (‘prework’), add the line ‘logcmax’ (with no semicolon).

      Now, at the Matlab prompt, execute MHC_sim_Dryad(false, 8, 20, 1) to run the simulation for the Gaussian model with m=8, v=20, and K=1.  The output will indicate that logcmax=700, in accord with the theoretical factor exp(100*(m-1)) derived above.  The allelic diversity, n<sub>e</sub>, will rise to a steady-state level of about 140, as in the red curve of my Fig. 2.

      Now lower m to 7, i.e., run MHC_sim_Dryad(false, 7, 20, 1).  The output will indicate that logcmax=600.  This confirms that lowering m by 1 causes the code to lower the value of c<sub>max</sub> by a factor exp(100)=2.7∙10<sup>43</sup>, which must also be the factor by which the condition of the most fit homozygote would increase without this adjustment.

      With the change of m to 7 and the compensatory change in c<sub>max</sub>, steady-state allelic diversity remains high.  But what if m changes but c<sub>max</sub> remains the same, as it would in reality?

      To find out, we can fix the value of c<sub>max</sub> to the value used with m=8 by adding the following line below the line previously added: ‘logcmax = 700’.  With this additional modification in place, executing MHC_sim_Dryad(false, 7, 20, 1) confirms that without a compensatory change to c<sub>max</sub>, lowering m from 8 to 7 mostly eliminates allelic diversity, in accord with the corresponding curve in my Fig. 2.  Similarly, raising m from 8 to 9, or changing v from 20 to 19.5 or 20.5 (executing MHC_sim_Dryad(false, 8, 19.5, 1) or MHC_sim_Dryad(false, 8, 20.5, 1)), largely eliminates diversity, confirming the other results in my Fig. 2.  Results for the bitstring model can also be confirmed, though this requires additional changes to the code.
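
      To give a sense of scale for the parameter changes just listed, the sketch below compares the fixed value logcmax = 700 with the log penalty v^2(m-1)/4 from the theoretical derivation earlier in this response. It is an illustration in Python under that closed form only; it does not run the simulation, says nothing by itself about allelic diversity, and uses hypothetical function names.

```python
def log_penalty(m, v):
    """Natural log of the total factor by which m pathogens reduce the
    condition of the (0, 0) homozygote in the symmetric Gaussian model,
    using the closed form v^2 * (m - 1) / 4."""
    return v * v * (m - 1) / 4.0


# Value of logcmax obtained with m = 8 and v = 20, then held fixed
FIXED_LOGCMAX = log_penalty(8, 20.0)


def log_mismatch(m, v):
    """Difference, in log units, between the penalty at (m, v) and the
    fixed logcmax that was appropriate for m = 8, v = 20."""
    return log_penalty(m, v) - FIXED_LOGCMAX


# The baseline plus the four perturbed parameter sets mentioned in the text
cases = [(8, 20.0), (7, 20.0), (9, 20.0), (8, 19.5), (8, 20.5)]
mismatches = {case: log_mismatch(*case) for case in cases}

if __name__ == "__main__":
    for (m, v), delta in mismatches.items():
        print(f"m={m}, v={v}: ln(penalty) - logcmax = {delta:+.4f}")
```

      Under this closed form, changing m by 1 shifts the log penalty by 100, and changing v by 0.5 shifts it by roughly 35, which corresponds to a factor on the order of 10^15.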

      Thus, the extreme sensitivity of the results of Siljestam and Rueffler to parameter values can be verified with the code that they used for their simulations, indicating that my conclusions are not consequences of my having done a “poor setup of the model”.

      Response to Reviewer #2 (Public review):

      (1) The statement that the model outcome of Siljestam and Rueffler is very sensitive to parameter values is, in this form, not correct. The sensitivity is only visible once a strong assumption by Siljestam and Rueffler is removed. This assumption is questionable, and it is well explained in the manuscript by J. Cherry why it should not be used. This may be seen as a subtle difference, but I think it is important to pin down the exact nature of the problem (see, for example, the abstract, where this is presented in a misleading way).

      I appreciate the distinction, and the importance of clearly specifying the nature of the problem.  However, Siljestam and Rueffler do not invoke the implausible assumption that changes to the number of pathogens or their virulence will be accompanied by compensatory changes to c<sub>max</sub>.  Rather, they describe the adjustment of c<sub>max</sub> (Appendix 7) as a “helpful” standardization that applies “without loss of generality”.  Indeed, my low-diversity results could be obtained, despite such adjustment, by combining the small change to m or v with a very large change to K (e.g., a factor of 2.7∙10<sup>43</sup>).  In this sense there is no loss of generality, but the automatic adjustment of c<sub>max</sub> obscures the extreme sensitivity of the results to m and v.

      (2) The title of the study is very catchy, but it needs to be explained better in the text.

      I had hoped that the final paragraph of the Discussion would make the basis for the title clear.  I will consider whether this can be clarified in a revision.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Comments on revised version: 

      I have reviewed the revised manuscript and read the rebuttal. The authors have carefully addressed my concerns. There is however one point that needs further work: 

      This follows up on my major point #1 in my initial review. I had asked the authors to demonstrate that FOLFIRI + AZD are less toxic to untransformed colorectal cells than to colorectal cancer cell lines. It is good to see that the authors took my advice and show effects of the drug treatments on the untransformed colorectal cell line CCD841. It seems to be less sensitive to AZD and FOLFIRI in the figure in the rebuttal. What surprises me is that I cannot find these new figures anywhere in the revised manuscript. Also, the data seem preliminary, because I do not see any standard errors in the graphs, and I cannot find a description of the time of drug incubation. I ask the authors to make sure that the CCD841 data are reproducible, and make sure they incorporate the data in the revised manuscript.

      We thank the reviewer for this insightful comment. In the initial revised version of the manuscript, we did not include results from the untransformed colorectal cell line CCD841, as those experiments had only been performed once and were considered preliminary. However, we fully agree with the reviewer on the importance of including these data.

      To address this, we have repeated the experiments in CCD841 cells to ensure reproducibility. We now report the results from three independent experiments testing the combination of AZD2858 and FOLFIRI on healthy epithelial colon cells. These results are shown in Supplementary Figure S7, where blue matrices represent cell viability and black matrices reflect the level of synergy between AZD2858 and FOLFIRI.

      Our results confirm that, individually, each drug has little to no effect on healthy cells, and no consistent synergistic interaction was observed, except in Experiment 1, which could not be reproduced. Importantly, the drug concentrations used were identical to those applied in the cancer cell experiments, allowing for direct comparison between normal and malignant cell responses.

      Reviewer #2:

      Comments on latest version: 

      Morano et al. have revised their manuscript in response to the points raised by reviewer #3 as follows.

      (1) Fig. 2E: Correcting the previously erroneous labelling of this Fig. makes it match the textual description. 

      (2) Figs 3A and B: The revised textual description of the flow cytometry BrdU incorporation is now precise. 

      (3) Fig. 3E: Removing the suspect WB images is a pragmatic decision that does not significantly affect the overall conclusions of the paper. 

      (4) Fig. 3D: Despite its puzzling appearance, this data is now described accurately in the text as "DSBs remained elevated after the combined treatment" rather than "increased after the combined treatment". A more convincing increase in the presumed damaged DNA band is evident in Fig. 4D when AZD2858 is combined with a much lower concentration of SN38 (1.5nM), which could mean that the concentration used in Fig. 3D (300nM) induced maximal damage that could not be further enhanced.

      We thank the reviewer for their thoughtful comments and constructive feedback, which have helped us improve the clarity and rigor of the manuscript.

      Reviewer #3:

      Comments on latest version: 

      The authors have addressed most of the concerns that I raised in the first round of revision and I have no further questions. I appreciate the authors' efforts in carrying out preliminary in vivo work, although, as the authors pointed out, the compound seems not to be effective in vivo. Future work is needed to address this and clarify the significance of the work.

      We thank the reviewer for acknowledging our efforts in addressing the previous concerns. We also appreciate the recognition of our preliminary in vivo work. While these results suggest limited in vivo efficacy of the compound at this stage, we agree that additional studies will be necessary to fully evaluate its therapeutic relevance. We consider this an important next step and are committed to pursuing it in future work.

    1. Author response:

      General Statements

      In this paper we demonstrate that the lipid packing of the plasma membrane has a huge impact on the stability of caveolae. By using interdisciplinary techniques, we show that the widely used dynamin inhibitor Dyngo-4a adsorbs to and inserts into lipid bilayers, leading to decreased lipid packing and hence reduced caveolae dynamics and internalization, even in cells lacking dynamin. We have added experiments that validate that Dyngo-4a treatment does not result in fragmentation or disassembly of the caveolae. A FRAP assay of cytosolic caveolae has been employed to address questions concerning scission. Moreover, as suggested by the reviewers, we have also included new simulation data that show and expand on the fact that Dyngo-4a positions in the lipid leaflet similarly to cholesterol and preferentially associates with cholesterol clusters, affecting the spatial distribution of cholesterol in the membrane. We believe that these added data have greatly improved the paper and strengthened our conclusion that lipid packing is a critical determinant in the balance between internalization and stable plasma membrane association of membrane vesicles.

      As requested, we have expanded the introduction to provide more detailed information about previous findings in the field. Changes and additions to the text have been highlighted in red for easier tracking.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      The authors use Dyngo-4a, a known dynamin inhibitor, to test its influence on caveolar assembly and surface mobility. They investigate whether it incorporates into membranes using Quartz-Crystal Microbalance, and how it is organized in membranes using simulations. Finally, they use lipid-packing sensitive dyes to investigate lipid packing in the presence of Dyngo-4a, membrane stiffness using AFM and membrane undulation using fluorescence microscopy. They also use a measure they call "caveola duration time" to claim that something happens to caveolae after Dyngo-4a addition and, using this parameter, they do indeed see an increase in it in response to Dyngo-4a, which is reduced back to the baseline after addition of cholesterol.

      Overall, the authors claim: 1) Dyngo-4a inserts into the membrane and this 2) results in "a dramatic dynamin-independent inhibition of caveola scission". 3) Dyngo-4a was inserted and positioned at the level of cholesterol in the bilayer and 4) Dyngo-4a-treatment resulted in decreased lipid packing in the outer leaflet of the plasma membrane 5) but Dyngo-4a did not affect caveola morphology, caveolae-associated proteins, or the overall membrane stiffness 6) acute addition of cholesterol counteracts the block in caveola scission caused by Dyngo-4a.

      Overall, in this reviewer's opinion, claims 1, 3, 4, 5 are well-supported by the presented data from electron and live cell microscopy, QCM-D and AFM.

      However, there is no convincing assay for caveolar endocytosis presented besides the "caveola duration", which, although unclearly described, seems to be the time it takes in TIRF microscopy imaging until a caveola is no longer picked up by the tracking software.

      Since the main claim of the paper is a mechanism of caveolar endocytosis being blocked by Dyngo-4a, a true caveolar internalization assay is required to make this claim. This means either the intracellular detection of caveolar cargo that is not surface-connected or the quantification of caveolar movement from TIRF into epifluorescence detection in the fluorescence microscope. Otherwise, the authors could remove the claim and just claim that caveolar mobility is influenced.

      We thank the reviewer for the constructive comments, and we very much appreciate the positive critique. We have now included a FRAP experiment of endocytic Cav1-GFP supporting the effect on internalization. In addition, we are currently performing CTxB HRP experiments to quantify the number of caveolae at the PM using EM, but for reasons outside our control we have not managed to finish these on time. They will be included in the manuscript once they are ready, which we hope will be soon.

      Reviewer #1 (Significance):

      A number of small-molecule inhibitors of the GTPase dynamin exist and are commonly used tools in the investigation of endocytosis. This goes so far that the use of some of these inhibitors alone is considered in some publications as sufficient to declare a process to be dynamin-dependent. However, this is not correct, as there are considerable off-target effects, including the inhibition of caveolar internalization by a dynamin-independent mechanism. This is important, as, for example, the influence of dynamin small-molecule inhibitors on chemotherapy resistance is currently being investigated (see for example Tremblay et al., Nature Communications, 2020).

      The investigation of the true effects of small molecules discovered and used as specific inhibitors, and of their off-target effects, is extremely important and this reviewer applauds the effort. It is important that inhibitors are not used alone, but that other means of targeting a mechanism are exploited as well in functional studies. The audience here thus includes, besides membrane biophysicists interested in the immediate effect of the small molecule Dyngo-4a, also cell biologists and everyone using dynamin inhibitors to investigate cellular function.

      Reviewer #2 (Evidence, reproducibility and clarity):

      This manuscript uses the small molecule dynamin inhibitors dynasore and dyngo to show that in dynamin triple knockout cells that these inhibitors impact lipid packing and organization in the plasma membrane. Data showing that dyngo affects caveolin dynamics using tirf microscopy is also shown and is interpreted to reflect inhibition of caveolae scission from the membrane.

      This data showing that dyngo and dynasore target membrane order is quite compelling and argues that the effects of these inhibitors is not dynamin specific and that inhibition of endocytosis by these small molecule inhibitors is dynamin-independent. The in vitro and in vivo data they provide is convincing.

      Similarly, the data showing that dynasore and dyngo affect caveolin dynamics and clathrin endocytosis (transferrin) is quite convincing and argues that altered lipid packing is impacting membrane dynamics at the plasma membrane.

      What is less convincing is the conclusion that dyngo is preventing caveolae scission from the membrane. Study of caveolae endocytosis is based on a TIRF assay that has inherent limitations:

      - Caveolae are defined as bright cav1-positive spots in diffraction limited TIRF and their disappearance presumed to be endocytic events. Cav1 spots are presumed to be caveolae but the authors do not consider that they may be flat non-caveolar oligomers. The diffraction limited TIRF approach interprets the large structures as caveolae but evidence to that effect is lacking.

      This is a valid comment and to address this we have now included data showing colocalization of cavin1 and EHD2 to the Cav1-GFP spots. We can however not determine if they are flat or invaginated. We do have extensive experience imaging caveolae using TIRF microscopy and carefully chose cells that display low expression of fluorescently labelled caveolin to avoid non-caveolar structures.

      - The analysis (and the diagram presented in figure 4) considers that caveolae can either diffuse laterally in the membrane or internalize and does not consider that caveolae can flatten and possibly fragment in the membrane. Is it not possible that loss of Cav1 spots is a fragmentation event and not necessarily a scission event?

      This is a good question. However, fragmentation and disassembly would result in shorter track durations, and this is not what is observed in the data. We have now also included data showing that cavin1 is persistently associated with the Cav1 spots identified as caveolae during Dyngo-4a treatment, indicating that these are caveolae. Furthermore, IF stainings showing colocalization of Cav1-GFP with cavin1 or EHD2 after Dyngo-4a treatment have also been added. We have now also expanded on the different interpretations of the data in the results section.

      - The analysis is based on overexpression of Cav1-GFP that may alter the stoichiometry between Cav1 and cavin1 such that while caveolae may be expressed, larger non-caveolar structures may accumulate.

      Yes, this is correct. We have specifically imaged cells expressing low levels of Cav1-GFP to avoid the accumulated non-caveolar structures that can be spotted in cells with high expression.

      - Cav1 has been shown to be internalized via the CLIC pathway (Chaudary et al, 2014) and if dyngo is impacting clathrin then maybe it is also impacting CLIC endocytosis and thereby Cav1 endocytosis via this pathway?

      Dyngo-4a has been shown to not affect CLIC endocytosis (McCluskey et al., 2013) and in our data we do not see internalization following Dyngo-4a treatment.

      - The longer Cav1 TIRF track time and shorter displacement with dyngo is consistent with inhibition of caveolae scission. However, as the authors discuss, could not reduced membrane undulations due to dyngo's impact on membrane order be responsible for the longer tracks? Alternatively, perhaps the altered lipid packing is corralling Cav1 movement and reducing non-caveolar Cav1 endocytosis, resulting in shorter tracks of longer duration? The proposed interaction of dyngo with cholesterol could prevent scission but also stabilize large (flat?) Cav1 oligomers in the membrane, perhaps reducing Cav1 oligomer fragmentation.

      We completely agree that membrane undulations contribute to instability of the TIRF field and therefore to disruption of Cav1-GFP tracks, as we discuss in the results section and as has been described in previous work (Larsson et al., 2023). Yet, we have also shown that internalization of caveolae results in shorter tracks (Hubert et al., 2020; Larsson et al., 2023; Mohan et al., 2015). Furthermore, the tracked Cav1-GFP spots are persistently positive for cavin1 both with and without Dyngo-4a treatment, showing that the majority do not disassemble or become internalized by other pathways. Additionally, the added IF stainings after 30 min of Dyngo-4a treatment also show that the Cav1-GFP spots remain positive for cavin1 and EHD2, just as in control-treated cells.

      My point here is not to discredit the data but only to suggest that the TIRF approach used is an indirect measure of caveolae scission from the membrane that requires substantiation using other approaches.

      We appreciate these comments and have tried to address these by adding new data and discussions on the interpretation of the tracking data in the results section.

      Dyngo is certainly generally affecting lipid packing via cholesterol and thereby affecting Cav1 dynamics in the plasma membrane. The claim of caveolae scission should be qualified and alternative possibilities considered and discussed. If the authors persist in arguing that dyngo is affecting caveolae scission then the effect should be substantiated by accumulation of caveolae by quantitative EM and high spatial and temporal resolution imaging of Cav1 and cavin1 to define the endocytic events. As the latter represents a new, and potentially very challenging, line of experimentation, I would suggest that it is beyond the scope of the current study. As indicated above the additional experiments are not necessary and qualification of the claims would be sufficient.

      We have now included a FRAP experiment of endocytic Cav1-GFP supporting the effect on internalization. We are also currently performing CTxB HRP experiments to quantify the number of caveolae at the PM using EM, but for reasons outside our control we have not managed to finish these in time. They will be included in the manuscript once they are ready, hopefully before long.

      Other points

      Figure 1C - Cav1 positive spots cannot be interpreted to be caveolae from diffraction limited confocal images. Same comment applies to Fig 4G - caveola? duration.

      We completely agree with this and that the claims should be qualified. We have added IF stainings showing that the Cav1-GFP structures are also positive for cavin1. We have now clarified that we cannot distinguish between flat or different curved states of caveolae using this methodology. We have also changed the labelling of Fig. 4G.

      Figure 4C - it is not clear why this EM data is not quantified - for both the number of caveolae and clathrin coated pits - as this would help clarify the interpretation of the effect reported.

      We are currently performing CTxB HRP experiments to quantify the number of caveolae using EM, but for reasons outside our control we have not managed to finish these in time. They will be included in the manuscript once they are ready, hopefully before long.

      Figure 4D - the AFM experiments should perhaps be repeated as the non-significant effect of dyngo on the Young's modulus may be a result of insufficient n values.

      We would like to clarify that, to ensure the robustness of our AFM measurements, we performed the experiments with sufficient biological and technical replicates. Specifically, each data point shown in Figure 4D represents a Young’s modulus value averaged from approximately sixty force-distance curves per cell. For each condition, we collected force-distance maps on eight to nine individual cells, obtained from two separate petri dishes per day. We repeated this process on two independent days. In total, we analysed thirty-one cells for the DMSO control and thirty-three cells for the Dyngo-4a treatment. We performed Student’s t-test with Welch’s correction to assess the statistical significance of the difference between the two conditions, as described in the main text. We believe that the sample size and statistical approach are sufficient to support the conclusions presented. Furthermore, we also analysed cell stiffness by calculating the slope of the linear portion of the force-distance curves. This analysis also did not reveal any statistically significant differences between the conditions (data not shown), further supporting our conclusion that Dyngo-4a treatment does not significantly alter the Young’s modulus under our experimental conditions.
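
      As an illustration of the test named above, the following is a minimal sketch of the Welch (unequal-variance) t statistic and its Welch-Satterthwaite degrees of freedom. The function name `welch_t` and all numbers are hypothetical placeholders, not measurements from the study; a p-value would then be read from a t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb            # squared standard error of the mean difference
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical per-cell Young's modulus values (kPa); NOT data from the study.
dmso = [2.0, 4.0, 6.0, 8.0]
dyngo = [3.0, 5.0, 7.0, 9.0, 11.0]
t, df = welch_t(dmso, dyngo)
print(f"t = {t:.3f}, df = {df:.2f}")
```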

      Reviewer #2 (Significance):

      This data showing that dyngo and dynasore target membrane order is quite compelling and argues that the effects of these inhibitors are not dynamin specific and that inhibition of endocytosis by these small molecule inhibitors is dynamin-independent. The in vitro and in vivo data they provide is convincing.

      Similarly, the data showing that dynasore and dyngo affect caveolin dynamics and clathrin endocytosis (transferrin) is quite convincing and argues that altered lipid packing is impacting membrane dynamics at the plasma membrane.

      What is less convincing is the conclusion that dyngo is preventing caveolae scission from the membrane.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Larsson et al present experimental and computational data on the role of Dyngo4a (a compound that was developed to inhibit dynamin) on the dynamics of caveolae. The manuscript mostly documents effects of Dyngo on caveolae, with one experiment to suggest a mechanism for this result. This one rather unconvincing result forms the focus of the manuscript contributing to a disconnect between the data and the presentation. Additionally, there are concerns with data interpretation. The writing could also benefit from revision to address grammar mistakes, strengthen referencing, and increase precision. Overall, the manuscript requires substantial revisions before being considered for publication. The central claim, in particular, needs stronger evidence to support the proposed mechanism.

      We thank the reviewer for the thorough review and for the experimental suggestions, which we believe have strengthened our data further.

      Significant issues (in approximate order of importance):

      (1) The data supporting the central mechanistic explanation appears limited. There is no evidence that Dyngo remains in one leaflet

      The simulations show that the energy barrier for moving between the leaflets of the bilayer is very high. Furthermore, simulations of C-Laurdan have shown that it does not readily flip between membrane leaflets (Barucha-Kraszewska et al., 2013), supporting that it reports on the outer lipid leaflet when added to cells. We have, however, now changed this and state that Dyngo-4a decreased the lipid order in the plasma membrane.

      - the GP of the PM is very low compared to previous measurements,

      The absolute GP values will vary between setups depending on which filters are used, so they are not comparable between laboratories. What is important is that we found a significant difference in the relative GP values between cells treated with Dyngo-4a and control cells. It is this change that we report. We have not previously performed any GP measurements on this cell type, so it is unclear which previous measurements reviewer #3 is referring to.
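
      The point about setup dependence can be sketched with the standard generalized polarization formula, GP = (I_o - g·I_d) / (I_o + g·I_d), where I_o and I_d are the intensities in the ordered and disordered emission channels. In the sketch below, the factor `g` is a hypothetical stand-in for the relative channel sensitivity of a given setup, and all intensities are made-up numbers rather than values from the study.

```python
def generalized_polarization(i_ordered, i_disordered, g=1.0):
    """GP from two emission-channel intensities.

    g is a hypothetical relative sensitivity of the disordered channel,
    standing in for filter and detector differences between setups.
    """
    return (i_ordered - g * i_disordered) / (i_ordered + g * i_disordered)


# Hypothetical channel intensities (arbitrary units); NOT data from the study.
control = (600.0, 400.0)
treated = (550.0, 450.0)

for g in (1.0, 2.0):  # two imagined setups with different channel sensitivity
    gp_control = generalized_polarization(*control, g=g)
    gp_treated = generalized_polarization(*treated, g=g)
    # Absolute GP values shift with g, but the treated sample stays lower.
    print(f"g={g}: control={gp_control:.3f}, treated={gp_treated:.3f}")
```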

      - effects on other membranes are not explored,

      The order of the intracellular membranes is, as expected, lower than that of the plasma membrane. Differentiating intracellular membranes of interest, such as endocytic vesicles, from other intracellular membranes would be very difficult. More importantly, our study is focused on what is happening in the plasma membrane, where caveolae reside, and effects on intracellular membranes would be of minor interest for plasma membrane dynamics.

      - dynamin-directed effects of Dyngo are not considered,

      In the discussion section we discuss the difficulties of disentangling dynamin-directed and indirect effects.

      (2) The QCM-D measurements and claims require explanation as several aspects remain unclear. In Fig S2, the 'softness' (what does this mean?) changes by 4-fold with DMSO alone (what does this mean?), then fractionally more with Dyngo. Then fractionally more again when Dyngo is removed (why?). Then it remains somewhat higher when both Dyngo and DMSO are removed, which is somehow interpreted as Dyngo remaining in the bilayer, but not DMSO.

      We understand the confusion of the reviewer and hope our explanations provide clarity. QCM-D measurements are based on an oscillating quartz crystal sensor. Specifically, what is measured are alterations in the oscillation frequency (ΔF) and in the rate of energy dissipation from the sensor surface (ΔD). This allows the measurement of: 1) materials adsorbing to the sensor surface, 2) changes in the viscoelastic properties of a solution in contact with the sensor surface, and 3) changes in the material adsorbed to the sensor surface upon exposure to different solutions. The ratio ΔD/-ΔF reports the mechanical softness or rigidity of an adsorbed material, in this case the SLB.

      A “buffer shift” is the term used when there is no adsorption to the sensor surface, but rather an effect of altering the solution above the sensor surface. One reason is that different solutions can have different densities (e.g., a DMSO-buffer mixture vs buffer alone), which impacts the oscillations of the sensor. We observed that the DMSO-buffer mixture alone gave a large buffer shift in comparison to the adsorption of Dyngo-4a into the SLB, thereby muddling the data interpretation. Thus, in Fig. S2 the system was first equilibrated with the DMSO-buffer mixture prior to addition of the Dyngo-4a solution, to allow for clearer visualization of the two events. In QCM-D, to assess whether something has made a permanent change to the system, one changes back to the solutions used before the addition; thus, we first washed with a DMSO-buffer mixture, followed by buffer alone. Control experiments were carried out in which no Dyngo-4a was added (also shown in Fig. S2). The control shows that the same “buffer shift” from the DMSO-buffer mixture occurs in both systems and that, upon returning to a buffer-only condition, there is no permanent change to the system caused by exposure to the DMSO. In contrast, once the system that received Dyngo-4a is changed back to a buffer-only system, we see that mass has been added to the system (ΔF) with little change to the dissipation (ΔD), thereby resulting in a lower ΔD/-ΔF ratio, which is to say that the SLB after the adsorption of Dyngo-4a was more rigid than the SLB without Dyngo-4a.
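
      The ΔD/-ΔF reasoning above can be written out as a short sketch. The function name `softness_ratio` and the numerical shifts are hypothetical and chosen only to mirror the described situation (added mass, little change in dissipation); they are not values from Fig. S2.

```python
def softness_ratio(delta_d, delta_f):
    """Return ΔD / (-ΔF); a lower value indicates a more rigid adsorbed layer.

    delta_d: dissipation shift (here in units of 1e-6)
    delta_f: frequency shift in Hz (negative when mass is added)
    """
    if delta_f == 0:
        raise ValueError("delta_f must be non-zero")
    return delta_d / (-delta_f)


# Hypothetical shifts relative to the bare sensor, both measured in buffer only.
slb_only = softness_ratio(delta_d=0.5, delta_f=-25.0)
# After Dyngo-4a exposure and washing: more mass (larger -ΔF), similar ΔD.
slb_after_dyngo = softness_ratio(delta_d=0.5, delta_f=-30.0)

print(f"SLB only:        {slb_only:.4f}")
print(f"SLB after wash:  {slb_after_dyngo:.4f}")
```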

      These interpretations are difficult to grasp, as the authors seem to be implying simple amphiphilic partitioning into the membrane, which should all be removable by efficient washing.

      Amphiphilic partitioning is not fully reversible by “efficient washing”; the extent of removal depends on the partition coefficients.

      I do not doubt that this compound interacts with membranes, but the quantifications appear ambiguous. A bilayer with 16 mol% (or worse, 30% if all in one leaflet) Dyngo is very unlikely (to remain a bilayer). Even if such a bilayer was conceivable, the authors are claiming an ADDITION of Dyngo that would INCREASE the area of one leaflet by 30%, which needs explanation as it appears unlikely.

      We understand that, in our attempt to provide numbers in the results section for the amount of binding observed in QCM-D, these can easily be interpreted as the amount observed to insert into the PM. However, as discussed in the discussion, we also see aggregates of Dyngo-4a that associate with the membrane in the simulations, which could well contribute to the binding observed in QCM-D prior to washing. The precise amount of membrane-inserted Dyngo-4a is difficult to measure, as we discuss in the text. To make this clearer, we have now moved all these details to the discussion section, where we elaborate on this. Furthermore, since Dyngo-4a, like cholesterol, intercalates between the head groups of the lipids, the area would not increase in direct proportion to the mol%.

      Also, there are no replicates shown, so unclear how reproducible these effects are?

      For clarity, only single experiments are shown. However, multiple experiments were performed and the range in measured values for 3 technical repeats can be observed in the standard deviations found in the main text (e.g., 6 ± 2 mol%).

      (3) The simulations are insufficiently described and difficult to interpret. How big are these systems? Why do the figures show the aqueous system with lateral boundaries?

      There are no explicit boundaries used in the simulations; periodic boundary conditions are applied in all three dimensions. The lateral boundaries observed in the figures correspond to the simulation box edges and are a visual artifact of 2D projections with the QuickSurf representation. No artificial walls or constraints were introduced laterally. Additional technical details, including the system size and periodic boundary conditions, have now been added to the methods section.

      It seems quite important that multiple Dyngo molecules aggregate rather than partition into membranes - is this likely to occur in experiment?

      Yes, this is important, and the additional simulation experiments suggested by Reviewer #3 have clarified that the aggregates contribute a great deal to the change in lipid packing of lipid bilayers containing cholesterol. However, it is hard to test aggregation in the cellular system, but we believe that it happens and contributes to the effect on membranes. We have now emphasized the effect of the aggregates in the text.

      PMF simulations are strongly suggesting that Dyngo does not spontaneously cross membranes, which is inconsistent with its drug-like amphiphilicity (cLogP~2.5 is optimally suited for membrane permeation) and known effects on intracellular proteins. This suggests an artefact in these PMFs.

      As stated in the submitted version of the manuscript, logP was used to validate the topology, and the observed value was in very good agreement with cLogP. Moreover, this validation complemented the standard procedure of CHARMM-GUI ligand modelling, which provided a reasonable penalty score (around 20) for the Dyngo-4a topology. POPC and cholesterol molecules are standard in the force field and validated by numerous studies. The parameters used for the membrane simulations, and for AWH in particular, are very common for this type of study. Thus, we do not see what could cause artifacts in the construction of the free energy profile. In fact, the amphiphilicity of the molecule may be one of the key reasons that Dyngo-4a remains at the aqueous interface of the membrane and does not cross the membrane spontaneously. Also, we believe that the energy barrier of 40-60 kJ/mol is not prohibitively high and that Dyngo-4a molecules may still eventually overcome the barrier, though we expect the majority to reside in the upper leaflet.
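
      To put the quoted 40-60 kJ/mol barrier on a common scale, the sketch below evaluates the Boltzmann factor exp(-ΔG/RT) for the two ends of that range. The temperature of 310 K is an assumption made here for illustration (the simulation temperature is not stated in this response), and the factor is only the relative equilibrium weight of the barrier top, not a crossing rate, which would also depend on an attempt frequency.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)


def boltzmann_factor(barrier_kj_mol, temperature_k=310.0):
    """Relative weight exp(-ΔG / RT) of a state at the top of a free energy barrier.

    temperature_k defaults to 310 K, an assumed value for illustration.
    """
    return math.exp(-barrier_kj_mol / (R_KJ_PER_MOL_K * temperature_k))


low = boltzmann_factor(40.0)   # lower end of the quoted barrier range
high = boltzmann_factor(60.0)  # upper end of the quoted barrier range
print(f"40 kJ/mol: {low:.2e}")
print(f"60 kJ/mol: {high:.2e}")
print(f"ratio:     {low / high:.0f}")
```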

      The authors should experimentally measure the permeation of Dyngo through bilayers (or lack thereof), to more robustly support their finding that Dyngo does not cross membranes spontaneously.

      We thank the reviewer for the suggestion; however, this is very technically challenging and would require the establishment of precise systems, which is beyond the scope of this manuscript.

      (4) Why not measure effect of Dyngo on lipid packing directly and more broadly in model membranes?

      With the added modelling experiments supporting the previous simulations and the calculated GP values from the C-Laurdan experiments on the cellular plasma membrane, we do not find it necessary to include more model membrane experiments beyond the existing ones on lipid monolayers and supported lipid bilayers.

      (5) Statistics should not be done on individual cells (n>26), but rather on independent experiment (N=3?)

      We have performed the statistics on live cell particle tracking according to previous literature on similar systems (Boucrot et al., 2011; Larsson et al., 2023; Shvets et al., 2015; Stoeber et al., 2012).
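
      The two units of analysis under discussion here (individual cells versus independent experiments) can be illustrated with a minimal sketch. The experiment labels and track durations below are hypothetical and serve only to show how pooling per-cell values differs from first averaging within each experiment.

```python
from statistics import mean


def per_experiment_means(values_by_experiment):
    """Collapse per-cell values to one mean per independent experiment."""
    return {name: mean(vals) for name, vals in values_by_experiment.items()}


# Hypothetical per-cell track durations (s) from three experiments; NOT study data.
control = {
    "exp1": [100.0, 120.0, 110.0],
    "exp2": [90.0, 95.0],
    "exp3": [130.0, 125.0, 135.0, 130.0],
}

pooled = [v for vals in control.values() for v in vals]
exp_means = per_experiment_means(control)

print(f"per-cell analysis:       n = {len(pooled)}")     # every cell is one data point
print(f"per-experiment analysis: N = {len(exp_means)}")  # every experiment is one data point
print(exp_means)
```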

      (6) Fig 1G is important but rather unclear. Firstly, these kymographs are an odd way to show that the caveolae are not moving. More importantly, caveolae in normal cells have been shown to be quite stable and immobile (eg doi: 10.1074/jbc.M117.791400), yet here they are claimed to be very mobile.

      Although this might be an odd and unconventional way to depict dynamic processes, we believe that it is a very illustrative way to show track stability over time in bulk, rather than just a kymograph of a few structures in a cell. Furthermore, we are not claiming that caveolae are very mobile but rather the opposite: that they are very stable, in agreement with previous work (Boucrot et al., 2011; Larsson et al., 2023; Mohan et al., 2015). We have now edited the text to make this even clearer.

      Also, if Dyngo prevents caveolae scission, there should be more of them at the membrane - why no quantification like Fig 1C to show accumulation of caveolae upon Dyngo treatment? Or directly counting caveolae via EM, as in Fig 4C?

      We are currently performing CTxB HRP experiments using EM, but for reasons outside our control we have not managed to finish these in time. They will be included in the manuscript once they are ready, hopefully before long. However, Dynasore has previously been shown, by EM, to increase the number of caveolae at the PM (Moren et al., 2012; Sinha et al., 2011).

      (7) The writing can be made more precise and referencing could be strengthened.

      The introduction was written in a short format, and we have now extended this and made it more precise.

      Some examples:

      (a) 'scissoned' is not a word in English,

      Thanks, we have now changed this.

      (b) what is meant by "Cav1 assembly is driven by high chol content"? There are many types of caveolin assemblies.

      We agree that this can be made more precise and have now clarified this in the introduction.

      (c) "This generates a unique membrane domain with distinct lipid packing and a very high curvature." Unclear what 'this' refers to and there is no reference here, so what is the evidence for either of these claims? Caveolin-8S oligomers are not curved. Perhaps 'this' is caveolae, but they are relatively large and also not very highly curved and I am unaware of measurements of lipid packing therein.

      Caveolae are around 50 nm in diameter, which in biological terms represents a very high membrane curvature. It has been extensively shown that caveolae have a distinct lipid composition, highly enriched in cholesterol and sphingolipids, which will thereby also generate a unique lipid packing compared to the surrounding membrane. Yet, the reviewer is correct that lipid packing has not been measured in a caveola, owing to obvious technical challenges. Thus, we have now changed the text to “special lipid composition”.

      The sentence following that one again makes a specific, but unreferenced, claim.

      (d) intro claims that lipid packing is critical for fission, but it is unclear quite what is meant by this claim. The references do not help, as they are often about the basic biophysics of lipids, rather than how packing affects fission.

      We have now edited the text.  

      (e) intro strongly implies that caveolae remain membrane attached because of stalled scission. How strong is the evidence for this? The fact that EHD2 is at the neck is not definitive,

      We used the term stalled scission to describe that not all omega-shaped membrane invaginations undergo scission in the same automatic way as clathrin-coated vesicles. We have now changed this in the text. Caveolae are shown to be released (undergo scission) and to be detected as internal caveolae if the protein EHD2 is removed. Hence, this must be interpreted as EHD2 stalling scission. The evidence includes data compiled over the last 12 years by others and us, including for example: 1) Caveolae with EHD2 have a longer duration time (Larsson et al., 2023; Mohan et al., 2015; Moren et al., 2012; Stoeber et al., 2012), and knockdown of EHD2 results in more internalized caveolae as measured by CTxB HRP using EM (Moren et al., 2012) and a shorter duration time at the PM (Hubert et al., 2020; Larsson et al., 2023; Mohan et al., 2015; Stoeber et al., 2012). 2) EHD2 overexpression results in fewer internalized caveolae as measured by CTxB HRP using EM (Stoeber et al., 2012). 3) Overexpression or acute addition of purified EHD2 via microinjection counteracts lipid-induced scission of caveolae and hence results in caveolae stabilization at the PM (Hubert et al., 2020). It is very hard to see how the release and internalization of caveolae could result from anything other than scission. EHD2 has been found around the rim of caveolae (Matthaeus et al., 2022), and overexpression of EHD2 oligomerizing mutants has been shown to expand the caveola neck (Hoernke et al., 2017; Larsson et al., 2023).

      (f) unclear what is meant by 'lipid packing frustration' and how Dyngo supposedly induces it.

      Lipid packing frustration refers to what is usually called a lipid packing defect. Since lipid membranes are described as a fluid system, they should not have defects, and we therefore believe that lipid packing frustration is the more accurate term. However, we have now changed the text and use “decreased lipid packing” or “decreased lipid order” throughout to describe the effect on the plasma membrane.

      (8) IF of Cav1 is insufficient to claim puncta as caveolae. Co-stained puncta of caveolin with cavin are much stronger evidence. Same issue for Cav1-GFP puncta.

      We agree and have now provided IF showing cavin1 and EHD2 colocalization with Cav1-GFP in untreated and Dyngo-4a-treated cells.

      (9) Fig 3E claims that "preferred position of Dyngo-4a was closer to the head groups" but the minimum looks to be in similar place as Fig 3B without cholesterol.

      We appreciate the reviewer’s observation. The PMF minima in the POPC and POPC:Chol membranes are indeed close in absolute position (~1.1–1.2 nm from the bilayer center). However, as clarified in the revised text, the presence of cholesterol leads to a slight shift of Dyngo-4a closer to the headgroup region and broadens the positional distribution. This is also evident from the added density profiles (Fig. S3A) and is now described more precisely in the manuscript.

      Critically, these results do not support the notion that Dyngo affects lipid packing sufficiently, which is not measured in the simulations (though could be).

      We thank the reviewer for the excellent suggestion. In response, we have now included a detailed analysis of Dyngo-4a’s effect on lipid packing in the simulations. As described in the revised manuscript, we measured deuterium order parameters, area per lipid (APL), and lipid–Dyngo–cholesterol spatial distributions (Figs. 3-H, S3C-E). The results demonstrate that Dyngo-4a decreases lipid order in POPC:Chol membranes. Both single molecules and clusters reduce the order parameter by up to 0.04 units, particularly in the upper leaflet, where Dyngo-4a resides. The reduction is most pronounced in the midchain region of the sn1 tail and around the double bond of the sn2 tail. These effects were accompanied by increased APL in POPC:Chol membranes and by colocalization of Dyngo-4a near cholesterol-rich regions. Together, these data confirm that Dyngo-4a perturbs membrane organization and lipid packing in a composition-dependent manner. We believe these additions directly address the concern and demonstrate that the simulations indeed support the conclusion that Dyngo-4a modulates lipid packing.
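
      For readers unfamiliar with the quantity, the deuterium order parameter is conventionally defined as S_CD = ⟨(3cos²θ − 1)/2⟩, where θ is the angle between a C–H bond and the bilayer normal, and it is often reported as its absolute value. The following is a minimal sketch of that textbook definition only; the function name and the example vectors are hypothetical, and it does not reproduce the authors' analysis pipeline.

```python
import math


def order_parameter(ch_vectors, normal=(0.0, 0.0, 1.0)):
    """Average (3*cos^2(theta) - 1) / 2 over C-H bond vectors.

    theta is the angle between each bond vector and the bilayer normal.
    """
    norm_n = math.sqrt(sum(c * c for c in normal))
    total = 0.0
    for v in ch_vectors:
        norm_v = math.sqrt(sum(c * c for c in v))
        cos_theta = sum(a * b for a, b in zip(v, normal)) / (norm_v * norm_n)
        total += 0.5 * (3.0 * cos_theta ** 2 - 1.0)
    return total / len(ch_vectors)


# Hypothetical bond vectors: perpendicular to the normal (ordered, all-trans-like chain)
in_plane = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# and along the normal, shown only as the other limiting case of the formula.
along_normal = [(0.0, 0.0, 1.0)]

print(order_parameter(in_plane))
print(order_parameter(along_normal))
```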

      Finally, the simulation data do not show "that Dyngo-4a is competing with cholesterol"; it is unclear what 'competition' means in this context, but regardless, the data only shows that Dyngo sits at a similar location as cholesterol.

      We agree with the reviewer that “competition” was an imprecise term. We have rephrased the relevant sections to clarify that Dyngo-4a and cholesterol localize to overlapping regions and exhibit spatial coordination. As now stated in the manuscript, cholesterol appears to partially displace Dyngo-4a from its preferred depth seen in pure POPC, broadens its membrane distribution, and alters lipid packing. According to the order parameters there is an interplay between chol and Dyngo-4a and the heatmaps show that the distribution of chol in the membrane gets less uniform in the presence of Dyngo-4a. These interactions suggest that Dyngo-4a perturbs cholesterol-rich domains.

      As new analysis routines were added to the study, we have now also added the details on those to the Methods section of the text.

      (10) AFM measures the stiffness of the cell (as correctly explained in Results section) not "overall stiffness of the PM" as stated in the Discussion.

      We thank the reviewer for pointing this out; we have now altered this in the discussion section.

      (11) Fig2A: what was the starting lipid surface pressure? How does Dyngo insertion depend on initial lipid packing?

      The starting lipid surface pressure was 20 mN m<sup>-1</sup>, which we have now incorporated in the figure legend. We performed several such experiments with starting pressures ranging from 20-23 mN m<sup>-1</sup>, showing consistent results, which we describe in the materials and methods section. Given that we also performed QCM-D analysis and simulations on bilayers showing that Dyngo-4a adsorbed and inserted, respectively, we have not performed a titration of starting pressures to determine the MIP of Dyngo-4a.

      (12) Fig 4B is a strange approach to measure membrane motion. Why not RMSD or some other displacement based method? As its shown, it implies that the area of the cell changes.

      The method that we used quantifies the area of the cell that is attached to (or close to) the glass and is thereby visible in TIRF microscopy. This area indeed changes over time, which has frequently been observed and used to describe and quantify mobility and lamellipodia and filopodia formation, among other things. We agree that RMSD can also be used to analyze the data before and after treatments, and we have now included RMSD analysis in the manuscript.

      Reviewer #3 (Significance):

      The title, abstract, and introduction of the manuscript are largely framed around lipid packing, but most of the data investigate other unexpected effects of treating cells with Dyngo4a. The only measurement for lipid packing (or any other membrane properties) is Fig 4E-F. Therefore, this paper is effectively an investigation of an artefact of a common reagent, which itself could be a valuable contribution. However, the mechanism to explain its effect requires stronger evidence, and its broad biological significance needs further exploration.

      Overall, the impact of documenting the effects of Dyngo4a on membranes appears modest but may be valuable to the membrane trafficking community.

      Barucha-Kraszewska, J., S. Kraszewski, and C. Ramseyer. 2013. Will C-Laurdan dethrone Laurdan in fluorescent solvent relaxation techniques for lipid membrane studies? Langmuir. 29:1174-1182.

      Boucrot, E., M.T. Howes, T. Kirchhausen, and R.G. Parton. 2011. Redistribution of caveolae during mitosis. J Cell Sci. 124:1965-1972.

      Hoernke, M., J. Mohan, E. Larsson, J. Blomberg, D. Kahra, S. Westenhoff, C. Schwieger, and R. Lundmark. 2017. EHD2 restrains dynamics of caveolae by an ATP-dependent, membrane-bound, open conformation. Proc Natl Acad Sci U S A. 114:E4360-E4369.

      Hubert, M., E. Larsson, N.V.G. Vegesna, M. Ahnlund, A.I. Johansson, L.W. Moodie, and R. Lundmark. 2020. Lipid accumulation controls the balance between surface connection and scission of caveolae. Elife. 9.

      Larsson, E., B. Moren, K.A. McMahon, R.G. Parton, and R. Lundmark. 2023. Dynamin2 functions as an accessory protein to reduce the rate of caveola internalization. J Cell Biol. 222.

      Matthaeus, C., K.A. Sochacki, A.M. Dickey, D. Puchkov, V. Haucke, M. Lehmann, and J.W. Taraska. 2022. The molecular organization of differentially curved caveolae indicates bendable structural units at the plasma membrane. Nat Commun. 13:7234.

      McCluskey, A., J.A. Daniel, G. Hadzic, N. Chau, E.L. Clayton, A. Mariana, A. Whiting, N.N. Gorgani, J. Lloyd, A. Quan, L. Moshkanbaryans, S. Krishnan, S. Perera, M. Chircop, L. von Kleist, A.B. McGeachie, M.T. Howes, R.G. Parton, M. Campbell, J.A. Sakoff, X. Wang, J.Y. Sun, M.J. Robertson, F.M. Deane, T.H. Nguyen, F.A. Meunier, M.A. Cousin, and P.J. Robinson. 2013. Building a better dynasore: the dyngo compounds potently inhibit dynamin and endocytosis. Traffic. 14:1272-1289.

      Mohan, J., B. Moren, E. Larsson, M.R. Holst, and R. Lundmark. 2015. Cavin3 interacts with cavin1 and caveolin1 to increase surface dynamics of caveolae. J Cell Sci. 128:979-991.

      Moren, B., C. Shah, M.T. Howes, N.L. Schieber, H.T. McMahon, R.G. Parton, O. Daumke, and R. Lundmark. 2012. EHD2 regulates caveolar dynamics via ATP-driven targeting and oligomerization. Mol Biol Cell. 23:1316-1329.

      Shvets, E., V. Bitsikas, G. Howard, C.G. Hansen, and B.J. Nichols. 2015. Dynamic caveolae exclude bulk membrane proteins and are required for sorting of excess glycosphingolipids. Nat Commun. 6:6867.

      Sinha, B., D. Koster, R. Ruez, P. Gonnord, M. Bastiani, D. Abankwa, R.V. Stan, G. Butler-Browne, B. Vedie, L. Johannes, N. Morone, R.G. Parton, G. Raposo, P. Sens, C. Lamaze, and P. Nassoy. 2011. Cells respond to mechanical stress by rapid disassembly of caveolae. Cell. 144:402-413.

      Stoeber, M., I.K. Stoeck, C. Hanni, C.K. Bleck, G. Balistreri, and A. Helenius. 2012. Oligomers of the ATPase EHD2 confine caveolae to the plasma membrane through association with actin. EMBO J. 31:2350-2364.

    1. Author response:

      Reviewer #1 (Public review):

      We greatly appreciate Reviewer #1’s accurate public review of our study on the kinesin motor using the DNA origami nanospring (NS). With respect to the strengths, we fully agree with Reviewer #1’s comments. Regarding the weakness, we would like to respond as follows.

      It is true that, unlike optical tweezers, our method does not provide real-time data display. Optical tweezers enable real-time observation and manipulation of kinesin molecules at arbitrary time points. Achieving real-time observation and manipulation is indeed an important challenge for the future development of the NS technique. On the other hand, Iwaki et al. (our co-corresponding author) has already investigated dynamic properties of motor proteins under load, such as step size and force–velocity relationship of myosin VI using NS. We are now preparing high spatiotemporal resolution microscopy experiments on the KIF1A system to measure its step size and force–velocity relationship, which inherently require such resolution.

      Reviewer #2 (Public review):

      We would like to thank Reviewer #2 for providing a highly accurate assessment of the strengths of our experiments. Regarding the weaknesses, we would like to respond as follows.

      First, Iwaki et al. (our co-corresponding author) have already succeeded in observing the stepping motion of myosin VI using the nanospring (NS) in their previous work. We are also currently preparing high spatiotemporal resolution microscopy experiments to observe the stepping motion of KIF1A in our system. Second, while it is true that the NS does not follow Hooke’s law, it is possible to design and construct NSs with an appropriate dynamic range by tuning the spring constant to match the forces exerted by protein molecules. Finally, we agree that our first observation of the stall plateau in KIF1A using the NS is a meaningful achievement. However, with respect to the suggestion that “increasing validity requires also studying kinesin-1,” we have a somewhat different perspective. The validity of the NS method has already been thoroughly examined in the previous work on myosin VI by Iwaki et al., where results were compared with those obtained using optical tweezers. Moreover, the focus of this manuscript is on KAND caused by KIF1A mutations. From this perspective, although we appreciate the suggestion, we consider it important to keep the present study focused on KIF1A and its implications for KAND.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):  

      (1) To distinguish autophagosomes from autolysosomes, the authors used vps16 RNAi cells, which are supposed to be fusion deficient. However, the extent to which fusion is actually inhibited by knockdown of Vps16A is not shown. The co-localization rate of Atg8 and Lamp1 should be shown (as in Figure 8). Then, after identifying pre-fusion autophagosomes and lysosomes, the localization of each should be analyzed.

      Thank you for this insightful comment. We analyzed the colocalization of 3xmCherry-Atg8a and GFP-Lamp1, which label autophagic structures and lysosomes, respectively, in Vps16A RNAi fat body cells. As expected, Vps16A silencing markedly reduced the overlap between these two signals, indicating a strong block in autophagosome–lysosome fusion. Moreover, both 3xmCherry-Atg8a and GFP-Lamp1 became more perinuclearly localized compared to the control (luciferase RNAi) cells.

      It is also possible that autophagosomes and lysosomes are tethered by factors other than HOPS (even if they are not fused). If this is the case, autophagosomal trafficking would be affected by the movement of lysosomes.  

      Thank you for raising this possibility. While we cannot fully exclude that autophagosomes might be indirectly transported via tethering to lysosomes, we consider this unlikely. We believe that in Drosophila fat cells, autophagosomes and lysosomes rapidly fuse once in close proximity. Therefore, even if alternative tethering mechanisms exist, they are unlikely to permit prolonged joint trafficking without fusion.

      (2) The authors analyze autolysosomes in Figures 6 and 7. This is based on the assumption that autophagosome-lysosome fusion takes place in cells without vps16A RNAi. However, even in the presence of Vps16A, both pre-fusion autophagosomes and autolysosomes should exist. This is also true in Figure 8H, where the fusion of autophagosomes and lysosomes is partially suppressed in knockdown cells of dynein, dynactin, Rab7, and Epg5. If the effect of fusion is to be examined, it is reasonable to distinguish between autophagosomes and autolysosomes and analyze only autolysosomes.  

      Thank you for this careful observation. The 3xmCherry-Atg8a reporter is well suited to identify both autophagosomes and autolysosomes, as the mCherry fluorophore is resistant to degradation in the acidic environment of autolysosomes. Notably, mCherry-Atg8a–positive autolysosomes appear larger and brighter than pre-fusion autophagosomes, which are typically smaller and dimmer, especially under fusion-deficient conditions (e.g., Figure 4). Therefore, we use these morphological differences as a proxy to distinguish between the two.

      To improve structural assignment, we incorporated endogenous Lamp1 staining (Figure 10) and a Lamp1-GFP reporter (Figure 10—figure supplement 1). Vesicles positive for mCherry-Atg8a but negative for Lamp1 are considered pre-fusion autophagosomes. Structures double-positive for mCherry-Atg8a and Lamp1 represent autolysosomes, while Lamp1-positive, Atg8a-negative vesicles correspond to non-autophagic lysosomes. To clarify these interpretations, we revised the Results section and explained these reporters in more detail.
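
      The marker-based assignment described above can be written as a simple decision rule. The following is only an illustrative sketch, not the analysis pipeline used in the study: the function names and the boolean inputs (whether a vesicle was scored positive for each marker) are assumptions made for the example, and how positivity is thresholded in the images is left open.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_vesicle(atg8a_positive: bool, lamp1_positive: bool) -> str:
    """Assign a vesicle class from two marker calls (hypothetical helper).

    Atg8a+ / Lamp1-  -> pre-fusion autophagosome
    Atg8a+ / Lamp1+  -> autolysosome
    Atg8a- / Lamp1+  -> non-autophagic lysosome
    Atg8a- / Lamp1-  -> unlabeled (not covered by the rule in the text)
    """
    if atg8a_positive and lamp1_positive:
        return "autolysosome"
    if atg8a_positive:
        return "pre-fusion autophagosome"
    if lamp1_positive:
        return "non-autophagic lysosome"
    return "unlabeled"


def summarize(vesicles: Iterable[Tuple[bool, bool]]) -> Counter:
    """Count vesicles per class from (atg8a_positive, lamp1_positive) pairs."""
    return Counter(classify_vesicle(a, l) for a, l in vesicles)
```

      For example, `summarize([(True, False), (True, True), (False, True)])` would count one vesicle in each of the three classes named in the response.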

      (3) In this study, only vps16a RNAi cells were used to inhibit autophagosome-lysosome fusion. However, since HOPS has many roles besides autophagosome-lysosome fusion, it would be better to confirm the conclusion by knockdown of other factors (e.g., Stx17 RNAi).  

      Thank you for this valuable suggestion. We initially considered using Syntaxin17 RNAi; however, our recent findings indicate that loss of Syx17 results in a HOPS-dependent tethering lock between autophagosomes and lysosomes (DOI: 10.1126/sciadv.adu9605). In this case, tethered vesicles would likely move together, confounding the interpretation of autophagosome-specific trafficking.

      Therefore, we turned to other SNAREs such as Vamp7 and Snap29. One Snap29 RNAi was located on the appropriate chromosome needed for our genetic experiments. We generated a transgenic fly line expressing both Snap29 RNAi and the mCherry-Atg8a reporter under a fat body-specific R4 promoter. When we tested our key trafficking hits in this background, we observed similar autophagosome localization phenotypes as in Vps16A RNAi cells. These results, now included in the revised manuscript (see Figure 6), confirm that the observed transport phenotypes are not specific to Vps16A or HOPS complex loss.

      (4) Figure 8: Rab7 and Epg5 are also known to be directly involved in autophagosome-lysosome tethering/fusion. Even if the fusion rate is reduced in the absence of Rab7 and Epg5, it may not be the result of defective autophagosome movement, but may simply indicate that these molecules are required for fusion itself. How do the authors distinguish between the two possibilities?

      Thank you for this important point. While Rab7 and Epg5 indeed participate in autophagosome–lysosome tethering and fusion, our data suggest they also contribute to autophagosome movement. This is evident from the distinct phenotypes observed upon Rab7 or Epg5 RNAi compared to Vps16A or SNARE RNAi. Depletion of Vps16A, Syx17, Vamp7, or Snap29 (factors involved specifically in fusion) results in perinuclear accumulation of autophagosomes. In contrast, Rab7 or Epg5 RNAi leads to a dispersed autophagosome pattern throughout the cytoplasm.

      These differences suggest that Rab7 and Epg5 play additional roles in positioning autophagosomes. Supporting this, our co-immunoprecipitation experiments show that Epg5 interacts with dynein motors. Therefore, we propose that Rab7 and Epg5 influence both autophagosome fusion and their microtubule-based transport.

      Reviewer #2 (Public review):  

      One limitation of the study is the genetic background that serves as the basis for the screening. In addition to preventing autophagosome-lysosome fusion, disruption of Vps16A has been shown to inhibit endosomal maturation and block the trafficking of components to the lysosome from both the endosome and Golgi apparatus. Additional effects previously reported by the authors include increased autophagosome production and reduced mTOR signaling. Thus Vps16A-depleted cells have a number of endosome, lysosome, and autophagosome-related defects, with unknown downstream consequences. Additionally, the cause and significance of the perinuclear localization of autophagosomes in this background is unclear. Thus, interpretations of the observed reversal of this phenotype are difficult, and have the caveat that they may apply only to this condition, rather than to normal autophagosomes. Additional experiments to observe autophagosome movement or positioning in a more normal environment would improve the manuscript.  

      Thank you for highlighting this limitation. We attempted time-lapse imaging of live fat body cells expressing 3xmCherry-Atg8a and GFP-Lamp1 to visualize the movement and fusion events of pre-fusion autophagosomes (3xmCherry-Atg8a positive and GFP-Lamp1 negative) and lysosomes (GFP-Lamp1 positive). Despite testing different experimental setups and durations of starvation, we observed no vesicle movement at all, so live imaging of larval Drosophila fat tissue will require time-consuming optimization of in vitro culture conditions. Consistent with this, we did not find any published data in which organelle motility in fat body cells was successfully observed. Nuclear positioning in fat body cells was investigated in detail in an excellent study; however, the authors were able to observe only very little movement of the nuclei by live imaging (Zheng et al. Nat Cell Biol. 2020 Mar;22(3):297-309. doi: 10.1038/s41556-020-0470-7), further highlighting the technical difficulties of live or time-lapse imaging in this tissue type.

      Specific comments  

      (1) Several genes have been described that when depleted lead to perinuclear accumulation of Atg8-labeled vesicles. There seems to be a correlation of this phenotype with genes required for autophagosome-lysosome fusion; however, some genes required for lysosomal fusion such as Rab2 and Arl8 apparently did not affect autophagosome positioning as reported here. Thus, it is unclear whether the perinuclear positioning of autophagosomes is truly a general response to disruption of autophagosome-lysosome fusion, or may reflect additional aspects of Vps16A/HOPS function. A few things here would help. One would be an analysis of Atg8a vesicle localization in response to the depletion of a larger set of fusion-related genes. Another would be to repeat some of the key findings of this study (effects of specific dynein, dynactin, rabs, effectors) on Atg8a localization when Syx17 is depleted, rather than Vps16A. This should generate a more autophagosome-specific fusion defect.  

      Thank you for this insightful suggestion. We recently discovered that Syx17 depletion induces a HOPS-dependent tethering lock between autophagosomes and lysosomes (DOI: 10.1126/sciadv.adu9605), making it unsuitable for modeling autophagosome-specific fusion defects. In contrast, Vamp7 and Snap29 knockdowns do not appear to cause such a tethering lock. We were able to generate a suitable Drosophila line using a Snap29 RNAi transgene located on a compatible chromosome. Upon testing key hits from our screen in this background, we found that autophagosomes redistributed similarly, supporting our conclusions. These new results have been included in the revised manuscript (see Figure 6).

      Third, it would greatly strengthen the findings to monitor pre-fusion autophagosome localization without disrupting fusion. Such vesicles could be identified as Atg8a-positive Lamp-negative structures. The effects of dynein and rab depletion on the tracking of these structures in a post-induction time course would serve as an important validation of the authors' findings.  

      Thank you for this helpful suggestion. As described above, we attempted time-lapse imaging of 3xmCherry-Atg8a and GFP-Lamp1-expressing fat body cells under various conditions to identify motile pre-fusion autophagosomes. However, we did not observe any vesicle movement, regardless of the starvation duration or experimental setup. As this likely reflects technical limitations of ex vivo fat body imaging, we were unable to achieve live tracking of autophagosome dynamics without introducing perturbations. This limitation is now discussed in the revised manuscript.

      (2) The authors nicely show that depletion of Shot leads to relocalization of Atg8a to ectopic foci in Vps16A-depleted cells; they should confirm that this is a mislocalized ncMTOC by colabeling Atg8a with an MTOC component such as MSP300. The effect of Shot depletion on Atg8a localization should also be analyzed in the absence of Vps16A depletion.  

      Thank you for this positive comment. We co-labeled Atg8a with the minus-end microtubule marker Khc-nod-LacZ in both shot single knockdown and shot; vps16A double knockdown cells. Ectopic Khc-nod-LacZ-positive MTOC foci were clearly visible in both conditions, and Atg8a-positive autophagosomes accumulated around these structures. These findings confirm that Shot depletion induces ectopic MTOC formation, which correlates with autophagosome relocalization. The new data have been incorporated into the revised manuscript (see Figure 1O-S).

      (3) The authors report that depletion of Dynein subunits, either alone (Figure 6) or co-depleted with Vps16A (Figure 2), leads to redistribution of mCherry-Atg8a punctae to the "cell periphery". However, only cell clones that contact an edge of the fat body tissue are shown in these figures. Furthermore, in these cells, mCherry-Atg8a punctae appear to localize only to contact-free regions of these cells, and not to internal regions of clones that share a border with adjacent cells. Thus, these vesicles would seem to be redistributed to the periphery of the fat body itself, not to the periphery of individual cells. Microtubules emanating from the perinuclear ncMTOC have been described as having a radial organization, and thus it is unclear that this redistribution of mCherry-Atg8a punctae to the fat body edge would reflect a kinesin-dependent process as suggested by the authors.  

      Thank you for this detailed observation. We frequently observe autophagosomes accumulating in contact-free peripheral regions of dynein-depleted cells, resulting in an asymmetric distribution. While previous studies describe a radial microtubule organization in fat body cells, none of them directly labels MT plus ends, toward which kinesin-based transport is directed.

      To further explore this, we overexpressed an HA-tagged kinesin, Klp98A-3xHA, in both control and Vps16A RNAi backgrounds. Immunolabeling revealed that Klp98A localizes to the contact-free peripheral regions in both conditions, consistent with the distribution of autophagosomes in dynein knockdown cells. This supports our interpretation that kinesin-dependent transport drives autophagosome redistribution in the absence of dynein, and that fat body cells exhibit subtle asymmetries in MT polarity that influence this transport. These new results have been included in the revised manuscript (see Figure 3G, H).

      (4) To validate whether the mCherry-Atg8a structures in Vps16A-depleted cells were of autophagic origin, the authors depleted Atg8a and observed a loss of mCherry-Atg8a signal from the mosaic cells (Figure S1D, J). A more rigorous experiment would be to deplete other Atg genes (not Atg8a) and examine whether these structures persist.  

      Thank you for the suggestion to further validate our reporter. We depleted Atg1, a key kinase required for phagophore initiation, in the Vps16A RNAi background. This completely abolished the punctate mCherry-Atg8a distribution in knockdown cells (see Figure 1—figure supplement 1E, K), confirming that the labeled structures are indeed of autophagic origin.

      (5) The authors found that only a subset of dynein, dynactin, rab, and rab effector depletions affected mCherry-Atg8a localization, leading to their suggestion that the most important factors involved in autophagosome motility have been identified here. However, this conclusion has the caveat that depletion efficiency was not examined in this study, and thus any conclusions about negative results should be more conservative.  

      Thank you for this constructive feedback. We agree that negative results must be interpreted conservatively due to potential differences in knockdown efficiency. We have revised our conclusions accordingly, clarifying that the factors identified are key for autophagosome motility, while acknowledging the possibility of false negatives.

      Reviewer #3 (Public review):  

      Major concerns:

      (1) The localization of EPG5 should be determined. The authors showed that EPG5 colocalizes with endogenous Rab7. Rab7 labels late endosomes and lysosomes. Previous studies in mammalian cells have shown that EPG5 is targeted to late endosomes/lysosomes by interacting with Rab7. EPG5 promotes the fusion of autophagosomes with late endosomes/lysosomes by directly recognizing LC3 on autophagosomes and also by facilitating the assembly of the SNARE complex for fusion. In Figure 5I, the EPG5/Rab7-colocalized vesicles are large and they are likely to be lysosomes/autolysosomes.

      Thank you for suggesting that we improve our Epg5 localization data. We performed triple immunostaining for Atg8a, Lamp1-3xmCherry, and Epg5-9xHA in S2R+ cells. In addition to triple-positive structures, which likely represent autolysosomes, we observed Atg8a and Epg5-9xHA double-positive vesicles that lacked Lamp1-3xmCherry signal and likely correspond to pre-fusion autophagosomes. Based on these results, we propose that in addition to arriving via the endocytic route, Epg5 may also reach lysosomes through autophagosomes. These findings have been included in the revised manuscript (see Figure 5K).

      (2) The experiments were performed in Vps16A RNAi KD cells. Vps16A knockdown blocks fusion of vesicles derived from the endolysosomal compartments such as fusion between lysosomes. The pleiotropic effect of Vps16A RNAi may complicate the interpretation. The authors need to verify their findings in Stx17 KO cells, as it has a relatively specific effect on the fusion of autophagosomes with late endosomes/lysosomes.  

      Thank you for this valuable suggestion. We initially considered Syntaxin17 for validation; however, we recently found that loss of Syx17 leads to a HOPS-dependent tethering lock between autophagosomes and lysosomes, which would confound interpretation, as autophagosomes remain tethered to lysosomes (DOI: 10.1126/sciadv.adu9605). Therefore, Syntaxin17 loss is not suitable for our purpose. Among the remaining fusion SNAREs, one RNAi line targeting Snap29 was available on a compatible chromosome for generating Drosophila lines equivalent to those used in the screen. We established this Snap29 RNAi-containing tester line and crossed it with our top hits. We observed that autophagosome motility was comparable to that in the Vps16A RNAi background, further supporting our conclusions. These results have been incorporated into the revised manuscript (see Figure 6).

      (3) Quantification should be performed in many places such as in Figure S4D for the number of FYVE-GFP labeled endosomes and in Figures S4H and S4I for the number and size of lysosomes.  

      Thank you for pointing this out. We performed the suggested quantifications and statistical analyses for FYVE-GFP labeled endosomes, as well as for the number and size of lysosomes. The updated data are now presented in the revised Figure 5—figure supplement 1.

      (4) In this study, the transport of autophagosomes is investigated in fly fat cells. In fat cells, a large number of large lipid droplets accumulate and the endomembrane systems are distinct from that in other cell types. The knowledge gained from this study may not apply to other cell types. This needs to be discussed.

      Thank you for raising this important point. We agree that our findings may not be fully generalizable to all cell types. Given that the organization of the microtubule network depends on both cell function and developmental stage, it is plausible that the molecular machinery described here operates differently elsewhere. We now mention this limitation in the Discussion.

      Minor concerns:  

      (5) Data in some panels are of low quality. For example, the mCherry-Atg8a signal in Figure 5C is hard to see; the input bands of Dhc64c in Figure 5L are smeared.  

      Thank you for pointing this out. We repeated the experiment shown in Figure 5C and replaced the panel with a clearer image. The smeared Dhc64C input bands in Figure 5L result from the unusually large size of this protein, which affects its electrophoretic migration. We mentioned this point in the corresponding figure legend.

      (6) In this study, both 3xmCherry-Atg8a and mCherry-Atg8a were used. Different reporters make it difficult to compare the results presented in different figures.  

      Thank you for this comment. Both 3xmCherry-Atg8a and mCherry-Atg8a are well-established reporters that behave similarly as autophagic markers. Nevertheless, to avoid confusion, we ensured that each figure uses only one type of reporter consistently, which is now clearly indicated in the revised manuscript.

      (7) The small autophagosomes presented in Figures such as in Figure 1D and 1E are not clear. Enlarged images should be presented.  

      Thank you for your suggestion. We repeated these experiments and replaced the relevant panels with higher-quality images, including enlarged insets to better visualize small autophagosomes. These updated figures are now included in the revised manuscript.

      (8) The authors showed that Epg5-9xHA coprecipitates with the endogenous dynein motor Dhc64C. Is Rab7 required for the interaction?  

      Thank you for this insightful question. We tested this by co-transfecting S2R+ cells with Epg5-9xHA and different forms of Rab7: wild-type, GTP-locked (constitutively active), and GDP-locked (dominant-negative). Our results indicate that the strength of Epg5-Dhc interaction does not change in the presence of either GTP-locked or GDP-locked Rab7. However, we believe that Epg5 and dynein are recruited to the vesicle membranes via Rab7 in vivo, so we did not include these results in the revised manuscript.

      (9) The perinuclear lysosome localization in Epg5 KD cells has no indication that Epg5 is an autophagosome-specific adaptor.

      Thank you for this important comment. Accordingly, we have toned down our statements about Epg5 functions throughout the revised manuscript.

      Reviewer #1 (Recommendations for the authors):  

      (1) Figure 6: What do "autolysosome maturation" and "small autolysosomes" mean? Do different numbers of lysosomes fuse to a single autophagosome?

      Thank you for highlighting this point. We concluded that the formation of smaller autolysosomes—compared to controls—is likely due to a defect in autolysosome maturation, as is often the case. We had not explicitly considered whether a different number of lysosomes fuse with each autophagosome during this process. We clarified this issue in the revised manuscript.

      (2) Figure 5A shows that the localization of endogenous Atg8 requires Epg5, but the data is not as clear as for mCherry-Atg8 (Figure 4B). Why the difference?  

      Thank you for this question. The difference arises because the mCherry-Atg8a reporter strongly labels autolysosomes, as the mCherry fluorophore remains stable in acidic compartments. As a result, mCherry-Atg8a labels both autophagosomes and autolysosomes, but the strong autolysosomal signal originating from the surrounding GFP-negative, non-RNAi cells can make accumulated autophagosomes appear fainter in fusion-defective cells (as in Figure 4). In contrast, endogenous Atg8a is degraded in lysosomes, and therefore labels only autophagosomes. The two experiments can therefore look slightly different, but since in both cases autophagosomes no longer accumulate in the perinuclear region of Vps16A, Epg5 double RNAi cells, we can conclude that Epg5 is required for autophagosome positioning. We explained this difference between the two methods in the revised manuscript where it first appears (Figure 1B and Figure 1—figure supplement 1A).

      (3) Blue letters on the black micrographs are hard to see. Some of the other letters are also small and hard to read.  

      Thank you for this suggestion. We improved the visibility and readability of the labels in the revised figures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, the authors employ a combined proteomic and genetic approach to identify the glycoprotein QC factor malectin as an important protein involved in promoting coronavirus infection. Using proteomic approaches, they show that the non-structural protein NSP2 and malectin interact in the absence of viral infection, but not in the presence of viral infection. However, both NSP2 and malectin engage the OST complex during viral infection, with malectin also showing reduced interactions with other glycoprotein QC proteins. Malectin KD reduces replication of coronaviruses, including SARS-CoV-2. Collectively, these results identify Malectin as a glycoprotein QC protein involved in regulating coronavirus replication that could potentially be targeted to mitigate coronavirus replication.

      Overall, the experiments described appear well performed and the interpretations generally reflect the results. Moreover, this work identifies Malectin as an important pro-viral protein whose activity could potentially be therapeutically targeted for the broad treatment of coronavirus infection. However, there are some weaknesses in the work that, if addressed, would improve the impact of the manuscript.

      Notably, the mechanism by which malectin regulates viral replication is not well described. It is clear from the work that malectin is a pro-viral protein in the work presented, but the mechanistic basis of this activity is not pursued. Some potential mechanisms are proposed in the discussion, but the manuscript would be strengthened if additional insight was included. For example, is the UPR activated to higher levels in infected cells depleted of malectin? Do glycosylation patterns of viral (or non-viral) proteins change in malectin-depleted cells? Additional insight into this specific question would significantly improve the manuscript.

      We concur with the reviewer that the mechanism by which Malectin regulates viral replication is an important point to elucidate further. Our proteomics data were able to offer additional insight into the questions posed here. We examined the upregulation of protein markers of the UPR and other stress response pathways in cells depleted of MLEC (Fig. S15D). We find that the UPR pathways are moderately but not significantly upregulated, while the Heat Shock Factor 1 (HSF1) pathway is moderately and significantly upregulated. The fold-change increases of these marker proteins are relatively small, so while upregulation of this pathway may contribute to the suppression of CoV replication, it may not fully explain the phenotype.

      In addition, to address the second question, we compared the glycosylation patterns of endogenous proteins in MLEC-KD cells (Fig. S15E-G). We found a small increase in the abundance of glycopeptides associated with LAMP2, SERPINH1, RDX, RPL3/5, CADM4, and ITGB1; however, these fold changes are small and not statistically significant. These results indicate there is relatively little modification of endogenous glycoproteins upon MLEC depletion. These findings support a more direct role for MLEC in regulating viral replication.

      We added the following section to the manuscript text to discuss these results:

      “In uninfected cells, MLEC KD leads to relatively few proteome-wide changes, with MLEC being the only protein significantly downregulated and no other proteins significantly upregulated, supporting the specificity of MLEC KD in MHV suppression (Fig. S15C). To determine whether MLEC KD alters general host proteostasis, we further examined the levels of protein markers of stress pathways based on previous gene pathway definitions (Davies et al., 2023; Grandjean et al., 2019; Shoulders et al., 2013) (Fig. S15D). We find that there are modest but significant increases in protein levels associated with the Heat Shock Factor 1 (HSF1) pathway, while the Unfolded Protein Response (UPR) pathways are largely unmodified.

      We also probed the effect of MLEC KD on endogenous protein glycosylation. We find that there is only a small increase in abundance of glycopeptides, including those associated with the ribosome (Rpl3, Rpl5), a cytoskeletal protein (Rdx), the integrin Itgb1, and the ER-resident chaperone Serpinh1 (Fig. S15E-G).”

      “Our proteomics data reveals that there is only a modest increase in the Heat Shock Factor 1 (HSF1) pathway, while the Unfolded Protein Response is relatively unchanged (Fig. S15D). In addition, there are only minor increases in endogenous glycopeptide levels (Fig. S15E-G). Together, these results indicate that while MLEC KD leads to some alterations in ER proteostasis and host glycosylation, these changes are modest and may not be the primary mechanism by which MLEC KD hinders CoV replication.”

      Further, the evidence for increased interactions between OST and malectin during viral infection is fairly weak, despite being a major talking point throughout the manuscript. The reduced interactions between malectin and other glycoproteostasis QC factors is evident, but the increased interactions with OST are not well supported. I'd recommend backing off on this point throughout the text, instead, continuing to highlight the reduced interactions.

      We agree that the fold-change increase of OST interactions with malectin is small compared to the fold-change decrease of other glycoproteostasis factors. We have modified the text to de-emphasize this point and instead highlight the reduced interactions:

      “Further, MHV infection retains the association of MLEC with the OST complex while titrating off other interactors, potentially leading to more efficient glycoprotein biogenesis.”

      I was also curious as to why non-structural proteins, nsp2 and nsp4, showed robust interactions with host proteins localized to both the ER and mitochondria? Do these proteins localize to different organelles or do these interactions reflect some other type of dysregulation? It would be useful to provide a bit of speculation on this point.

      We also find these ER and mitochondrial protein interactions curious, which we initially reported on (Davies, Almasy et al. 2020 ACS Infectious Diseases). In this prior report, we found that when expressed in HEK293T cells, SARS-CoV-2 nsp2 and nsp4 have partial localization to mitochondrial-associated ER membranes (MAMs), as determined by subcellular fractionation. Given that malectin has also been shown to have MAMs localization (Carreras-Sureda, et al. 2019 Nature Cell Biology), we have added additional text in the Discussion to speculate on this point:

      “Additionally, MLEC has also been shown to localize to ER-mitochondria contact sites (MAMs) (Carreras-Sureda et al., 2019), which regulate mitochondrial bioenergetics. We have previously shown that SARS-CoV-2 nsp2 and nsp4 can partially localize to MAMs (Davies et al., 2020), so these viral proteins may also dysregulate MLEC and MAMs activity to promote infection.”

      Again, the overall identification of malectin as a pro-viral protein involved in the replication of multiple different coronaviruses is interesting and important, but additional insights into the mechanism of this activity would strengthen the overall impact of this work.

      Thank you for this endorsement. We hope the additional analyses and discussion points in the revised manuscript help to home in on a direct mechanistic function for MLEC in modulating viral replication.

      Reviewer #2 (Public Review):

      Summary:

      A strong case is presented to establish that the endoplasmic reticulum carbohydrate binding protein malectin is an important factor for coronavirus propagation. Malectin was identified as a coronavirus nsp2 protein interactor using quantitative proteomics and its importance in the viral life cycle was supported by using a functional genetic screen and viral assays. Malectin binds diglucosylated proteins, an early glycoform thought to transiently exist on nascent chains shortly after translation and translocation; yet a role for malectin has previously been proposed in later quality control decisions and degradation targeting. These two observations have been difficult to reconcile temporally. In agreement with results from the Locher lab, the malectin interactome shown here includes a number of subunits of the oligosaccharyltransferase complex (OST). These results place malectin in close proximity to both the co-translational (STT3A or OST-A) and post-translational (STT3B or OST-B) complexes. It follows that malectin knockdown was associated with coronavirus Spike protein hypoglycosylation.

      Strengths:

      Strengths include using multiple viruses to identify interactors of nsp2 and quantitative proteomics along with multiple viral assays to monitor the viral life cycle.

      Weaknesses:

      Malectin knockdown was shown to be associated with Spike protein hypoglycosylation. This was further supported by malectin interactions with the OSTs. However, no specific role of malectin in glycosylation was discussed or proposed.

      We have emphasized our hypotheses on this point in the discussion and added a summary figure to highlight the specific role of malectin.

      Given the likelihood that malectin plays a role in the glycosylation of heavily glycosylated proteins like Spike, it is unfortunate that only 5 glycosites on Spike were identified using the MS deamidation assay when Spike has a large number of glycans (~22 sites). The mass spec data set would also include endogenous proteins. Were any heavily glycosylated endogenous proteins hypoglycosylated in the MS analysis in Fig 5D?

      Thank you for this suggestion. We compared the glycosylation patterns of endogenous proteins in MLEC-KD cells (Fig. S15E-G). We found a small increase in the abundance of glycopeptides associated with LAMP2, SERPINH1, RDX, RPL3/5, CADM4, and ITGB1; however, these fold changes are small and not statistically significant. These results indicate there is relatively little modification of endogenous glycoproteins upon MLEC depletion. We added the following sections:

      “We also probed the effect of MLEC KD on endogenous protein glycosylation. We find that there is only a small increase in abundance of glycopeptides, including those associated with the ribosome (Rpl3, Rpl5), a cytoskeletal protein (Rdx), the integrin Itgb1, and the ER-resident chaperone Serpinh1 (Fig. S15E-G).”

      “Our proteomics data reveals that there is only a modest increase in the Heat Shock Factor 1 (HSF1) pathway, while the Unfolded Protein Response is relatively unchanged (Fig. S15D). In addition, there are only minor increases in endogenous glycopeptide levels (Fig. S15E-G). Together, these results indicate that while MLEC KD leads to some alterations in ER proteostasis and host glycosylation, these changes are modest and may not be the primary mechanism by which MLEC KD hinders CoV replication.”

      The inclusion of the nsp4 interactome and its partial characterization is a distraction from the storyline that focuses on malectin and nsp2.

      We believe the nsp4 comparative interactome and functional genomics data offer a rich resource for further functional investigation by others, if made public. While we found the malectin and nsp2 storyline the most compelling to pursue, we believe the inclusion of the nsp4 data strengthens the overall approach, in agreement with Reviewer #3’s comments.

      Reviewer #3 (Public Review):

      Summary:

      In this study, Davies and Plate set out to discover conserved host interactors of coronavirus non-structural proteins (Nsp). They used 293T cells to ectopically express flag-tagged Nsp2 and Nsp4 from five human and mouse coronaviruses, including SARS-CoV-1 and 2, and analyzed their interaction with host proteins by affinity purification mass-spectrometry (AP-MS). To confirm whether such interactors play a role in coronavirus infection, the authors measured the effects of individual knockdowns on replication of murine hepatitis virus (MHV) in mouse Delayed Brain Tumor cells. Using this approach, they identified a previously undescribed interactor of Nsp2, Malectin (Mlec), which is involved in glycoprotein processing and shows a potent pro-viral function in both MHV and SARS-CoV-2. Although the authors were unable to confirm this interaction in MHV-infected cells, they show that infection remodels many other Mlec interactions, recruiting it to the ER complex that catalyzes protein glycosylation (OST). Mlec knockdown reduced viral RNA and protein levels during MHV infection, although such effects were not limited to specific viral proteins. However, knockdown reduced the levels of five viral glycopeptides that map to Spike protein, suggesting it may be affected by Mlec.

      Strengths:

      This is an elegant study that uses a state-of-the-art quantitative proteomic approach to identify host proteins that play critical roles in viral infection. Instead of focusing on a single protein from a single virus, it compares the interactomes of two viral proteins from five related viruses, generating a high confidence dataset. The functional follow-ups using multiple live and reporter viruses, including MHV and CoV2 variants, convincingly depict a pro-viral role for Mlec, a protein not previously implicated in coronavirus biology.

      Weaknesses:

      Although a commonly used approach, AP-MS of ectopically expressed viral proteins may not accurately capture infection-related interactions. The authors observed Mlec-Nsp2 interactions in transfected 293T cells (1C) but were unable to reproduce those in mouse cells infected with MHV (3C). EIF4E2/GIGYF2, two bona fide interactors of CoV2 Nsp2 from previous studies, are listed as depleted compared to negative controls (S1D). Most other CoV2 Nsp2 interactors are also depleted by the same analysis (S1D). Previously reported MERS Nsp2 interactors, including ASCC1 and TCF25, are also not detected (S1D). Furthermore, although GIGYF2 was not identified as an interactor of MHV Nsp2/4 in human cells (S1D), its knockdown in mouse cells reduced MHV titers about 1000-fold (S4). The authors should attempt to explain these discrepancies.

      We acknowledge these limitations in AP-MS from ectopically expressed viral proteins and have addressed these discrepancies with further elaboration in the text:

      “A limitation of our study is the initial overexpression of individual proteins for AP-MS, in which we find some variation between our data with other AP-MS studies. We sought to overcome these variations by focusing on conserved interactors and testing interactions in a live infection context.”

      “We also found GIGYF2-KD strongly suppressed MHV infection, despite GIGYF2 not interacting with MHV nsp2 (Fig. S1D), highlighting the importance of proteostasis factors in infection regardless of direct PPIs.”

      More importantly, the authors were unable to establish a direct link between Mlec and the biogenesis of any viral or host proteins, by mass-spectrometry or otherwise. Although it is clear that Mlec promotes coronavirus infection, the mechanism remains unclear. Its knockdown does not affect the proteome composition of uninfected cells (S15B), suggesting it is not required for proteome maintenance under normal conditions. The only viral glycopeptides detected during MHV infection originated from Spike (5D), although other viral proteins are also known to be glycosylated. Cells depleted for Mlec produce ~4-fold less Spike protein (4E) but no more than 2-fold less glycosylated spike peptides (5D), compounding the interpretation of Mlec effects on viral protein biogenesis. Furthermore, Spike is not essential for the pro-viral role of Mlec, given that Mlec knockdown reduces replication of SARS-CoV-2 replicons that express all viral proteins except for Spike (6A/B).

      Thank you, these are all important points. We have acknowledged these compounding factors in the Discussion:

      “Concurrently, knockdown of MLEC leads to impediment of nsp production and aberrant glycosylation of other viral proteins like Spike, though it should be noted that the decrease in Spike glycopeptides is compounded by the overall decrease in Spike protein. Given that MLEC is pro-viral in a SARS-CoV-2 replicon model lacking Spike (Fig. 6), MLEC can promote CoV replication independent of Spike production.”

      Any of the observed effects on viral protein levels could be secondary to multiple other processes. Interventions that delay infection for any reason could lead to an imbalance of viral protein levels because Spike and other structural proteins are produced at a much higher rate than non-structural proteins due to the higher abundance of their cognate subgenomic RNAs. Similarly, the observation that Mlec depletion attenuates MHV-mediated changes to the host proteome (S15C/D) can also be attributed to indirect effects on viral replication, regardless of glycoprotein processing. In the discussion, the authors acknowledge that Mlec may indirectly affect infection through modulation of replication complex formation or ER stress, but do not offer any supporting evidence. Interestingly, plant homologs of Mlec are implicated in innate immunity, favoring a more global role for Mlec in mammalian coronavirus infections.

      We examined the upregulation of protein markers of the UPR and other stress response pathways in cells depleted of MLEC (Fig. S15D). We find that the UPR pathways are moderately but insignificantly upregulated, while the Heat Shock Factor 1 (HSF1) pathway is moderately and significantly upregulated. The fold-change increases of these marker proteins are relatively small, so while upregulation of this pathway may contribute to the suppression of CoV replication, it may not fully explain the phenotype. Please also see similar points brought up by reviewer 1 (comment 1). We added the following section to the manuscript text to discuss these results:

      “In uninfected cells, MLEC KD leads to relatively few proteome-wide changes, with MLEC being the only protein significantly downregulated and no other proteins significantly upregulated, supporting the specificity of MLEC KD in MHV suppression (Fig. S15C). To determine whether MLEC KD alters general host proteostasis, we further examined the levels of protein markers of stress pathways based on previous gene pathway definitions (Davies et al., 2023; Grandjean et al., 2019; Shoulders et al., 2013) (Fig. S15D). We find that there are modest but significant increases in protein levels associated with the Heat Shock Factor 1 (HSF1) pathway, while the Unfolded Protein Response (UPR) pathways are largely unmodified.”

      “Our proteomics data reveals that there is only a modest increase in the Heat Shock Factor 1 (HSF1) pathway, while the Unfolded Protein Response is relatively unchanged (Fig. S15D). […] Together, these results indicate that while MLEC KD leads to some alterations in ER proteostasis and host glycosylation, these changes are modest and may not be the primary mechanism by which MLEC KD hinders CoV replication.”

      Finally, the observation that both Nsp2 (3C) and Mlec (3E/F) are recruited to the OST complex during MHV infection neither support nor refute any of these alternate hypotheses, given that Mlec is known to interact with OST in uninfected cells and that Nsp2 may interact with OST as part of the full length unprocessed Orf1a, as it co-translationally translocates into the ER. Therefore, the main claims about the role of Mlec in coronavirus protein biogenesis are only partially supported.

      We have acknowledged this point in the Discussion. 

      “We find that nsp2 interacts with several OST complex members, including DDOST, STT3A, and RPN1, though whether this is as part of the uncleaved Orf1a polyprotein during co-translational ER translocation or as an individual protein is unclear.”

      Reviewer #2 (Recommendations For The Authors):

      What is the proof that MLEC is a type I membrane protein? If it is strictly sequence analysis, this conclusion should be tempered in the text.

      Our response: We have added appropriate evidence on the biochemical characterization of MLEC topology from Galli et al., 2011, and cryo-EM structural characterization by Ramírez et al., 2019.

      “As it was surprising that nsp2, a non-glycosylated, cytoplasmic protein, would interact with MLEC, an integral ER membrane protein with a short two amino acid cytoplasmic tail(Galli et al., 2011; Ramírez et al., 2019), we assessed a broader genetic interaction between nsp2 and MLEC.”

      Validation of some of the nsp2 and malectin interactome components by pulldowns should be included.

      Our response: The interactions between nsp2 and Ddost, Stt3A, and Rpn1 passed a stringent confidence filter in our AP-MS experiment (Fig. 3C) based on several replicates. For this reason, we do not believe additional validation by Western blotting will offer much useful information.

      NGI-1 inhibition of glycosylation looks to be very weak in Fig. 5B and Fig. S14B.

      Our response: It is important to note that the NGI-1 inhibition assay used a suboptimal NGI-1 concentration to prevent full suppression of MHV infection, which we have found previously. We have added this justification in the Methods section and associated figure legend (Fig. S14A).

      “The 5 uM NGI-1 dosage was chosen as it resulted in partial inhibition of glycosylation while not completely blocking MHV infection.”

      “This dosage and timing were chosen to partially inhibit the OST complex without fully ablating viral infection, as NGI-1 has been shown previously to be a potent positive-sense RNA virus inhibitor (Puschnik et al., 2017) (Fig. S14).”

      Summary model figure at the end would help to communicate the conclusions.

      Our response: Thank you for this suggestion. We agree and have added a summary model figure at the end as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Given that there are different mutations identified at different CDK12 sites as illustrated in Figure 1B it would be nice to know which ones have been functionally classified as pathogenic and for which ones that the pathogenicity has not been determined. This would be especially interesting to perform in light of the differences in the LOH scores and WES data presented - specifically, are the pathogenic mutations vs the mutations for which true pathogenicity is unknown more likely to display LOH or TD?

      Alterations were classified as pathogenic when they resulted in a frameshift, a nonsense mutation, or an amino acid change likely to alter function (according to ANNOVAR). Four patients were called CDK12<sup>BAL</sup> but were negative for TDP signatures. Three of these had CDK12 mutations downstream of the kinase domain, which may be less likely to ablate protein activity. Most functionally validated pathogenic mutations include disruption of the kinase domain (PMID: 25712099). We added a sentence to the Results section (under “Identification of genomic characteristics that associate with CDK12 loss in prostate cancer”) to highlight this caveat on pathogenic mutation calls.

      For the cell inhibition studies with the CDK12/13 inhibitor, more details characterizing the specificity of this molecule to these targets would be useful. Additionally, could the authors perform short-term depletion studies with a PROTAC to the target or short shRNA or non-selected pool CRISPR deletion studies of CDK12 in these same cell lines to complement their pharmacological studies with genetic depletion studies? Also perhaps performing these same inhibitor studies in CDK12/13 deleted cells to test the specificity of the molecule would be useful.

      We are not aware of a CDK12-specific PROTAC, and generating such a reagent is beyond the scope of the present study. Regarding the specificity of the CDK12/13 inhibitor molecules, additional information on the specificity and in vivo dose selection was added to the Results section (under “CDK13 is synthetic lethal in cells with biallelic CDK12 loss”). Cells with CDK12-KO did not tolerate CDK13-KO, so we were unable to generate double knockouts to test for CDK12/13 inhibitor non-specific effects.

      Additionally, expanding these studies to additional prostate cancer cell lines or organoid models would strengthen the conclusions being made. More information should be provided about the dose and schedule chosen, and the rationale for choosing those doses and schedules for the in vivo studies should be presented and discussed. Was there evidence of maximal inhibition of the target CDK12/13 at the dose tested, given the very modest tumor growth inhibition noted in these studies?

      With respect to additional acute CDK12 loss models, our Tet-inducible shCDK12 models show only minor growth slowdown and do not appear to phenocopy the strong arrest or apoptosis seen with CDK12 KO or inhibition, respectively. Work is ongoing to generate CDK12-degron-regulated cell lines. We added a new immunoblot panel showing that acute CRISPR/sgRNA targeting of CDK12 does indeed lead to BRCA2 and ATM protein decrease (Fig. S4g), providing some orthogonal genomic targeting evidence of the acute HR gene effect. We are continuing efforts to collect and generate additional CDK12<sup>BAL</sup> cell models, in both 2D and 3D culture systems, but none are presently available. We added a 3D culture drug dose curve with LuCaP189.4 exposed to THZ531 (Fig. S7m), which confirms heightened sensitivity vs two CDK12-intact lines.

      Regarding assessment of CDK12 target inhibition: because we are not aware of any unique CDK12 substrates, the question is fair, but precise CDK12 inhibition by the compounds in tumors is difficult to measure. We dosed mice using the same protocol as detailed in the original report testing SR4835 in mice (PMID: 31668947). We performed immunoblots on lysates from PDX tumors treated for 3 and 28 days and did not see any consistent decreases in pRBP1(Ser2) or ATM or increases in γH2A.X (data not shown). However, we did see increases in APA usage and downregulation of DNA repair transcripts with three days of treatment (Fig. 6k-l), as would be expected from on-target acute effects.

      Reviewer #2 (Public review)

      One caveat that continues to be unclear as presented, is the uncoupling of cell cycle/essentiality of CDK12/13 from HR-directed mechanisms. Is this purely a cell cycle arrest phenotype acutely with associated down-regulation of many genes?

      In regard to untangling the effects of cell arrest on HR gene expression, this is a difficult question given that many HR genes, including BRCA2, are S/G2 linked. We attempted to account for those effects in the acute CDK12 inhibition experiment by including a palbociclib (CDK4/6i) control, which caused cell arrest and decreased BRCA1/2 RNA expression with no apparent 5’/3’ transcript imbalance as determined by qPCR (Fig. 4e,g). Though overall BRCA1 and BRCA2 mRNA levels are lower in the stable 22Rv1-CDK12-KO2 and KO5 lines, they do not show selective 3’ loss (Fig. 5c), suggesting the downregulation in these lines is mostly due to their slower growth (Fig. S4k) and not intronic polyA usage.

      While the RAD51 loading ssRNA experiments are informative, the Tet-inducible knockdown of BRCA2 and CDK12 is confusing as presented in Figure 5, shBRCA2 + and -dox are clearly shown. However, were the CDK12_K02 and K05 also knocked down using inducible shRNA or a stable knockout? The importance of this statement is the difference between acute and chronic deletion of CDK12. Previously, the authors showed that acute knockdown of CDK12 led to an HR phenotype, but here it is unclear whether CDK12K02/05 are acute knockdowns of CDK12 or have been chronically adapted after single cell cloning from CRISPR-knockout. 

      As a clarification, the 22Rv1-CDK12-KO2 and 22Rv1-CDK12-KO5 are stable CRISPR knockout clonal lines that were expanded from single cells. We added a new figure to include more validation of these lines (Fig. S5). We tried multiple times to reproduce the HRd phenotype and PARPi sensitivity with siRNA and inducible shRNA lines but were unable to see clear sensitivity differences, despite seeing the expected shifts with shBRCA2 controls (data not shown). It is possible the degree of knockdown (~80%), timing (8 days), or specific cell lines used in our experiments were not sufficient to expose the acute phenotype by this method.

      However, we were able to see acute HR gene decreases by inhibitor treatment (Fig. 4) or acute CRISPR (Fig. S4g).

      Given the multitude of lines, including some single-cell clones with growth inhibitory phenotypes and ex-vivo derived xenografts, the variability of effects with SR4835, ATM, ATR, and WEE1 inhibitors in different models can be confusing to follow. Overall, the authors suggest that the cell lines differ in therapeutic susceptibility as they may have alternate and diverse susceptibilities. It may be possible that the team could present this more succinctly and move extraneous data to the supplement.  

      We appreciate the complexity of the data and attempted to use multiple models to report consistency and variability. We are not able to ascertain what data would be extraneous, and elected to present data we view as relevant in the main figures while moving supporting data to the supplement.

      The in-vitro data suggests that SR4835 causes growth inhibition acutely in parental lines such as 22RV1. However, in vivo, tumor attenuation appears to be observed in both CDK12 intact and deficient xenografts, LuCAP136 and LuCaP 189.4 (albeit the latter is only nominally significant). Is there an effect of PARPi inhibition specifically in either model? What about the 22RV1-K02/05? Do these engraft? Given the role of CDK12/13 in RNAP II, these data might suggest that the window of susceptibility in CDK12 (mutant) tumors may not be that different from CDK12 intact tumors (or intact tissue) when using dual CDK12/13 inhibitors but rather represent more general canonical essential functions of CDK12 and CDK13 in transcription. From a therapeutic development strategy, the authors may want to comment in the discussion on the ability to target CDK13 specifically.

      Though the response of the CDK12<sup>BAL</sup> models to some compounds is variable, we believe those mixed results are important and future studies may be able to better explain why some show shifts in sensitivity while others do not. We hope future studies with additional models will help determine which sensitivities are more consistently true, and perhaps provide explanations for differences between models.

      Regarding SR4835, we find, and others have reported, a toxic (i.e. apoptotic) effect for in vitro treatment with dual CDK12/13 inhibitors (Fig. 4f, S4e,f); in fact, that may be why previous studies have used short timepoints in cell culture assays with these dual inhibitors. In mice, SR4835 was tolerated well but only LuCaP 189.4 showed statistically significant decreases in tumor volume and weight (Fig. 6j). We did not test PARPi responses in the PDX models, nor did we attempt engrafting the 22Rv1-CDK12-KO cell lines, but both would be worthwhile experiments in the future. Beyond CDK12<sup>BAL</sup> tumors, we agree that CDK12/13 inhibitors could be effective in cancer therapies more generally (e.g. triggering acute HRd, loss of RNAP2 phosphorylation). We added a line to the discussion section about ongoing efforts to combine PARPi and CDK12/13i, which we expect to be synergistic in CDK12-intact tumors due to the acute loss phenotype. We certainly agree that development of a specific CDK13 inhibitor would be the ideal therapeutic option for CDK12<sup>BAL</sup> tumors. However, CDK12 and CDK13 are 43% conserved at the protein level (PMID: 26748711), with 92% conservation in the active site (PMID: 30319007), and there are no available pharmacologic inhibitors that discriminate between CDK12 and CDK13.

      Reviewer #3 (Public review):

      It is generally assumed that CDK12 alterations are inactivating, but it is noteworthy that homozygous deletions are comparatively uncommon (Figure 1a). Instead many tumors show missense mutations on either one or both alleles, and many of these mutations are outside of the kinase domain (Figure 1b). It remains possible that the CDK12 alterations that occur in some tumors may retain residual CDK12 function, or may confer some other neomorphic function, and therefore may not be accurately modeled by CDK12 knockout or knockdown in vitro. This would also reconcile the observation that knockout of CDK12 is cell-essential while the human genetic data suggest that CDK12 functions as a tumor suppressor gene.

      Thank you for the feedback. It is a keen observation that homozygous deletions of CDK12 are not typical, though many mutations are upstream frameshifts that are expected to lead to loss of functional protein and mRNA via nonsense-mediated decay. LuCaP189.4, our only natural mutant model, has two upstream frameshifts leading to complete protein loss (Fig 5b, S4h-i). We also added a caveat previously mentioned (in response to Reviewer 1) that mutations downstream of the kinase domain may be less likely to be fully pathogenic. For upstream missense mutations, neomorphic function remains an intriguing possibility that cannot be ruled out and would not be captured in our current models. Hopefully additional models can be developed, both natural and engineered, to help dissect that question in future studies.

      It is not entirely clear whether CDK12 altered tumors may require a co-occurring mutation to prevent loss of fitness, either in vitro or in vivo (e.g. perhaps one or more of the alterations that occur as a result of the TDP may mitigate against the essentiality of CDK12 loss).

      We concur. Another caveat with the CRISPR models, beyond reliance on upstream frameshift mutations, is the simultaneous loss of alleles. In human tumors, there may be a period of single copy loss before the second hit that may provide a window for adaptation. It is possible that sequential loss is far easier for a cell to tolerate than acute bi-allelic inactivation. We agree that the question of what (if any) cooperating genetic alterations are required to tolerate CDK12 loss is an important one that we plan to further explore in future work.

      Recommendations for Authors:

      Reviewer #1 (Recommendations for Authors):

      The authors have thoroughly addressed all issues of data availability, reagents, in vivo protocols, and animal approvals associated with the studies presented in this manuscript. Specific comments and experimental suggestions that in my opinion would strengthen the conclusions of this interesting and compelling manuscript are included above.

      Reviewer #2 (Recommendations for the authors):

      The authors were thorough in their studies. As a general note, switching between the cell lines is often overwhelming in interpreting the data given cell-to-cell variability in response. If possible, consolidating the text/conclusions in results would improve the readability of the manuscript.

      The variety of cell lines and models is perhaps expansive at times, but we hope the inclusion of these different models helps support the conclusions. 

      Is it possible to knockout CDK12 acutely using a degron-based approach, instead of utilizing an inhibitor that targets both CDK12/13?

      There is a HeLa cell line made with analog-sensitive CDK12 (Bartkowiak, Yan, and Greenleaf 2016) but we were unaware of any such prostate lines at the time of this work. We are attempting to develop engineered prostate lines with specific CDK12 degradation but do not yet have them available.

      How do the authors address a lower BRCA1/2 level in for example 22RV1-K05, does this cell line have increased sensitivity to PARPi over its parental 22RV1 line? Could this be added to Figure 5h/i?

      The lower BRCA2 levels in 22Rv1-CDK12-KO5 are likely due to the slower growth rate (Fig. S4k), as BRCA2 expression is S/G2 linked. While the mRNA level of BRCA2 overall is lower in the KO5 line, we do not observe the 5’/3’ transcript imbalance (Fig. 5c). The 22Rv1-CDK12-KO lines did not show increased sensitivity to carboplatin, while inducible shBRCA2 did (Fig. S7a), so we do not believe this lower BRCA2 confers functional HRd. We did test the KO lines with olaparib (Fig. S7d) and saw a modest increased sensitivity compared to parental 22Rv1, but not to the extent measured in the BRCA1 mutant line UWB1.289.

      What is the clonality of the LuCAP 189.4 lines upon derivation? Is biallelic CDK12 loss seen in all cells?

      We do not know the exact clonality of the LuCAP 189.4 PDX or CL model, but we do see highly uniform CDK12 protein loss in these cells (quantified by IHC staining, data not shown).

      The authors state that 22RV1-K02/05 has an increased growth arrest to CDK13 inhibition. However, in Figure 6h, it appears the viability is not significantly different compared to the parental 22RV1 line. Similar aspects noted in 189.4-vec/CDK12?

      We found that 22Rv1 KO2/KO5 undergo growth arrest with sgCDK13 and cell death with CDK12/13 inhibitor. We did notice that SR4835 did not show the differential effects we anticipated (Fig. 6h), as was seen with THZ531 (Fig. 6i). SR4835 is a non-covalent inhibitor, while THZ531 is a covalent binder, so there are some functional differences between these compounds that might explain the lack of differential effects in the isogenic lines in a 4-day in vitro assay. We included the SR4835 in vitro data because this compound was used for the in vivo experiment. THZ531 is not suited for animal use.

      Could the authors comment on SR4835 response in vivo as a function of tumor growth rate?

      The SR4835-treated LuCaP189.4 tumors did show signs of reduced proliferation in vivo, with decreased Cell Cycle and DNA Replication RNA-seq signatures, but a more detailed investigation into cell cycle arrest vs apoptotic response has yet to be performed. We plan to conduct additional PDX experiments if we can obtain a selective CDK13 inhibitor.

      Do the authors explore TDPs in their isogenic cell lines?

      We performed low coverage WGS on the 22Rv1 KO clones and added that to the paper (Fig. S5c). We did not see any obvious signs of TDP. We suspect the phenotype takes longer to accumulate and is not apparent within the ~20 passages our clones underwent in culture. This would be consistent with the tumor analysis (Fig. 2b) showing increase in TDs from primary to metastatic tumors, suggesting TDs accumulate over time.

      A future study may allow for screening synthetic lethals in the context of CDK12 loss in the presence or absence of SR4835 inhibition.

      We are actively pursuing experiments to identify new synthetic lethal targets by CRISPR and drug screens in CDK12 loss models and hope to report those in a future study.

      Reviewer #3 (Recommendations for the authors):

      As discussed above, the authors may wish to adjust their terminology to "CDK12-altered" rather than "CDK12 lost" or "CDK12-inactivated" to leave open the possibility that some tumors may retain residual CDK12 function or adopt neomorphic functions.

      Thank you for the additional comments and feedback. The possibility of neomorphic CDK12 allele function is important. As our models were all complete protein loss mutations, we decided to retain “biallelic loss” as our preferred nomenclature, but the note is well taken.

      The plots in Figures 1f-h are interesting and suggest that certain cancer genes (especially oncogenes) are recurrently gained in CDK12-altered tumors. It may be interesting to look at this on the individual level rather than the cohort level to see whether any groups of oncogenes tend to be gained together in an individual patient - this could inform whether certain combinations of cancer drivers cooperate with CDK12 alteration to drive oncogenesis.

      Thank you for the idea of looking at the patient-level for TDP-enriched oncogenes. A preliminary assessment did not identify recurrent co-gained oncogenes. We will continue these analyses as additional patient tumors with CDK12 alterations are identified. 

      The finding that AR gene or enhancer are recurrently gained with TDP is interesting and I am curious whether the authors have thoughts on whether these alterations can also be seen in the 1-2% of CDK12-altered primary prostate cancers that are treatment naïve, and where AR pathway alterations are not as frequently seen.

      We did not focus on CDK12 altered primary prostate cancers, but we did check if there is AR amplification enrichment in the 6 CDK12<sup>BAL</sup> cases of the TCGA-PRAD dataset and did not identify enrichment. However, with such small numbers we would hesitate to draw any hard conclusions. 

      It could be interesting to more comprehensively characterize some of the CDK12 KO-adapted lines in Figure 5 (e.g. by WES or WGS) to determine whether they exhibit the TDP and/or whether they have acquired any secondary mutations that allow them to adapt to CDK12 loss.

      We are planning to do further genomics characterization of the CDK12-KO lines and will hopefully include that in a future study. Genomic analyses of the 22Rv1 clones (see copy number plots in Fig. S5c) did not identify a TDP. We plan to repeat the genomic assessments over additional cell passages and we have planned additional experiments designed to understand why some cells tolerate CDK12 loss and others do not.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Hurtado et al. show that Sox9 is essential for retinal integrity, and its null mutation causes the loss of the outer nuclear layer (ONL). The authors then show that this absence of the ONL is due to apoptosis of photoreceptors and a reduction in the numbers of other retinal cell types such as ganglion cells, amacrine cells, and horizontal cells. They also describe that Müller Glia undergoes reactive gliosis by upregulating the Glial Fibrillary Acidic Protein. The authors then show that Sox9+ progenitors proliferate and differentiate to generate the corneal cells through Sox9 lineage-tracing experiments. They validate Sox9 expression and characterize its dynamics in limbal stem cells using an existing single-cell RNA sequencing dataset. Finally, the authors argue that Sox9 deletion causes progenitor cells to lose their clonogenic capacity by comparing the sizes of control and Sox9-null clones. Overall, Hurtado et al. underline the importance of Sox9 function in retinal and corneal cells.

      Strengths:

      The authors have characterized a myriad of striking phenotypes due to Sox9 deletion in the retina and limbal stem cells which will serve as a basis for future studies.

      Weaknesses:

      Hurtado et al. investigate the importance of Sox9 in the retina and limbal stem cells. However, the overall experimental narrative appears dispersed.

      (1) The authors begin by characterizing the phenotype of Sox9 deletion in the retina and show that the absence of the ON layer is due to photoreceptor apoptosis and a reduction in other retinal cell types. The authors also note that Müller glia undergoes gliosis in the Sox9 deletion condition. These striking observations are never investigated further, and instead, the authors switch to lineage-tracing experiments in the limbus that seem disconnected from the first three figures of the paper. Another example of this disconnect is the comparison of Sox9 high and Sox9 low populations using an existing scRNA-seq dataset and the subsequent GO term analysis, which does not directly tie in with the lineage-tracing data of the succeeding Sox9∆/∆ experiments.

      We thank the reviewer for their thoughtful observations. We would like to clarify the rationale behind the structure of our study and how the different parts are conceptually connected.

      Our central aim was to investigate the role of Sox9 in the adult eye. Given that Sox9 has been extensively studied during embryonic development, we specifically chose to use an inducible conditional knockout strategy (CAG-CreERTM) in order to assess its function postnatally, in the adult eye. This approach revealed a severe retinal phenotype, whereas the cornea showed no overt phenotype. A major strength of our experimental design is that it allowed us to examine the role of Sox9 specifically in the adult eye, avoiding confounding effects from embryonic development. Nevertheless, this approach entails an inherent limitation: the mosaic nature of the CAG-CreERTM system leads to substantial variability in both the extent and distribution of Sox9 inactivation among individual animals. We invested considerable effort over extended periods to obtain reliable and biologically meaningful data despite this variability. We did not proceed further because this mosaicism poses a significant limitation when attempting to dissect downstream mechanisms in a consistent and reproducible manner, making it extremely challenging to perform in-depth mechanistic studies.

      Regarding the cornea, given the absence of a clear phenotype upon Sox9 deletion, we expanded our investigation by adding lineage-tracing and transcriptomic analyses to better understand Sox9’s potential role in adult limbal epithelial stem cells. These additional experiments provided valuable insight into Sox9 function in the adult cornea, even in the absence of gross morphological changes. Thus, while the retinal and corneal data stem from different experimental approaches, they are unified by a shared goal: understanding the cell type-specific and tissue-specific functions of Sox9 in the adult eye.

      To ensure that readers do not perceive this apparent disconnect, and to avoid overstating our conclusions, we have modified the manuscript. In the Introduction section, we have included the main findings from studies conducted to date on the role of Sox9 in the cornea and retina, and we have removed the corresponding section from the Discussion. We believe it is now clear that our study focuses on the role of Sox9 in the adult eye, in contrast to previous studies, which focused on the developing eye.

      In the Discussion section, we have added a new paragraph at the beginning and end that explicitly addresses the relationship between the retinal and limbal findings, illustrating how a single transcription factor can play distinct roles in different tissues within the same organ.

      Regarding the reviewer’s comment that the scRNA-seq analyses appear disconnected from the lineage-tracing data, we respectfully disagree. These analyses provide independent transcriptional confirmation that Sox9 is a marker of limbal stem cells, reinforcing the conclusions drawn from our in vivo experiments. These approaches are complementary and they converge on the same biological insight: Sox9 marks a population with stem-like properties in the adult limbus. Nevertheless, we acknowledge the reviewer’s concern and have moderated the tone of our statements in the revised version of the manuscript to better reflect the supporting nature of the scRNA-seq data, without overstating its functional implications.

      (2) A major concern is that a single Sox9∆/∆ limbal clone has a sufficiently large size, comparable to wild-type clones, as seen in Figure 6D. This singular result is contrary to their conclusion, which states that Sox9-deficient stem cells minimally contribute to the maintenance of the cornea.

      We thank the reviewer for this important observation.

      Ligand-independent activity of Cre-ER fusion proteins has been repeatedly reported in various mouse models (Vooijs et al., 2001; Kemp et al., 2004; Haldar et al., 2009). This basal recombinase activity is thought to arise from inappropriate nuclear translocation or proteolysis of the Cre-ER fusion protein, leading to low-level recombination even in the absence of tamoxifen. Consistent with this, prior studies using the same CAGG-CreERTM; R26R-LacZ system for clonal analysis in the cornea have observed sparse reporter expression before tamoxifen administration (Dorà et al., 2015).

      In line with these findings, we also detected minimal background LacZ staining in Sox9Δ/Δ-LacZ corneas (mean surface area: 0.85%; n = 8 eyes). This low-level staining likely reflects recombination events in transient amplifying or more differentiated cells, which are not expected to generate long-lived clones. However, in the rare instance of a large clone, as shown in Figure 6D, we believe that a spontaneous recombination event may have occurred in a bona fide limbal stem cell, giving rise to a sustained contribution. To rigorously address this potential artefact and assess the true contribution of Sox9-deficient stem cells, we conducted a comparative analysis of 8 control (Sox9Δ/+-LacZ) and 5 mutant (Sox9Δ/Δ-LacZ) corneas. This analysis revealed a highly significant 8-fold reduction in the LacZ-positive surface area in mutant samples (Sox9Δ/+-LacZ: 6.65 ± 1.77%; Sox9Δ/Δ-LacZ: 0.85 ± 0.85%; paired t-test, p = 0.00017; Figs. 6E and F; Table S12).

      We chose to include the image of the large clone in the main figure precisely because it does not align with our working hypothesis. We believe that showing such exceptions transparently is scientifically important and may be valuable for other researchers using similar inducible systems. Nonetheless, based on previous literature, the number of samples analyzed, and the statistically significant reduction in clonal contribution, we maintain that the observed phenotype reflects a true biological effect of Sox9 loss, supporting our conclusion that Sox9-deficient stem cells contribute minimally to corneal maintenance. To make that point clearer, we have introduced the following sentence in lines 462-464 of the revised version of the manuscript.

      “A possible explanation for this clone may be that spontaneous ligand-independent activity of Cre-ER fusion may have occurred in a bona fide limbal stem cell, as previously reported (Vooijs et al., 2001; Kemp et al., 2004; Haldar et al., 2009, Dorà et al., 2015).”

      Reviewer #2 (Public review):

      Sox9 is a transcription factor crucial for development and tissue homeostasis, and its expression continues in various adult eye cell types, including retinal pigmented epithelium cells, Müller glial cells, and limbal and corneal basal epithelia. To investigate its functional roles in the adult eye, this study employed inducible mouse mutagenesis. Adult-specific Sox9 depletion led to severe retinal degeneration, including the loss of Müller glial cells and photoreceptors. Further, lineage tracing revealed that Sox9 is expressed in a basal limbal stem cell population that supports stem cell maintenance and homeostasis. Mosaic analysis confirmed that Sox9 is essential for the differentiation of limbal stem cells. Overall, the study highlights that Sox9 is critical for both retinal integrity and the differentiation of limbal stem cells in the adult mouse eye.

      Strengths:

      In general, inducible genetic approaches in the adult mouse nervous system are rare and difficult to carry out. Here, the authors employ tamoxifen-inducible mouse mutagenesis to uncover the functional roles of Sox9 in the adult mouse eye.

      Careful analysis suggests that two degeneration phenotypes (mild and severe) are detected in the adult mouse eye upon tamoxifen-dependent Sox9 depletion. Phenotype severity nicely correlates with the efficiency of Cre-mediated Sox9 depletion.

      Molecular marker analysis provides strong evidence of Mueller cell loss and photoreceptor degeneration.

      A clever genetic tracing strategy uncovers a critical role for Sox9 in limbal stem cell differentiation.

      Weaknesses:

      (1) The Introduction can be improved by explaining clearly what was previously known about Sox9 in the eye. A lot of this info is mentioned in a single, 3-page long paragraph in the Discussion. However, the current study's significance and novelty would become clearer if the authors articulated in more detail in the Introduction what was already known about Sox9 in retina cell types (in vitro and in vivo).

      We appreciate this insightful comment. Following the reviewer’s suggestion, we have reorganized the manuscript to provide a clearer scientific context in the Introduction. Specifically, we have moved the relevant background information on Sox9 in different retinal cell types—previously included in a single, extended paragraph in the Discussion—into the Introduction. This helps to better frame our study within the context of existing knowledge.

      Additionally, we have emphasized more explicitly that our work does not focus on embryonic development, as most previous studies on Sox9 have done, but instead investigates its role in the adult mouse retina and limbus/cornea. We believe this represents an important and novel aspect of our study, as the mechanisms of retinal maintenance and limbal stem cell differentiation in the adult have been less extensively studied.

      (2) Because a ubiquitous tamoxifen-inducible CreER line is employed, non-cell autonomous mechanisms possibly contribute to the observed retina degeneration. There is precedence for this in the literature. For example, RPE-specific ablation of Otx2 results in photoreceptor degeneration (PMID: 23761884). Have the authors considered the possibility of non-cell autonomous effects upon ubiquitous Sox9 deletion?

      Given the similar phenotypes between animals lacking Otx2 and Sox9 in specific cell types of the eye, the authors are encouraged to evaluate Otx2 expression in the tamoxifen-induced Sox9 adult retina.

      We appreciate the insightful comment of the reviewer regarding the potential contribution of non-cell autonomous mechanisms to the retinal degeneration observed upon ubiquitous Sox9 deletion. We agree that this is an important consideration, particularly in the context of findings showing that RPE-specific deletion of Otx2 results in secondary photoreceptor degeneration.

      However, we would like to emphasize that RPE-specific deletion of Sox9 does not lead to photoreceptor loss or retinal degeneration, as previously shown (Masuda et al., 2014; Goto et al., 2018; Cohen-Tayar et al., 2018) [PMID: 24634209; PMID: 29609731; PMID: 29986868]. In addition, it was shown that Sox9 deletion in the RPE caused downregulation of visual cycle genes but did not compromise photoreceptor integrity or survival. Interestingly, Otx2 expression was found to be upregulated in the absence of Sox9, further supporting the view that Sox9 is not a simple upstream regulator of Otx2 in the adult RPE (Masuda et al., 2014). These findings suggest that RPE dysfunction alone cannot account for the severe retinal phenotype we observe in our model.

      In our study, we observed that photoreceptor degeneration correlates strongly with the depletion of Sox9+ Müller glial cells. Given the well-established supportive and neuroprotective roles of Müller glia, we interpret the retinal degeneration in our model to be primarily a consequence of Müller cell dysfunction (confirmed by the loss of Müller glia markers, such as SOX8 and S100). This interpretation is further supported by previous studies showing that selective ablation of Müller glia can lead to photoreceptor degeneration through non-cell autonomous mechanisms (Shen et al., 2012) [PMID: 23136411].

      Nevertheless, we agree that this possibility deserves further investigation, and we have acknowledged it in the following paragraph that has been added to the Discussion section (lines 511-523 of the revised ms):

      “An important consideration in our model is the potential contribution of non-cell autonomous mechanisms to photoreceptor degeneration. Sox9 is expressed in both MG and RPE cells, and both cell types are known to support photoreceptor viability (Poché et al., 2008; Masuda et al., 2014). Notably, Sox9 and Otx2 cooperate to regulate visual cycle gene expression in the RPE (Masuda et al., 2014), and loss of Otx2 specifically in the adult RPE leads to secondary photoreceptor degeneration through non-cell autonomous mechanisms (Housset et al., 2013). However, RPE-specific deletion of Sox9 does not induce retinal degeneration and in fact results in Otx2 upregulation (Masuda et al., 2014; Goto et al., 2018; Cohen-Tayar et al., 2018), suggesting that Sox9 is not an upstream regulator of Otx2 in this context. Further investigation into the molecular and cellular interactions between MG, RPE, and photoreceptors may help to clarify the indirect pathways contributing to degeneration in the absence of Sox9.”

      Consistent with the above, a new citation has been included:

      Housset M, Samuel A, Ettaiche M, Bemelmans A, Béby F, Billon N, Lamonerie T. 2013. Loss of Otx2 in the adult retina disrupts retinal pigment epithelium function, causing photoreceptor degeneration. J Neurosci 33:9890–904. doi:10.1523/JNEUROSCI.1099-13.2013.

      (3) The most parsimonious explanation for the dual role of Sox9 in retinal cell types and limbal stem cells is that the cell context is different. For example, Sox9 may cooperate with TF1 in photoreceptors, TF2 in Mueller cells, and TF3 in limbal stem cells, and such cell type-specific cooperation may result in different outcomes (retinal integrity, stem cell differentiation). The authors are encouraged to add a paragraph to the discussion and share their thoughts on the dual role of Sox9.

      We thank the reviewer for this thoughtful and constructive suggestion. We have added a paragraph at the end of the Discussion addressing the potential dual role of Sox9 in the cornea and retina. In this new section, we discuss how Sox9 might exert distinct functions depending on the cellular context, possibly through interactions with different transcriptional partners in specific cell types. This may help explain the contrasting roles of Sox9 in maintaining retinal integrity versus regulating stem cell differentiation in the limbal epithelium.

      (4) One more molecular marker for Mueller glial cells would strengthen the conclusion that these cells are lost upon Sox9 deletion.

      We thank the reviewer for this constructive suggestion. To reinforce our conclusion that most Müller glial cells are lost following Sox9 deletion, we analysed the expression of S100, a well-established cytoplasmic marker of Müller glia. As S100 is primarily localized to the innermost Müller cell processes and not restricted to cell bodies, direct cell counting was not feasible. Instead, we quantified the S100+ signal intensity across defined retinal surface areas. This analysis revealed a statistically significant reduction in S100 signal in Sox9<sup>Δ/Δ</sup> retinas compared to controls. These new data, included in the revised Figure 1 (panels F and G), support and extend our previous observations using SOX8, further confirming the loss of Müller glial cells in Sox9-deficient retinas.

      We have also modified the manuscript based on this new evidence as follows:

      In the Results section, lines 168-177 of the revised ms, we have added the following paragraph: “To independently validate the loss of MG cells in Sox9-deficient retinas, we examined the expression of S100, a cytoplasmic marker that labels the processes of adult Müller cells. In control retinas, strong S100 immunoreactivity was observed across the inner retina, outlining the typical radial projections of Müller glia (Fig. 1F). In contrast, Sox9Δ/Δ retinas with an extreme phenotype exhibited a marked reduction in S100 signal (Fig. 1G). Given the diffuse cytoplasmic localization of S100, we quantified its expression by measuring the fluorescence signal within a defined surface area of the retina. This analysis revealed a statistically significant reduction in S100 signal intensity in mutant samples (including both mild and extreme phenotypes) compared to controls (Fig. 1G; Table S4), further supporting the loss of MG cells upon Sox9 deletion.”

      In Methods, line 684 of the revised ms, the anti-S100 antibody reference and its working dilution have been added.

      (5) Using opsins as markers, the authors conclude that the photoreceptors are lost upon Sox9 deletion. However, an alternate possibility is that the photoreceptors are still present and that Sox9 is required for the transcription of opsin genes. In that case, Sox9 (like Otx2) may act as a terminal selector in photoreceptor cells. This point is particularly important because vertebrate terminal selectors (e.g., Nurr1, Otx2, Brn3a) initially affect neuron type identity and eventually lead to cell loss.

      We perfectly understand the reviewer’s point. However, we believe that the possibility that Sox9 regulates opsin gene expression without affecting photoreceptor survival is very unlikely in our model. The primary evidence comes from the histological analysis shown in Figure 1B, where hematoxylin and eosin staining clearly demonstrates the complete loss of the ONL in Sox9<sup>Δ/Δ</sup> retinas exhibiting the extreme phenotype. Similarly, DAPI counterstain also evidences the lack of the ONL in many of our immunofluorescence images of these samples.  This morphological disappearance of the ONL strongly supports the conclusion that photoreceptor cells are not merely transcriptionally silent but are physically absent.

      Furthermore, TUNEL assays in two retinas with a mild phenotype revealed extensive apoptosis within the ONL, suggesting a progressive degeneration process rather than a selective transcriptional effect. While we acknowledge that transcriptional regulation of opsin genes by Sox9 cannot be entirely ruled out, the observed phenotype is more consistent with a structural loss of photoreceptors rather than a change in their molecular identity alone. Therefore, our data support the interpretation that Sox9 is required for photoreceptor survival, likely through non-cell autonomous mechanisms related to Müller glia dysfunction, rather than acting as a terminal selector within photoreceptor cells themselves.

      (6) Quantification is needed for the TUNEL and GFAP analysis in Figure 3.

      We have quantified the GFAP immunofluorescence signal across defined surface areas of the retina and found a statistically significant increase in GFAP expression in Sox9<sup>Δ/Δ</sup> mutants compared to controls (Mann-Whitney U test, P = 0.0240; n = 4 controls, 10 mutants). These quantification data are now included in the revised Figure 3.

      Regarding the TUNEL assay, although extensive apoptosis was clearly observed in two Sox9<sup>Δ/Δ</sup> retinas with a mild phenotype (as shown in Figure 3A), this pattern was not consistent across the full study mouse cohort. Out of 15 mutant samples analyzed (5 of them previously analyzed and 10 additional ones that have been newly analyzed), only two exhibited this pronounced apoptotic pattern. However, in the remaining 13 mutants, we did observe a small but statistically significant increase in the number of TUNEL+ cells compared to controls (zero-inflated Poisson test, P = 0.028, n = 5 controls, 13 mutants). These results are now included in Figure 3 and in Tables S7 and S8.

      This pattern likely reflects the transient nature of apoptosis in the degenerative process, which may occur rapidly and thus be difficult to capture consistently at a single time point. Nevertheless, the quantification supports our conclusion that Sox9 loss is associated with increased photoreceptor cell death.

      Based on the above, we have included the following paragraphs in the Results section of the manuscript:

      In lines 224-252 of the revised ms, the final version of the paragraph is as follows: “Since photoreceptors are absent in severely affected Sox9-mutant retinas, we conducted TUNEL assays to study the role of cell death in the process of retinal degeneration. In control samples (n=5), almost no TUNEL signal was observed in the retina. In contrast, Sox9<sup>Δ/Δ</sup> mice (n=15) showed numerous TUNEL+ cells, mainly located in the persisting ONL, indicating that photoreceptor cells were dying (Fig. 3A). Although extensive TUNEL staining in the ONL was clearly observed in two Sox9<sup>Δ/Δ</sup> retinas with mild phenotypes, this pattern was not consistently present across the full cohort. In the remaining 13 mutant retinas, we observed a modest but noticeable increase in the number of apoptotic cells compared to controls (Fig. 3B; Table S7). Despite a high frequency of zero counts (particularly among controls), the difference between groups reached statistical significance when analyzed using a zero-inflated Poisson model (P = 0.028; n = 5 controls, 13 mutants). These findings suggest that photoreceptor apoptosis following Sox9 deletion may occur acutely and within a narrow temporal window, making it challenging to capture the full degenerative process at a single time point”.

      Lines 263-269 of the revised ms: “To support these observations quantitatively, we measured GFAP fluorescence intensity across defined retinal surface areas in control and Sox9<sup>Δ/Δ</sup> mice (Fig. 3D; Table S8). This analysis revealed a statistically significant increase in GFAP signal in mutant retinas compared to controls (Mann-Whitney U test, P = 0.0240; n = 4 controls, 10 mutants). These results are consistent with a progressive gliotic response following Sox9 deletion and provide further evidence that MG cells become reactive in the absence of Sox9”.

      Similarly, the section “Estimation of the percentage of tamoxifen-induced, Cre-mediated recombination” has been expanded as follows:

      Lines 660-665 of the revised ms: “In parallel, to quantify GFAP expression as a measure of MG reactivity, we analyzed GFAP immunofluorescence intensity across defined retinal surface areas. Given the cytoplasmic distribution of GFAP within glial processes, direct cell counting was not feasible. Instead, fluorescence intensity was measured using ImageJ, within full-thickness retinal regions in 20x microphotographs of retinal sections stained for GFAP. The total GFAP signal was normalized to the measured area for each section”.
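      The area normalization described in this passage can be sketched in a few lines. This is only an illustrative Python sketch, not the ImageJ workflow used in the study: the function name `area_normalized_signal`, the `pixel_area_um2` parameter, and the example pixel values are all hypothetical and are not taken from the manuscript.

```python
def area_normalized_signal(roi_pixels, pixel_area_um2=1.0):
    """Return total fluorescence in a region of interest divided by its area.

    roi_pixels: list of rows, each a list of pixel intensities (hypothetical input).
    pixel_area_um2: area covered by one pixel (assumed calibration value).
    """
    n_pixels = sum(len(row) for row in roi_pixels)
    if n_pixels == 0:
        raise ValueError("ROI must contain at least one pixel")
    # Total signal is the sum of all pixel intensities inside the ROI.
    total_signal = sum(value for row in roi_pixels for value in row)
    # Measured area is the pixel count scaled by the per-pixel area.
    measured_area = n_pixels * pixel_area_um2
    return total_signal / measured_area


# Hypothetical 2 x 3 ROI; these numbers are NOT data from the study.
example_roi = [[10, 20, 30], [40, 50, 60]]
signal = area_normalized_signal(example_roi, pixel_area_um2=0.5)
```

      Dividing by the measured area, rather than comparing raw totals, keeps sections with differently sized regions comparable, which is presumably why a per-area measure was chosen when direct cell counting was not feasible.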

      (7) Line 269-320: The authors examined available scRNA-Seq data on adult retina. This data provides evidence for Sox9 expression in distinct cell types. However, the dataset does not inform about the functional role of Sox9 because Sox9 mutant cells were not analyzed with RNA-Seq. Hence, all the data that claim that this experiment provides insights into possible Sox9 functional roles must be removed. This includes panels F, G, and H in Figure 5. In general, this section of the paper (Lines 269-320) needs a major revision. Similarly, lines 442-454 in the Discussion should be removed.

      We thank the reviewer for this important observation. We agree that the scRNA-Seq dataset used in this section does not include Sox9 mutant cells and therefore does not allow us to assess the consequences of Sox9 loss-of-function. However, we believe that this analysis still provides valuable complementary information. Specifically, it confirms that Sox9 is expressed in a distinct population of limbal stem cells, and that its expression dynamically changes along differentiation trajectories. Although we do not infer causality or phenotypic consequences, the ability to observe how gene expression programs shift as Sox9 is downregulated offers insights into potential transcriptional programs associated with Sox9 activity.

      We have carefully revised Lines 269–320 to remove any overinterpretations, and eliminated the corresponding lines in the Discussion (Lines 442–454). However, we have retained Panels G and H in Figure 5 with updated text that reflects the descriptive nature of these findings, specifically to illustrate that the Sox9-positive cell signature is consistent with a stem cell genetic program, and that when Sox9 is downregulated some gene pathways involved in stem cell differentiation are upregulated.

      Reviewer #1 (Recommendations for the authors):

      Major points

      (1) Figure 1C shows the proportions of Sox9+cells that express Sox8 in control, mild and extreme phenotypes. However, as no quantitative classification of mild and extreme phenotypes is reported along with Figure 1A, the large standard deviation for Sox9∆/∆ mild retina might be due to a misclassification of the sample. Therefore, the authors must ascribe each sample to "mild" or "extreme" based on a quantitative metric.

      We appreciate the reviewer’s suggestion to clarify the classification criteria used to distinguish “mild” and “extreme” phenotypes in Sox9<sup>Δ/Δ</sup> retinas. As noted, our classification was based on a qualitative, phenotypic assessment of retinal morphology in hematoxylin/eosin-stained sections. Specifically, retinas were classified as “extreme” when the outer nuclear layer (ONL) was completely absent, and as “mild” when the ONL was present, although often reduced in thickness. This classification reflects the observable structural depletion of the ONL and aligns well with the extent of Sox9 loss in Müller glial cells, as shown in Figure 1. We acknowledge that some variability exists within the “mild” group, likely due to differences in recombination efficiency and the mosaic nature of tamoxifen-induced deletion.

      The phenotypic classification of each individual sample is explicitly provided in Supplementary Table S1. We have also added a statement in the Results section clarifying that this classification was based on qualitative histological criteria rather than a numerical threshold.

      Lines 104-113 of the revised ms: “We categorized Sox9<sup>Δ/Δ</sup> retinas into “mild” and “extreme” phenotypes in order to facilitate interpretation of our data. Classification was based on a qualitative assessment of ONL integrity in histological sections. Specifically, samples were classified as “extreme” when the ONL was completely depleted, and as “mild” when the ONL persisted, albeit variably reduced in thickness. This phenotypic classification reflects observable structural differences rather than a fixed quantitative threshold. Some variability exists within the “mild” group, likely due to differences in recombination efficiency and the mosaic nature of tamoxifen-induced Cre-mediated Sox9 deletion.”

      (2) The authors infer Sox9 high and Sox9 low groups of limbal stem cells using an existing scRNA-seq dataset. However, an immunohistology-based validation of this difference is missing. Given that limbal stem cells express Sox9, the authors must examine the heterogeneity in Sox9 levels within the Sox8+ population to demonstrate their claim: "...Sox9 expression decreases as transiently amplifying progenitors undergo progressive differentiation from limbal to peripheral corneal cells." in Line 292. Ideally, this must be further validated using differentiation markers corresponding to CB and ILB populations that show lower Sox9 expression according to the pseudotime graph.

      To validate the Sox9 expression results obtained with scRNA-seq, we performed double immunofluorescence for Sox9 and P63, the latter expressed by the basal cells of the limbal epithelium, but not by transient amplifying cells covering the corneal surface (Pellegrini et al., 2001, https://www.pnas.org/doi/abs/10.1073/pnas.061032098). These results can be observed in the new panel 5F. Accordingly, we have included a new paragraph in lines 369-396 of the revised version of the ms:

      “To validate these results, we decided to closely examine Sox9 expression in the limbus using immunofluorescence. Previous analyses revealed that the outer limbus is approximately 100 μm wide, while the inner limbus is wider, around 240 μm (Altshuler 2021). We observed that in the region corresponding to the OLB, most cells showed strong Sox9 expression. In the area corresponding to the ILB, this immunoreactivity appeared weaker in the basal layer (corresponding to the ILB proper), and no expression was detected in the suprabasal layers (flattened cells; Fig 5F left). Double immunofluorescence for SOX9 and P63, which is expressed in basal cells of the limbal epithelium, but not by transient amplifying cells covering the corneal surface (Pellegrini et al., 2001) revealed that Sox9 expression was restricted to P63-positive cells (Fig 5F right). These observations confirm that Sox9 is expressed in a basal cell population within both the OLB and ILB, and that its expression decreases in differentiated transient amplifying cells. ”

      We have also deleted “This expression pattern is consistent with our immunofluorescence observations” from line 356 of the revised ms.

      (3) The authors' claim of "...Sox9-null cells cannot survive or proliferate as well as their wildtype neighbors, and are hence outcompeted over time, leading to an essentially wild-type cornea" does not seem very convincing in the light of Fig.6D and S3B where Sox9 deletion can still allow for a large LacZ+ clone. Their claim of wild-type cornea due to out-competing neighbors must be validated by increasing the number of Sox9-null progenitors, which can be tested by administering tamoxifen for a significantly longer duration, leading to a majority Sox9 deficient progenitor population, and then examining limbal and corneal defects.

      As previously discussed, we observed only one instance of a large LacZ+ clone across 8 Sox9<sup>Δ/Δ</sup>-LacZ eyes. Based on prior reports of ligand-independent Cre activity (Vooijs et al., 2001; Kemp et al., 2004; Haldar et al., 2009; Dorà et al., 2015), we believe this rare event likely resulted from spontaneous recombination in a bona fide limbal stem cell, independent of tamoxifen administration. For this reason, we do not expect that increasing the dose or duration of tamoxifen would eliminate such rare events. Furthermore, due to the mosaic and highly variable recombination efficiency of the CAGG-CreERTM system in the adult eye (see McMahon et al., 2008), attempting to increase the TX dosage would likely lead to systemic toxicity or lethality, without guaranteeing full inactivation of the gene in the limbus. Thus, this system is not well-suited for generating a fully Sox9-deficient limbal epithelium. To overcome this limitation, we crossed our mice with the R26R-LacZ reporter line to track the clonal behavior of Sox9-deficient cells. In control animals (Sox9<sup>Δ/+</sup>-LacZ), LacZ+ stripes originating from limbal stem cells are readily observed. In contrast, in Sox9<sup>Δ/Δ</sup>-LacZ mutants, these clones are either absent or drastically reduced. This suggests that Sox9-null cells have a severely impaired ability to form and sustain clones. To rigorously quantify this effect, we compared 8 control and 5 mutant corneas, revealing a highly significant 8-fold reduction in LacZ-positive area in the mutants (6.65 ± 1.77% vs. 0.85 ± 0.85%; p = 0.00017; Fig. 6F; Table S12; Supp. Fig. X???), supporting our claim that Sox9-null cells cannot survive or proliferate as well as their wild-type neighbors, and are hence outcompeted over time, leading to an essentially wild-type cornea.

      Minor points

      (1) Quantification for Figure 2C and 2D is missing.

      We have now included quantification of BRN3A+ retinal ganglion cells (Figure 2E) across control and Sox9<sup>Δ/Δ</sup> retinas. Cell counts were performed on matched retinal sections, and the difference between groups was found to be statistically significant by a Mann–Whitney U test (Table S5).

      Regarding PAX6/AP2α, we quantified inner retinal neurons by analyzing AP2α+ amacrine cells and PAX6+/AP2α- horizontal cells as distinct subpopulations, rather than simply comparing total PAX6 or AP2α immunoreactivity. This approach allowed us to better resolve specific neuronal subtype changes. Both populations showed a statistically significant reduction in Sox9-deficient retinas relative to controls. The quantification for these analyses has now been incorporated into the revised Figure 2F and G (Table S6).

      Consequently, the following paragraph of the Results section (line 210 of the revised ms):

      “We also studied the status of other retinal cell types. The transcription factor BRN3A was used to identify ganglion cells (Nadal-Nicolás et al., 2009), which were shown to decrease in number in the mutant retinas, compared to control ones (Fig. 2C). Similarly, double immunodetection of the transcription factors PAX6 and AP2A was used to identify both amacrine and horizontal cells, as previously described (Marquardt et al., 2001; Barnstable et al., 1985; Edqvist and Hallböök, 2004), showing a similar reduction in both cell types in degenerated retinas (Fig. 2D).”

      has been modified as follows:

      “We also studied the status of other retinal cell types. The transcription factor BRN3A was used to identify ganglion cells (Nadal-Nicolás et al., 2009), which were shown to decrease in number in the mutant retinas, compared to control ones (Figs. 2C and 2D and Table S5; n = 5 controls, n = 12 mutants; Mann-Whitney U test, P = 3 × 10<sup>-4</sup>). Similarly, double immunodetection of the transcription factors PAX6 and AP2A was used to identify both amacrine and horizontal cells (Fig. 2E), as previously described (Marquardt et al., 2001; Barnstable et al., 1985; Edqvist and Hallböök, 2004), showing a similar reduction in both cell types in degenerated retinas (Figs. 2F and 2G and Table S6; AP2α+ amacrine cells: n = 3 controls, n = 8 mutants; two-sample t-test, P = 0.029; PAX6+/AP2α− horizontal cells: n = 3 controls, n = 8 mutants; Mann-Whitney U test, P = 0.021). These findings indicate that the loss of Sox9 in the adult retina ultimately leads to the degeneration of multiple inner retinal neuronal populations, beyond the previously described effects on photoreceptors and Müller glia.”

      (2) Figure 4G & H: The authors must mention that the dashed lines enclose the limbal area.

      Done

      (3) The authors infer from an existing scRNA-seq dataset that OLB cells have high Sox9 expression as compared to ILB and corneal populations. However, Figures 4A and B do not indicate the anatomical positions of these cell types. The authors must label these for the reader's reference as they state that "[Sox9] expression pattern is consistent with our immunofluorescence observations" in Line 280.

      As previously indicated, we have generated a new panel 5F and a corresponding paragraph to illustrate Sox9 expression pattern in the limbus. Accordingly, we have removed the sentence from line 280.

      (4) Quantification for Figures 6A and 6B is missing.

      We have quantified the number of Sox9 and P63 positive cells in the limbus between mutant and control corneas and found no difference in the number of positive cells. We have included these data in new panel 6C and Table S11.

      Reviewer #2 (Recommendations for the authors):

      Line 24: "synapsis" should be "synapses".

      Done

      (1) Consider starting a new paragraph after line 30.

      Done

      (2) Lines 42-48: make clear that this paragraph provides information only for HUMAN SOX9.

      We now distinguish which studies were conducted in humans and which in mice.

      (3) Line 55: explain to the non-expert reader what the "visual cycle" is.

      Done (lines 64-65 of the revised ms)

      (4) Line 66: consider "inactivate" instead of "suppress".

      We substituted “suppress” with “inactivate”

      (5) Line 90-92: ONLY PCR for the cGMP will provide formal evidence that this is not present in the mouse line.

      We agree with the reviewer that PCR genotyping is the most straightforward method to exclude the presence of the Pde6<sup>brd1</sup> allele. Although retinal degeneration was never observed in untreated or control animals in our study, we have now removed the term “formal possibility” from the text to better reflect this limitation.

      We have modified the following paragraph (page 116 in the revised version of the manuscript): “Retinal degeneration was never observed in mice that had not been tamoxifen-treated, nor any other controls, eliminating the formal possibility that the retinal degeneration allele of photoreceptor cGMP phosphodiesterase 6b (Pde6brd1) was present in our mice (Bowes et al., 1990).”

      As follows: “Retinal degeneration was never observed in mice that had not been tamoxifen-treated, nor any other control groups, making the presence of the retinal degeneration allele of photoreceptor cGMP phosphodiesterase 6b (Pde6<sup>brd1</sup>) unlikely in our mice (Bowes et al., 1990). However, we acknowledge that definitive exclusion of this possibility would require PCR-based genotyping.”

      (6) Line 160-166: This paragraph needs a conclusion.

      We agree with the reviewer and have added the following sentence at the end of the paragraph:

      “These findings indicate that the loss of Sox9 in the adult retina ultimately leads to the degeneration of multiple inner retinal neuronal populations, beyond the previously described effects on photoreceptors and Müller glia”

      (7) Line: 240-265: This paragraph ends without a conclusion.

      We have included the following conclusion:

      “Thus, Sox9 is expressed in a basal limbal stem cell population with the ability to form two types of long-lived cell clones involved in stem cell maintenance and homeostasis.”

      (8) In Results, it needs to be specified when exactly in adulthood the tamoxifen treatment started. This information is only provided in the Methods.

      We have specified the age of the mice at the onset of tamoxifen treatment (two months) and included it in the schemes of Figs 1A, 4C, 4H, 6E.

      (9) Line 250: Because live imaging is not conducted, the word "dynamics" is not suitable.

      We substituted “dynamics” with “contribution”

      (10) Panel C in Figure 6 is nice and helpful. Consider adding a similar panel in Figure 1.

      Done.

      (11) Line 420: is this the human Sox9 enhancer?

      Yes. It is a human enhancer. We have indicated it in the text.

      (12) Line 459: typo "detected tissue".

      Corrected

      (13) Line 448 and 468: citations are needed.

      Line 448 is deleted in the revised version of the ms.

      (14) Line 479: typo "clones clones".

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This work computationally characterized the threat-reward learning behavior of mice in a  recent study (Akiti et al.), which had prominent individual differences. The authors  constructed a Bayes-adaptive Markov decision process model and fitted the behavioral data  by the model. The model assumed (i) hazard function starting from a prior (with free mean  and SD parameters) and updated in a Bayesian manner through experience (actually no real  threat or reward was given in the experiment), (ii) risk-sensitive evaluation of future  outcomes (calculating lower 𝛼 quantile of outcomes with free 𝛼 parameter), and (iii) heuristic  exploration bonus. The authors found that (i) brave animals had more widespread hazard  priors than timid animals and thereby quickly learned that there was in fact little real threat,  (ii) brave animals may also be less risk-aversive than timid animals in future outcome  evaluation, and (iii) the exploration bonus could explain the observed behavioral features,  including the transition of behavior from the peak to steady-state frequency of bout. Overall,  this work is a novel interesting analysis of threat-reward learning, and provides useful  insights for future experimental and theoretical work. However, there are several issues that I  think need to be addressed.

      Strengths:

      (1) This work provides a normative Bayesian account for individual differences in  braveness/timidity in reward-threat learning behavior, which complements the analysis by  Akiti et al. based on model-free threat reinforcement learning.

      (2) Specifically, the individual differences were characterized by (i) the difference in the  variance of hazard prior and potentially also (ii) the difference in the risk-sensitivity in the  evaluation of future returns.

      Weakness:

      (1) Theoretically the effect of prior is diluted over experience whereas the effect of biased  (risk-aversive) evaluation persists, but these two effects could not be teased apart in the  fitting analysis of the current data.

      (2) It is currently unclear how (whether) the proposed model corresponds to neurobiological (rather than behavioral) findings, different from the analysis by Akiti et al.

      We thank reviewer #1 for their useful feedback which we’ve used to improve the discussion,  formatting and clarity of the paper, and for highlighting important questions for future  extensions of our work.

      Major points:

      (1) Line 219

      It was assumed that the exploration bonus was replenished at a steady rate when the animal  was at the nest. An alternative way would be assuming that the exploration bonus slowly  degraded over time or experience, and if doing so, there appears to be a possibility that the  transition of the bout rate from peak to steady-state could be at least partially explained by  such a decrease in the exploration bonus.

      Section 2.2.3 explains the mechanism of the exploration bonus which motivates approach.  We think that the mechanism suggested by the reviewer is, in essence, what is happening in  the model. The exploration pool is indeed depleted over time or bouts of experience at the  object. In the peak confident phase for brave animals and the peak cautious phase for timid  animals, the rate of depletion exceeds the rate of regeneration, since the agent spends only  a single turn at the nest between bouts. In the steady-state phase, the exploration pool has  depleted so much previously that the agent must wait multiple turns at the nest for the pool  to regenerate to a sufficiently high value to justify approaching the object again.

      We have updated section 2.2.3 to explain that agents spend one turn at the nest during peak  phase but multiple turns during steady-state phase. Hopefully, this makes our mechanism  clear:

      “In simulations, when 𝐺(𝑡) is high, the agent has a high motivation to explore the object, spending only a single turn in the nest state between bouts. In other words, the depletion from 𝐺0 substantially influences the time point at which approach makes a transition from peak to steady-state; the steady-state time then depends on the dynamics of depletion (when at the object) and replenishment (when at the nest). In particular, in the steady-state phases, the agent must wait multiple turns at the nest for 𝐺(𝑡) to regenerate so that informational reward once again exceeds the potential cost of hazard.”
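
      The depletion and replenishment dynamics described above can be illustrated with a minimal toy sketch. It is not the BAMDP model itself: the fixed per-bout depletion, the constant replenishment rate, the threshold standing in for the potential cost of hazard, and all numerical values are hypothetical choices made only for illustration.

```python
def nest_turns_between_bouts(g0=5.0, depletion=1.0, replenish=0.25,
                             threshold=2.0, n_bouts=8):
    """Toy exploration pool: count nest turns preceding each approach bout.

    All parameters are hypothetical illustration values, not fitted ones.
    g0        -- initial size of the exploration pool
    depletion -- amount removed from the pool by one bout at the object
    replenish -- amount restored per turn spent at the nest
    threshold -- stand-in for the potential cost of hazard
    """
    if replenish <= 0:
        raise ValueError("replenish must be positive, or the agent may never approach")
    pool = g0
    waits = []
    turns_at_nest = 0
    while len(waits) < n_bouts:
        # One turn at the nest: the pool regenerates at a steady rate.
        pool += replenish
        turns_at_nest += 1
        # Approach only when the remaining bonus exceeds the assumed cost.
        if pool > threshold:
            waits.append(turns_at_nest)
            pool -= depletion  # the bout at the object depletes the pool
            turns_at_nest = 0
    return waits


print(nest_turns_between_bouts())
```

      With these assumed values, the early bouts are each preceded by a single nest turn (depletion outpaces replenishment), after which the pool settles into a cycle in which several nest turns are needed before each bout, mirroring the peak and steady-state phases qualitatively.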

      (2) Line 237- (Section 2.2.6, 2.2.7, Figures 7, 9)

      I was confused by the descriptions about nCVaR. I looked at the cited original literature  Gagne & Dayan 2022, and understood that nCVaR is a risk-sensitive version of expected  future returns (equation 4) with parameter α (α-bar) (ranging from 0 to 1) representing risk  preference. Line 269-271 and Section 4.2 of the present manuscript described (in my  understanding) that α was a parameter of the model. Then, isn't it more natural to report  estimated values of α, rather than nCVaR, for individual animals in Section 2.2.6, 2.2.7,  Figures 7, 9 (even though nCVaR monotonically depends on α)? In Figures 7 and 9, nCVaR  appears to be upper-bounded to 1. The upper limit of α is 1 by definition, but I have no idea why nCVaR was also bounded by 1. So I would like to ask the authors to add more detailed  explanations on nCVaR. Currently, CVaR is explained in Lines 237-243, but actually, there is  no explanation about nCVaR rather than its formal name 'nested conditional value at risk' in  Line 237.

      Thank you for pointing out this error. We have corrected the paper to use nCVaR to refer to  the objective and nCVaR's α, or sometimes just α, to refer to the risk sensitivity parameter  and thus the degree of risk sensitivity.
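
      To make the distinction between the objective and its parameter concrete, the following sketch computes a single-step CVaR at level α for a set of equally likely outcomes, that is, the mean of the worst α fraction of outcomes. It assumes the lower-tail convention in which α = 1 recovers the ordinary mean (risk neutrality). The nested objective applies such an operator recursively over time, which is not shown here. The function name and the example outcomes are hypothetical.

```python
def cvar(outcomes, alpha):
    """Lower-tail CVaR of equally likely outcomes at level alpha in (0, 1].

    Returns the average of the worst `alpha` fraction of outcomes;
    alpha = 1 gives the plain mean (risk-neutral evaluation).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    xs = sorted(outcomes)        # worst outcomes first
    mass_left = alpha * len(xs)  # tail mass, measured in numbers of samples
    total = 0.0
    for x in xs:
        weight = min(1.0, mass_left)  # the last sample may count fractionally
        total += weight * x
        mass_left -= weight
        if mass_left <= 0:
            break
    return total / (alpha * len(xs))


# Hypothetical returns from one decision: one bad outcome, three mild ones.
example = [-10.0, 0.0, 2.0, 4.0]
for a in (1.0, 0.5, 0.25):
    print(a, cvar(example, a))
```

      Lower values of α concentrate the evaluation on the worst outcomes, so the same set of outcomes is valued more pessimistically; α itself, not the value of the objective, is the quantity that indexes risk sensitivity.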

      (3) Line 333 (and Abstract)

      Given that animals' behaviors could be equally well fitted by the model having both nCVaR (free α) and hazard prior and the alternative model having only hazard prior (with α = 1), may it be difficult to confidently claim that brave (/timid) animals had risk-neutral (/risk-aversive) preference in addition to widespread (/low-variance) hazard prior? Then, it might be good to somewhat weaken the corresponding expression in the Abstract (e.g., add 'potentially also' to the result for risk sensitivity) or mention the inseparability of risk sensitivity and prior belief pessimism (e.g., "... although risk sensitivity and prior belief pessimism could not be teased apart").

      Thank you for this suggestion, we have duly weakened the wording in the Abstract to say  “potentially more risk neutral”:

      “Some animals begin with cautious exploration, and quickly transition to confident approach to maximize exploration for reward; we classify them as potentially more risk neutral, and enjoying a flexible hazard prior. By contrast, other animals only ever approach in a cautious manner and display a form of self-censoring; they are characterized by potential risk aversion and high and inflexible hazard priors.”

      Reviewer #2 (Public Review):

      Shen and Dayan build a Bayes adaptive Markov decision process model with three key components: an adaptive hazard function capturing potential predation, an intrinsic reward function providing the urge to explore, and a conditional value at risk (CVaR, closely related to probability distortion explanations of risk traits). The model itself is very interesting and has many strengths including considering different sources of risk preference in generating behavior under uncertainty. I think this model will be useful to consider for those studying approach/avoid behaviors in dynamic contexts.

      The authors argue that the model explains behavior in a very simple and unconstrained behavioral task in which animals are shown novel objects and retreat from them in various manners (different body postures and patterns of motor chunks/syllables). The model itself does capture lots of the key mouse behavioral variability (at least on average on a mouse-by-mouse basis) which is interesting and potentially useful. However, the variables in the model - and the internal states it implies the mice have during the behavior - are relatively unconstrained given the wide range of explanations one can offer for the mouse behavior in the original study (Akiti et al). This reviewer commends the authors on an original and innovative expansion of existing models of animal behaviour, but recommends that the authors revise their study to reflect the obvious challenges. I would also recommend a reduction in claiming that this exercise gives a normative-like or at least quantitative account of mental disorders.

      We thank reviewer #2 for highlighting some of the strengths of our paper as well as pointing  out important limitations of Akiti et al’s original study which we’ve inherited as well as some  limitations of our own method. We address their concerns below.

      We have added a paragraph to the discussion discussing the limitations of the state  representation we adopted from Akiti’s study.

      (Reviewer #1 had the same concern, see above) “Motivated by tail-behind versus  tail-exposed in Akiti et al. (2022), we model approach using a dichotomy between cautious  and confident approach states [...]”

      We have reduced the suggestion that our model provides an account of mental disorders in  the abstract.

      Before:

      “On the other hand, “timid” animals, characterized by risk aversion and high and inflexible  hazard priors, display self-censoring that leads to the sort of asymptotic maladaptive  behavior that is often associated with psychiatric illnesses such as anxiety and depression.”

      After:

      “By contrast, other animals only ever approach in a cautious manner and display a form of self-censoring; they are characterized by potential risk aversion and high and inflexible hazard priors.”

      My main comment is that this paper is a very nice model creation that can characterize the heterogeneity of rodent behavior in a very simple approach/avoid context (Akiti et al; when a novel object is placed in an arena) that itself can be interpreted in a multitude of ways. The use of terms like "exploration", "brave", etc in this context is tricky because the task does not allow the original authors (Akiti et al) to quantify these "internal states" or "traits" with the appropriate level of quantitative detail to say whether this model is correct or not in capturing the internal states that result in the rodent behavior. That said, the original behavioral setup is so simple that one could imagine capturing the behavioral variability in multiple ways (potentially without evoking complex computations that the original authors never showed the mouse brain performs). I would recommend reframing the paper as a new model that proposes a set of internal states that could give rise to the behavioral heterogeneity observed in Akiti et al, but nonetheless is at this time only a hypothesis. Furthermore, an explanation of what would be really required to test this would be appreciated to make the point clearer.

      We thought very hard about using terms that might be considered to be anthropomorphic such as ‘timid’ and ‘brave’. We are, of course, aware of the concerns articulated by investigators such as LeDoux about this. However, we think that, provided that we are clear on the first appearance (using ‘scare’ quotes) that we are using them as labels for latent characteristics that capture correlations in various aspects of behaviour, they are more helpful than harmful in making our descriptions understandable.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript presents computational modelling of the behaviour of mice during encounters with novel and familiar objects, originally reported by Akiti et al. (Neuron 110, 2022). Mice typically perform short bouts of approach followed by a retreat to a safe distance, presumably to balance exploration to discover possible rewards with the potential risk of predation. However, there is considerable heterogeneity in this exploratory behaviour, both across time as an individual subject becomes more confident in approaching the object, and across subjects; with some mice rapidly becoming confident to closely explore the object, while other timid mice never become fully confident that the object is safe. The current work aims to explain both the dynamics of adaptation of individual animals over time, and the quantitative and qualitative differences in behaviour between subjects, by modelling their behaviour as arising from model-based planning in a Bayes adaptive Markov Decision Process (BAMDP) framework, in which the subjects maintain and update probabilistic estimates of the uncertain hazard presented by the object, and rationally balance the potential reward from exploring the object with the potential risk of predation it presents.

      In order to fit these complex models to the behaviour the authors necessarily make  substantial simplifying assumptions, including coarse-graining the exploratory behaviour into  phases quantified by a set of summary statistics related to the approach bouts of the animal.  Inter-individual variation between subjects is modelled both by differences in their prior  beliefs about the possible hazard presented by the object and by differences in their risk  preference, modelled using a conditional value at risk (CVaR) objective, which focuses the  subject's evaluation on different quantiles of the expected distribution of outcomes.  Interestingly these two conceptually different possible sources of inter-subject variation in  brave vs timid exploratory behaviour turn out not to be dissociable in the current dataset as  they can largely compensate for each other in their effects on the measured behaviour.  Nonetheless, the modelling captures a wide range of quantitative and qualitative differences  between subjects in the dynamics of how they explore the object, essentially through  differences in how subject's beliefs about the potential risk and reward presented by the  object evolve over the course of exploration, and are combined to drive behaviour.

      Exploration in the face of risk is a ubiquitous feature of the decision-making problem faced  by organisms, with strong clinical relevance, yet remains poorly understood and  under-studied, making this work a timely and welcome addition to the literature.

      Strengths:

      (1) Individual differences in exploratory behaviour are an interesting, important, and  under-studied topic.

      (2) Application of cutting-edge modelling methods to a rich behavioural dataset, successfully accounting for diverse quantitative and qualitative features of the data in a normative framework.

      (3) Thoughtful discussion of the results in the context of prior literature.

      Limitations:

      (1) The model-fitting approach used of coarse-graining the behaviour into phases and fitting  to their summary statistics may not be applicable to exploratory behaviours in more complex  environments where coarse-graining is less straightforward.

      (2) Some aspects of the work could be more usefully clarified within the manuscript.

      We thank reviewer #3 for their positive feedback and helping us to improve the clarity of our  paper. We have added discussion they thought was missing.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 25-28

      This part of the Abstract might give an impression that timidity (but not braveness) is  potentially associated with psychiatric illness and even that timidity is thus inferior to  braveness. However, even though extreme timidity might indeed be associated with anxiety  or depression, extreme braveness could also be associated with other psychiatric or  behavioral problems. Moreover, as a population, the existence of both timid and brave  individuals could be advantageous, and it could be a reason why both types of individuals  evolutionarily survived in the case of wild animals (although Akiti et al. used mice, which may  have no or very limited genetic varieties, and so things may be different). So I would like to  encourage the authors to elaborate on the expression of this part of the Abstract and/or  enrich the related discussion in the Discussion.

      This is an important point. We note on line 38 that excessive novelty seeking (potentially  caused by excessive braveness) could also be maladaptive.

      Additionally, we have added a paragraph to the discussion discussing heterogeneity in risk  sensitivity within a population.

      “Our data show that there is substantial variation in the degrees of risk sensitivity across the mice. Previous works have reported substantial interpopulation and intrapopulation differences in risk-sensitivity in humans which depend on gender, age, socioeconomic status, personality characteristics, wealth and culture (Rieger et al., 2015; Frey et al., 2017). Despite the normative appeal of 𝛼 = 1, it is possible that a population may benefit from including individuals with 𝛼 different from 1.0 or highly negative priors. For example, more cautious individuals could learn from merely observing the risky behavior of less cautious individuals. Furthermore, we have only considered risk-sensitivity under epistemic uncertainty in our work. Risk averse individuals, for instance with 𝛼 < 1, may be more successful than risk-neutral agents in environments where there are unexpected dangers (unknown unknowns). Risk-aversion is thus a temperament of ecological and evolutionary significance (Réale et al., 2007).”

      (2) Line 149

      Section 2.2 consists of eight subsections. I think this organization may not be very  appealing, because there are a bit too many subsections, and their relations are not  immediately clear to readers. So I would like to encourage the authors to make an  elaboration. For example, since 2.2.1 - 2.2.5 describes a summary of model construction  and model fitting whereas 2.2.6-2.2.8 shows the results, it could be good to divide these into  separate sections (2.2.1 - 2.2.5 and 2.3.1 - 2.3.3).

      Thank you for pointing this out. We’ve renumbered the sections as you’ve suggested.

      (3) Line 347-8

      Theoretically, the effect of prior is diluted over experience whereas the effect of biased  (risk-aversive) evaluation persists, as the authors mentioned in Lines 393-394. Then isn't it  possible to consider environments/conditions in which the two effects can be separated?

      We appreciate this suggestion. Indeed, our original thought in modeling this experiment was  that this would be exactly the case here - with epistemic uncertainty reducing as the object  became more familiar. However, proving to an animal that a single environment is  completely stationary/fixed is hard - reflected in our conclusion here that the exploration  bonus pool replenishes. Thus, we argued in the discussion that a series of environments  would be necessary to separate risk sensitivity from priors.

      (4) Line 407

      It would be nice to add a brief phrase explaining how (in what sense) this model's  assumption was consistent with the reported behavior. Also, should the assumption of  having two discrete approach states (cautious and confident) itself be regarded as a  limitation of the model? If the tail-behind and tail-exposure approaches were not merely  operationally categorized but were indicated to be two qualitatively distinct behaviors in the  experiment by Akiti et al., it is reasonable to model them as two discrete states, but  otherwise, the assumption of two discrete states would need to be mentioned as a  simplification/limitation.

      We have now removed line 407, and now have an additional paragraph in the discussion discussing the limitations of the tail-behind and tail-exposure state representation: “Motivated by tail-behind versus tail-exposed in Akiti et al. (2022), we model approach using a dichotomy between cautious and confident approach states. This is likely a crude approximation to the continuous and multifaceted nature of animal approach behavior. For example, during approach animals likely adjust their levels of vigilance continuously (or discretely; Lloyd and Dayan (2018)) to monitor threat, and choose different velocities for movement, and different attentional strategies for inspecting the novel object. We hope future works will model these additional behavioral complexities, perhaps with additional internal states, and corroborate these states with neurobiological data.”

      (5) Line 418

      The authors contrasted their model-based analyses with the model-free analyses of Akiti et  al. Another aspect of differences between the authors' model and the model of Akiti et al. is  whether it is normative or mechanistic: while how the model of Akiti et al. can be biologically  implemented appears to be clear (TS dopamine represents threat TD error, and TS  dopamine-dependent cortico-striatal plasticity implements TD error-based update of  model-free threat prediction), biological implementation of the authors' model seems more  elusive. Given this, it might be a fruitful direction to explore how these two models can be  integrated in the future.

      We enthusiastically agree that it would be most interesting in the future to explore the integration of the two models - and, in the discussion (Lines 537-548, 454-461), point to some first steps that might be fruitful along these lines. There are two separate considerations here: one is that our account is mostly computational and algorithmic, whereas Akiti’s model is mostly algorithmic and implementational; the second is, as noted by the reviewer, that our account is model-based, whereas Akiti’s model is model-free (in the sense of reinforcement learning; RL). These are related - thanks in no small part to the work from the group including Akiti, we know a lot more about the implementation of model-free than model-based RL. However, our model-based account does reach additional features of behavior not captured in Akiti et al.’s model such as bout duration, frequency, and approach type. Thus, the temptation of unification.

      (6) Line 426

      Related to the previous point, it would be nice to more specifically describe what variable TS  dopamine can represent in the authors' model if possible.

      In the discussion  (Lines 454-461) , we speculate that  TS dopamine could still respond to the  physical salience of the novel object and affect choices by determining the potential cost of  the encountered threat or the prior on the hazard function. For example, perhaps ablating TS  dopamine reduces the hazard priors which leads to faster transition from cautious to  confident approach and longer bout durations, consistent with the optogenetics behavioral  data reported in Akiti et al.

      Reviewer #2 (Recommendations for the authors):

      My guess is simpler versions of the model would not fit the data well. But this does not mean  for example that the mice have probability distortions (CvaR) or that even probabilistic  reasoning and the internal models necessary to support them are acting in the behavioral  context studied by Akiti. So related to the above, I would ask what other models would fit and  would not fit the data? And what does this mean?

      These are good points. Our model provides an approximately normative account of the  animals’ behavior  in terms of what it achieves relative to a utility function. In practice, the  animals could deploy a precompiled model-free policy (which does not rely on probabilistic  computations) that is exactly equivalent to our model-based policy. With the current  experiment, we cannot conclude whether or not the animals are performing the prospective  calculations in an online manner. Of course, the extent to which animals or humans are  performing probabilistic computations online and have internal models are on-going  questions of study.

      Model comparison is difficult because we currently do not know of any other risk-sensitive exploration models. We cannot directly compare to the model in Akiti et al. since our model explains additional features of behavior: bout duration, frequency, and approach type. Indeed, our model is as simple as it can be, in the sense that, with the exception of nCVaR, removing any of the other parameters makes it difficult to fit some animals in our dataset. In the future, our model could be used to fit other datasets of risk-sensitive exploration and, ideally, be compared to other models.

      Explaining why animals avoid the novel object in what the authors call a benign environment is a very tricky issue. In Akiti et al, the readers are not yet convinced that the mice know that this environment is benign. Being placed in an arena with a novel object presents mice with a great uncertainty and we do not know whether they treat this as benign. Therefore, the alternative explanations in this study need to be carefully discussed in light of the limitations of the initial study.

      It is certainly true that it is unclear if the arena is completely benign to the animals. However, the amount of time the animal spends in the center of the arena decreases significantly from habituation to novelty days. This suggests that the animals avoid the novel object largely because of the object itself, rather than the potential danger associated with the arena. Furthermore, the animals are not reported as exhibiting more extreme behaviours such as freezing. In any case, our account is relative in the sense that we are comparing the time the animal spends at the object versus elsewhere in the environment, driven by the relative novelty and relative risk of the environment versus the object. Trying to get more absolute measures of these quantities would require a richer experimental set-up, for instance with different degrees of habituation or experience of the occurrence of (other) novel objects in general.

      We added a short note to the discussion to explain this:

      “Fourth, we modeled the relative amount of time the animal spends at the object versus  elsewhere in the environment which depends on the differential risk in the two states.  However, it is likely the animals avoid the novel object largely because of the object itself,  rather than the potential danger associated with the arena since they spend much less time  at the center of the arena during novelty than habituation days.”

      Figure 2 - how confident are the authors that each mouse differs from y=1? Related to this,  the behavior in Akiti is very noisy and changes across time. I am not sure if the authors fully  describe at what levels their model captures the behavior vs not in a detailed enough  fashion.

      We have performed a random permutation test on the minute-to-minute data. We have updated Figure 2 so that brave animals for which y>1 is significant under the Benjamini–Hochberg procedure at level q=0.05 are represented with solid green dots, and animals that do not pass are represented with hollow dots. 8 out of 11 brave animals passed the Benjamini–Hochberg procedure.
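
      The two steps named above (a per-animal permutation test followed by Benjamini–Hochberg correction across animals) can be sketched as follows. This is a minimal illustration and not the released analysis code: the function names, the one-sided difference-in-means statistic, and the number of permutations are all assumptions made for the example.

```python
import numpy as np

def permutation_pvalue(early, late, n_perm=2000, seed=0):
    """One-sided permutation p-value for mean(early) > mean(late).

    Hypothetical statistic: difference in mean approach time between two
    sets of minutes. Labels are shuffled to build the null distribution.
    """
    rng = np.random.default_rng(seed)
    early = np.asarray(early, dtype=float)
    late = np.asarray(late, dtype=float)
    observed = early.mean() - late.mean()
    pooled = np.concatenate([early, late])
    n_early = len(early)
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = shuffled[:n_early].mean() - shuffled[n_early:].mean()
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (count + 1) / (n_perm + 1)

def benjamini_hochberg(pvalues, q=0.05):
    """Boolean mask of hypotheses rejected by the BH step-up procedure."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Reject every hypothesis up to the largest rank meeting its threshold.
        k = np.nonzero(below)[0].max()
        reject[order[: k + 1]] = True
    return reject
```

      In this sketch each animal contributes one p-value, and the returned mask would decide whether its dot is drawn solid or hollow.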

      Reviewer #3 (Recommendations for the authors):

      (1) I could not find information in the preprint about code availability. Please consider making  the code public to help others apply these modelling methods.

      We have released code and included the url in the paper in the Methods section.

      (2) Though the manuscript was generally clearly written, there were a number of places  where some additional information or clarification would be useful:

      a) Please define and explain the terms 'tail-behind' and 'tail-exposed' (used to describe  approach bout types) when first used.

      We have added definitions when we first mention these terms:

      “[...] 'tail-behind' (bouts where the animal's nose was closer to the object than the tail for the  entire bout) and 'tail-exposed' (bouts where the animal's tail is closer to the object than the  nose at some point during the bout), associated respectively with cautious risk-assessment  and engagement”

      b) At lines 57-58 when contrasting the 'model-free' account of Akiti et al with the 'model-based' account of the current work, it would be worth clarifying that these terms are  being used in the RL sense rather than e.g. a model-based analysis of the data.  

      We have updated the relevant lines to say “model-free/based reinforcement learning”.

      c) Line 61, the phrase 'the significant long-run approach of timid animals despite having  reached the "avoid" state' is unclear as the 'avoid' state has not been defined.

      We updated the terminology to “avoidance behavior” to be consistent with Akiti et al.  Avoidance refers to the animal routinely avoiding the object and therefore being unable to  learn whether it is safe.

      d) It was not completely clear to me how the coarse-graining of the behaviour was  implemented. Specifically, how were animals assigned to the brave, intermediate, or timid  group, and how were the parameters of the resulting behavioural phases fit?

      Sorry that this was not clear. Section 2.1 explains how the minute-to-minute behavioral data  was coarse-grained and how animal groups were assigned. We have added further  explanation of Figure 2 to the main text:

      “Fig 2 summarizes our categorization of the animals into the three groups: brave,  intermediate, and timid based on the phases identified in the animal's exploratory  trajectories. Timid animals spend no time in confident approach and are plotted in orange at  the origin of Fig 2. Brave animals differ from intermediate animals in that their approach time  during the first ten minutes of the confident phase is greater than the last ten minutes ( steady-state phase). Brave animals are plotted in green above and intermediate animals  are plotted in black below the y=1 line in Fig 2.”

      We also added extra information to outline the goal, and methodology of coarse-graining and  animal grouping:

      “We sought to capture these qualitative differences (cautious versus confident) as well as aspects of the quantitative changes in bout durations and frequencies as the animal learns about their environment. To make this readily possible, we abstracted the data in two ways: averaging bout statistics over time, and clustering the animals into three groups with operationally distinct behaviors.”

      e) What purpose does the 'retreat' state serve in the BAMDP model (as opposed to  transitioning directly from 'object' to 'nest' states), and why do subjects not pass through it  following 'detect' states?

      Thank you for pointing this out. We have updated Figure 3 to note that the two “detected  states” also point to the “retreat” state. The reviewer is correct that there could be alternative  versions of the state diagram, and the ‘retreat’ state could indeed have been eliminated.  However, we thought that it was helpful to structure the animal’s progress through state  space.

      f) Why was the hazard function parameterised via the mean and SD at each time step rather  than with a parametric form of the mean and SD as a function of time?

      Since the agent can only spend 2, 3, or 4 turns at the object states, we didn’t see a need to  parameterize the mean and SD as a function of time. Doing so is a good solution to scaling  up the hazard function to more time-steps.

      (3) There were also a couple of points that could potentially be usefully touched on in the  discussion:

      a) What, if any, is the relationship between the CVaR objective and distributional RL? They  seem potentially related due to both focussing on quantiles of the outcome distribution.

      We have added a paragraph to the discussion discussing the connection between  distributional RL and CVaR:

      “CVaR is known to come in different flavors in the case of temporally-extended behavior. Gagne and Dayan (2021) introduce two alternative time-consistent formulations of CVaR: nested CVaR (nCVaR) and precommitted CVaR (pCVaR). nCVaR and pCVaR both enjoy Bellman equations which make it possible to compute approximately optimal policies without directly computing whole distributions of the outcomes. We use nCVaR in this study for its computational efficiency. There is, of course, great current interest in distributional reinforcement learning (Bellemare et al., 2023b) which does acquire such whole distributions, not the least because of prominent observations linking non-linearities in the response functions of dopamine neurons to methods for learning distributions of outcomes (Dabney et al., 2020; Masset et al., 2023; Sousa et al., 2023). One functional motivation for considering entire outcome distributions is the possibility of using them to determine risk-sensitive policies (Gagne and Dayan, 2021).

      While it is possible to compute CVaR directly from return distributions, Gagne and Dayan  (2021) showed that this can lead to temporally inconsistent policies where the agent  deviates from its original plans (the authors called this the fixed CVaR or fCVaR measure).

      Rather further removed from our model-based methods is work from Antonov and Dayan  (2023), who consider a model-free exploration strategy which exploits full return distributions  to compute the value of perfect information which is used as a heuristic for trying actions  with uncertain consequences. Future works can examine risk-sensitive versions of Antonov  and Dayan (2023)'s computationally efficient model-free algorithm as one solution to the  burdensome computations in our model-based method.”
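
      As a point of reference for the quoted passage, the static CVaR of a sampled return distribution can be computed as below. This is only a sketch under stated assumptions: it uses the common lower-tail convention (the mean of the worst α-fraction of returns, so α=1 is risk-neutral), the function name `cvar` is hypothetical, and it is not the nested (nCVaR) computation used in the model, which relies on a Bellman recursion.

```python
import numpy as np

def cvar(returns, alpha):
    """Mean of the worst alpha-fraction of sampled returns (lower tail).

    Assumed convention: alpha = 1 recovers the ordinary mean (risk-neutral);
    smaller alpha focuses on increasingly bad outcomes.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    r = np.sort(np.asarray(returns, dtype=float))
    # Rounding guards against floating-point overshoot such as 0.3 * 10.
    k = max(1, int(np.ceil(round(alpha * len(r), 9))))
    return r[:k].mean()
```

      Applying this quantity afresh at every time step is what the passage calls fCVaR, which can yield temporally inconsistent plans.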

      b) Why normatively might subjects have non-neutral risk preference as captured by the  CvaR?

      We also added a paragraph to the discussion discussing the advantage of heterogeneity in  risk sensitivity within a population:

      (Reviewer #1 had the same question, see above) “Our data show that there is substantial  variation in the degrees of risk sensitivity across the mice.  Previous works have reported  substantial interpopulation and intrapopulation differences in risk-sensitivity in humans which  depend on gender, age, socioeconomic status, personality characteristics, wealth and culture [...]”

      c) Relevance of the current modelling work to clinical conditions characterised by dysregulation of risk assessment (e.g. anxiety or PTSD).

      We’ve added a paragraph to the discussion:

      “Inter-individual differences in risk sensitivity are also of critical importance in psychiatry,  reflected in a panoply of anxiety disorders (Butler and Mathews, 1983; Giorgetta et al., 2012;  Maner et al., 2007; Charpentier et al., 2017), along with worry and rumination (Gagne and  Dayan, 2022). Understanding the spectrum of   extreme priors and extreme values of 𝛼  could have therapeutic implications, adding significance to the search for tasks that can  more cleanly separate them.”

      d) Is it surprising to see differences in risk preference (nCVaR) between the familiar object  and novel object condition, given that risk preference might be conceptualised as a trait  rather than a state variable?

      Thank you for raising this point. You are right that we expected risk sensitivity (nCVaR alpha)  to be the same between FONC and UONC animals on average. It is difficult to know if alpha  is higher for FONC than UONC animals due to the non-identifiability between alpha and  hazard priors. We have added this discussion to the paper:

      “This is surprising if we interpret 𝛼 as a trait that is stable through time. Unfortunately, due to  the non-identifiability between 𝛼 and hazard priors, we cannot verify whether 𝛼 is actually  higher for FONC animals than UONC animals.”

    1. Author response:

      The following is the authors’ response to the current reviews.

      Response to Reviewer #3:

      We thank reviewer 3 for spending their valuable time on commenting on our revised paper.

      We would like to reiterate the central conclusion of this work, which appears to have been missed by Reviewer 3. Using a BFP-expressing lineage tracer hPSC line for tracking LMX1A+ midbrain-patterned neural progenitors and their differentiated progeny, we discovered a loss of the LMX1A lineage during pluripotent stem cell differentiation into astrocytes, even though BFP+ neural progenitors were the dominant population at the onset of astrocyte induction.

      Hence, the take-home message of this study is, as summarized in the abstract, ‘ the lineage composition of iPSC-derived astrocytes may not accurately recapitulate the founder progenitor population’ and that one should not take for granted that in vitro/stem cell-derived astrocytes are the descendants of the dominant starting neural progenitors (which is a general assumption in PSC publications as described in the paper and our response to reviewers).

      Please find below our point-by-point response to reviewer comments. We have re-ordered the points according to their relative importance to our main conclusions.

      …. They used lineage tracing with a LMX1A-Cre/AAVS1-BFP iPSCs line, where the initial expression of LMX1A and Cre allows the long-lasting expression of BFP, yielding BFP+ and BFP- populations, that were sorted when in the astrocytic progenitor expansion. BFP+ showed significantly higher number of cells positive to NFIA and SOX9 than BFP- cells …

      This is a misunderstanding by reviewer 3. As indicated in the first sentence of the second section, the BFP- populations used for functional and transcriptomic analysis were not sorted BFP- cells, but were derived from unsorted, BFP+-enriched populations. Our scRNAseq analysis indicated that they were transcriptomically aligned to human midbrain astrocytes. This finding is consistent with the fact that they are derived from midbrain-patterned neural progenitors, presumably minority LMX1A- progenitors.

      Reviewer 3’s comments indicate that they misunderstood the primary aims of our study as a mere functional and transcriptomic comparison of the two astrocyte populations.

      (9) BFP+ cells did not show higher levels of transcripts for LMX1A nor FOXA2. This fact jeopardizes the claim that these cells are still patterned. In the same line, there are not significant differences with cortical astrocytes, indicating a wider repertoire of the initially patterned cells, that seems to lose the midbrain phenotype. Furthermore, common DGE shared by BFP- and BFP+ cells when compared to non-patterned cells indicate that after culture, the pre-pattern in BFP+ cells is somehow lost, and coincides with the progression of BFP- cells.

      The reviewer seems to assume that astrocytes derived from LMX1A+ ventral midbrain progenitors must retain LMX1A expression. We do not take this view and do not claim this in this study. Moreover, we have discussed in the paper that, due to a lack of transcriptomic studies of regional progenitors tracked in vivo (such as LMX1A), it remains unknown whether and to what extent patterning gene expression is maintained in astrocytes of different brain regions.

      Our findings on the lack of LMX1A and FOXA2 in BFP+ astrocytes are supported by several published single-cell transcriptomic studies of human midbrain astrocytes (La Manno et al. 2016; Agarwal et al. 2020; Kamath et al. 2022). We have a paragraph of discussion on this topic in both the original and updated versions of the paper with the relevant publications cited.

      Other points raised by reviewer 3

      (1) It is very intriguing that GFAP is not expressed in late BFP- nor in BFP+ cultures, when authors designated them as mature astrocytes.

      We did not designate our cells as ‘mature’ astrocytes but ‘astrocytes’ based on their global gene expression with the human fetal and adult brain astrocytes as references.

      Moreover, ‘mature’ only appeared once in the paper indicating that our cells lie in between the fetal and adult astrocytes in maturity.

      (2) In Fig. 2D, authors need to change the designation "% of positive nuclei".

      To be corrected in the version of record.

      (3) In Fig. 2E, the text describes a decrease caused by 2APB on the rise elicited by ATP, but the graph shows an increase with ATP+2APB. However, in Fig. 2F, the peak amplitude for BFP+ cells is higher in ATP than in ATP+2APB, which is mentioned in the text, but this is inconsistent with the graph in 2E.

      To be corrected in the version of record.

      (4) The description of Results in the single-cell section is confusing, particularly in the sorted CD49 and unsorted cultures. Where do these cells come from? Are they BFP-, BFP+, unsorted for BFP, or non-patterned? Which are the "all three astrocyte populations"? A more complete description of the "iPSC-derived neurons" is required in this section to allow the reader to understand the type and maturation stage of neurons, and if they are patterned or not.

      As previously reported in the reference cited, CD49f is a novel human astrocyte marker. This is independent of BFP expression. For all three astrocyte populations studied here (BFP+, BFP-, and non-patterned astrocytes), we included both CD49f+ sorted and unsorted samples to account for selection bias caused by FACS. iPSC-derived neurons were included in the sequencing study to provide a reference for cell-type annotation. They were generated following a GABAergic neuron differentiation protocol. However, their maturation stages and/or regional characteristics are not relevant to astrocytes.

      (5) A puzzling fact is that both BFP+ and BFP- cells have similar levels of LMX1A, as shown in Fig. S6F. How do authors explain this observation?

      This figure panel shows that LMX1A, LMX1B and FOXA2 are essentially NOT expressed in these astrocytes.

      (6) In Fig. 3B, the non-patterned cells cluster away from the BFP+ and BFP-; on the other hand, early and late BFP- are close and the same is true for early and late BFP+. A possible interpretation of these results is that patterned astrocytes have different paths for differentiation, compared to non-patterned cells. If that can be implied from these data, authors should discuss the alternative ways for astrocytes to differentiate.

      Both BFP+ and BFP- astrocytes are derived from ventral midbrain-patterned neural progenitors, while non-patterned neural progenitors are more akin to those of the forebrain. The clustering in Figure 3B is therefore expected and confirms the patterning effect.

      (7) Fig. 3D shows that cluster 9 is the only one with detectable and coincident expression of both S100B and GFAP expression. Please discuss why these widely-accepted astrocyte transcripts are not found in the other astrocytes clusters. Also, Sox9 is expressed in neurons, astrocyte precursors and astrocytes. Why is that?

      S100B and GFAP are classic astrocyte markers in certain states. We rely not only on these two markers but on the genome-wide expression profile as the criterion for astrocyte identity. As shown in the unbiased reference mapping to multiple human brain astrocyte scRNA-seq datasets, all our astrocyte clusters were mapped with high confidence to human astrocytes.

      SOX9 is an important regulator of astrogenesis, so its expression is expected in precursors (doi.org/10.1016/j.neuron.2012.01.024). In addition, recent studies have reported SOX9 expression in foetal striatal projection neurons and early postnatal cortical neurons, where SOX9 regulates neuronal synaptogenesis and morphogenesis (dois:10.1016/j.fmre.2024.02.019; 10.1016/j.neuron.2018.10.008). Therefore, the expression of SOX9 in multiple cell types was expected. Instead of using a few selected markers for cell-type annotation, we employed an unbiased reference mapping approach together with a combination of various markers to ascertain our annotation results.

      (8) Line 337: Why did the authors select a log2 change of 0.25? Typically, 1 or a higher number is used to ensure at least a 2-fold increase, or a 50% decrease. A volcano plot generated by the comparison of BFP+ with BFP- cells would be appropriate. The validation of differences by immunocytochemistry, between BFP+ and BFP-, is inconclusive. The staining is blurry in the images presented in Fig. S8C. Quantification of the positive cells, without significant background signal, in both populations is required.

      We used a lenient threshold owing to the following considerations: 1) High FC does not necessarily mean biological relevance, as gene expression does not necessarily translate to protein expression. Therefore, a smaller FC value could also be biologically meaningful. 2) Balance between noise and biological differences. Any threshold was chosen arbitrarily. 3) We are identifying a trend rather than pinpointing a specific set of

      The quality was unfortunately reduced due to restrictions on file size upon submission. A high resolution Fig. S8C is available.
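
      For readers weighing the threshold discussed in point (8), the conversion between a log2 fold change and a linear fold change is simple arithmetic and can be checked as below. The helper name `log2fc_to_fold` is hypothetical and is used only for illustration.

```python
def log2fc_to_fold(log2fc):
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2fc

# Thresholds mentioned in point (8): 0.25 (used here) versus 1 (suggested).
for threshold in (0.25, 1.0, -1.0):
    fold = log2fc_to_fold(threshold)
    percent_change = (fold - 1.0) * 100.0
    print(f"log2FC={threshold:+.2f} -> fold={fold:.3f} ({percent_change:+.1f}%)")
```

      A log2 change of 1 corresponds to a doubling and -1 to a halving, whereas 0.25 corresponds to a change of roughly 19%.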

      (10) For the GO analyses, How did authors select 1153 genes? The previous section mentioned 287 genes unique for BFP+ cells. The Results section should include a rationale for performing a wider search for the enriched processes.

      GO enrichment using unique DEGs may not capture the wider landscape of the transcriptomic characteristics of BFP+ astrocytes. The 287 unique genes were only differentially expressed in BFP+ astrocytes. However, apart from these 287 genes, other genes among the 1187 DEGs were differentially expressed in BFP+ astrocytes and in one other population.

      (11) For Fig. 4C and 4D, both p values and the number of genes should be indicated in the graph. I would advise to select the 10 or 15 most significant categories, these panels are very difficult to read. Whereas the listed processes for BFP+ have a relation to Parkinson disease, the ones detected for BFP- cells are related to extracellular matrix and tissue development. Does it mean that BFP+ cells have impaired formation of this matrix, or defective tissue development? This is in contradiction of enhanced calcium responses of BFP+ cells compared to BFP- cells.

      Information on all DEGs, including p values and numbers, is provided in Supplementary data 1-5.

      BFP+ astrocytes do have enrichment for GO terms related to extracellular matrix and tissue development, although not as obvious as BFP- astrocytes. Previous work has shown that both in vitro and in vivo derived astrocytes are functionally heterogeneous, containing functionally distinct subtypes exhibiting different GO enrichment profiles (doi: 10.1016/j.ygeno.2021.01.008; 10.1038/s41598-024-74732-7).

      (12) Both the comparison between midbrain and cortical astrocytes in Fig. S8A, and the volcano plot in S8B do not show consistent changes. For example, RCAN2 in Fig. S8A has the same intensity for cortical and midbrain cells, but is marked as an enriched gene in midbrain in the p vs log2FC graph in Fig. S8B.

      These are integrated analyses of published human datasets. S8A and S8B show the same data in different formats. The differences are more easily detected by eye in the volcano plot. RCAN2 had a higher average expression in the midbrain than in the telencephalon; the difference was small but statistically significant (as shown in the volcano plot).


      The following is the authors’ response to the original reviews

      Reviewer 1:

      In vitro nature of this work being the fundamental weakness of this paper

      We disagree with this statement. As explained in the provisional response, the aim of this study was to test the validity of a general concept applied in pluripotent stem cell research that pluripotent stem cell-derived astrocytes faithfully represent the lineage heterogeneity of their ancestral neural progenitors and hence preserve the regionality of such progenitors. Our genetic lineage study is justified for addressing this in vitro-driven question. However, we have highlighted the rationale where appropriate in the revised paper.

      If regional identity is not maintained, so what? Don't we already know that this can happen? The authors acknowledge that this is known in the discussion.

      Importance of regional identity: Growing evidence demonstrates the functional heterogeneity of brain astrocytes in health and disease. Therefore, for in vitro disease modeling, it is believed that one should use astrocytes that represent the anatomy of disease pathology; for example, midbrain astrocytes for studying dopamine neurodegeneration and Parkinson’s disease. Understanding the dynamics of stem cell-derived astrocytes and identifying astrocyte subtypes is important for their biomedical applications.

      Regional identity change/Discussion: It seems that the reviewer misunderstood the context in which the ‘identity change’ was discussed. The literature referred to (in the Discussion) concerns shifts in regional gene expression in bulk-cultured cells. Before single-cell analysis and lineage tracking were available, one could not distinguish whether such a shift was due to a change in the transcriptomic landscape in progenies of the same lineage or to alterations in lineage heterogeneity, and so it was interpreted at face value as a failure to maintain regional identity. In the revised paper, we have made an effort to indicate that ‘regional identity’ is used broadly to refer to lineage relationships and/or traits rather than static gene expression.

      validation of the markers/additional work

      The scRNAseq analysis performed in this study compared the profiles of astrocytes derived from LMX1A+ and LMX1A- ventral midbrain-patterned neural progenitors. Since it is not possible to perform genetic lineage tracking in humans and an analogous mouse lineage tracer line is not available, in vivo validation of these markers with respect to their lineage relationship is not currently feasible. However, we took advantage of abundant single-cell human astrocyte transcriptomic datasets and validated our genes in silico. We also validated the differential expression of selected markers in late BFP+ and BFP- astrocytes using immunocytochemistry, where reliable antibodies are available. The results of the additional analyses are presented in Figure S8 and Supplemental Data 5.

      Knowledge gaps concerning astrocyte development

      Reviewer 1 pointed out a number of knowledge gaps concerning astrocyte development, such as the transcriptomic landscape trajectories of midbrain floor plate cells as they progress towards astrocytes. Indeed, the limited knowledge of regional astrocyte molecular heterogeneity restricts the objective validation of in vitro-derived astrocyte subtypes and the development of novel approaches for their generation in vitro. We agree with the need for in-depth in vivo studies using model organisms, although these are beyond the scope of the current work.

      Reviewer 2:

      (1) The authors argue that the depletion of BFP seen in the unsorted population immediately after the onset of astrogenic induction is due to the growth advantage of the derivatives of the residual LMX1A- population. However, no objective data supporting this idea is provided, and one could also hypothesize that the residual LMX1A- cells could affect the overall LMX1A expression in the culture through negative paracrine regulation.

      We acknowledge the lack of evidence-based explanation for the depletion of BFP+ cells in mixed cultures. We were unable to perform additional experiments because of resource limitations. The design of the LMX1A-Cre/AAVS1-BFP lineage tracer line determines that BFP is expressed irreversibly in LMX1A-expressing cells or their derivatives regardless of their LMX1A expression status. Therefore, the potential negative paracrine regulation of LMX1A by residual LMX1A- cells should not affect cells that have already turned on BFP. We have highlighted the working principles of the LMX1A tracer line in the revised manuscript.

      (2) Furthermore, on line 124 it is stated that: "Interestingly, the sorted BFP+ cells exhibited similar population growth rate to that of unsorted cultures...". In the face of the suggested growth disadvantage of those cells, this statement needs clarification.

      To avoid confusion, we have removed the statement.

      (3) Regarding the fidelity of the model system, it is not clear to me how the TagBFP expression was detected in the BFP+ population supposedly in d87 and d136 pooled astrocytes (Fig S6C) while no LMX1A expression was observed in the same cells (Fig S6F).

      The TagBFP tracer is expressed in the progenies of LMX1A+ cells, regardless of their LMX1A expression status. We have gone through the MS text to ensure that this information has been provided.

      (4) The generated single-cell RNASeq dataset is extremely valuable. However, given the number of conditions included in this study (i.e. early vs late astrocytes, BFP+ vs BFP-, sorted vs unsorted, plus non-patterned and neuronal samples) the resulting analysis lacks detail. For instance, from a developmental perspective and to better grasp the functional significance of astrocytic heterogeneity, it would be interesting to map the identified clusters to early vs late populations and to the BFP status.

      We performed additional bioinformatics analysis, which provided independent support for the relative developmental maturity suggested by functional assays. The additional data are now provided in the revised Figure 3B, C, E.

      Moreover, although comprehensive, Figure S7 is complex to understand given that citations rather than the reference populations are depicted.

      The information is now provided in the revised Figure S7.

      (5) Do the authors have any consideration regarding the morphology of the astrocytes obtained in this study? None of the late astrocyte images depict a prototypical stellate morphology, which is reported in many other studies involving the generation of iPSC-derived astrocytes and which is associated with the maturity status of the cell.

      The morphology of our astrocytes was not unique to the present study. Many factors may influence the morphology of astrocytes, such as the culture media and supplements used, and maturity status. Based on the functional assays and limited GFAP expression, our astrocytes were relatively immature.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The study is methodologically solid and introduces a compelling regulatory model. However, several mechanistic aspects and interpretations require clarification or additional experimental support to strengthen the conclusions.

      Strengths:

      (1) The manuscript presents a compelling structural and biochemical analysis of human glutamine synthetase, offering novel insights into product-induced filamentation.

      (2) The combination of cryo-EM, mutational analysis, and molecular dynamics provides a multifaceted view of filament assembly and enzyme regulation.

      (3) The contrast between human and E. coli GS filamentation mechanisms highlights a potentially unique mode of metabolic feedback in higher organisms.

      Weaknesses:

      (1) The mechanism underlying spontaneous di-decamer formation in the absence of glutamine is insufficiently explored and lacks quantitative biophysical validation.

      (2) Claims of decamer-only behavior in mutants rely solely on negative-stain EM and are not supported by orthogonal solution-based methods.

      We thank the reviewer for the summary and for noting the strengths. We agree that the evolutionary divergence of metabolic feedback in GS homologs is a fruitful avenue for future studies. With regard to the weaknesses, the di-decamer forms in the absence of glutamine only at high (higher than physiological) enzyme concentrations. Our primary evidence for the mutant behavior was the lack of crosslinking (Figure 1E), with supplementary support from the negative stain. In the revised version we will soften the language to say “reduced” rather than “did not support” filament formation.

      Reviewer #2 (Public review):

      The authors set out to resolve the high-resolution structure of a glutamine synthetase (GS) decamer using cryo-EM, investigate glutamine binding at the decamer interface, and validate structural observations through biochemical assays of ATP hydrolysis linked to enzyme activity. Their work sits at the intersection of structural and functional biology, aiming to bridge atomic-level details with biological mechanisms - a goal with clear relevance to researchers studying enzyme catalysis and metabolic regulation.

      Strengths and weaknesses of methods and results:

      A key strength of the study lies in its use of cryo-EM, a technique well-suited for resolving large, dynamic macromolecular complexes like the GS decamer. The reported resolutions (down to 2.15 Å) initially suggest the potential for detailed structural insights, such as side-chain interactions and ligand density. However, several methodological limitations significantly undermine the reliability of the results:

      (1) Cryo-EM data processing: The absence of critical details about B-factor sharpening - a standard step to enhance map interpretability - is a major concern. For high-resolution maps (<3 Å), sharpening is typically applied to resolve side-chain features, yet the submitted maps (e.g., those in Figures 1D, 2D, and supplementary figures) appear unprocessed, with density quality inconsistent with the claimed resolutions. This makes it difficult to evaluate whether observed features (e.g., glutamine binding) are genuine or artifacts of unsharpened data.

      (2) Modeling and density consistency: The structural models, particularly for glutamine binding at the decamer interface, do not align with the reported resolution. The maps shown in Figure 2D and Supplementary Figure S7 lack sufficient density to confidently place glutamine or even surrounding residues, conflicting with claims of 2.15 Å resolution. Additionally, fitting a non-symmetric ligand (glutamine) into a symmetry-refined map requires justification, as symmetry constraints may distort ligand placement.

      (3) Biochemical assay controls: While the enzyme activity assays aim to link structure to function, they lack essential controls (e.g., blank reactions without GS or substrates, substrate omission tests) to confirm that ATP hydrolysis is GS-dependent. The use of TCEP, a reducing agent, is also not paired with experiments to rule out unintended effects on the PK/LDH system, further limiting confidence in activity measurements.

      Achievement of aims and support for conclusions:

      The study falls short of convincingly achieving its goals. The claimed high-resolution structural details (e.g., side-chain densities, ligand binding) are not supported by the provided maps, which lack sharpening and show inconsistencies in density quality. Similarly, the biochemical data do not robustly validate the structural claims due to missing controls. As a result, the evidence is insufficient to confirm glutamine binding at the decamer interface or the functional relevance of the observed structural features.

      Likely impact and utility:

      If these methodological gaps are addressed, the work could make a meaningful contribution to the field. A well-resolved GS decamer structure would advance understanding of enzyme assembly and ligand recognition, while validated biochemical assays would strengthen the link between structure and function. Improved data processing and clearer reporting of validation steps would also make the structural data more reliable for the community, providing a resource for future studies on GS or related enzymes.

      We disagree with the reviewer’s overall assessment.

      With regard to sharpening and resolution: we examined sharpened maps and, in a revised version, will present additional supplementary figures showing these maps side by side. We note that the resolutions reported are global and that the most interesting features are, of course, in the periphery and subject to conformational and compositional heterogeneity. In the revision, we will include supplementary figures of core side-chain densities that are closer to what the reviewer expects.

      With regard to modeling: the apo filament and turnover filament datasets were handled nearly identically. The additional density is therefore unlikely to be an artifact of the symmetry operator; however, the lower resolution in this region noted by the reviewer is worthy of further exploration. The maps are public, and we think this is the most plausible interpretation of the density, which we based primarily on the biochemical data; we will include more speculation in the revised version.

      With regard to the biochemical controls: we point the reviewer to Figure S1, which shows that omission of ammonia or glutamate in the wild-type (tagless) system removes any coupling of the reactions. We will perform the additional controls to publication quality in the revised version, along with the TCEP control. We note that the reducing agent is present across all experiments, ruling out an effect on any specific result. The inclusion of TCEP is also very standard in other published uses of the coupled ATPase assay (e.g., PMID: 31778111 and PMID: 32483380 by our first author).

      Additional context:

      Cryo-EM has transformed structural biology by enabling high-resolution analysis of large complexes, but its success hinges on rigorous data processing and validation steps that are critical to ensuring reproducibility. The challenges highlighted here are not unique to this study; they reflect broader issues in the field where incomplete reporting of methods can obscure the reliability of results. By addressing these points, the authors would not only strengthen their current work but also set a positive example for transparent and rigorous structural biology research.

      All the data are public, and the reviewer or anyone else is free to reinterpret the maps and models; we encourage that rather than relying only on an interpretation of our static figures. In addition, we will upload the raw micrograph data for the apo filament and turnover filament datasets to EMPIAR prior to submitting the revision.

      Reviewer #3 (Public review):

      In this manuscript, the authors propose a product-dependent negative-feedback mechanism of human glutamine synthetase, whereby the product glutamine facilitates filament formation, leading to reduced catalytic specificity for ammonia. Using time-resolved cryo-EM, the authors demonstrate filament formation under product-rich conditions. Multiple high-quality structures, including decameric and di-decameric assemblies, were resolved under different biochemical states and combined with MD simulations, revealing that the conformational space of the active site loop is critical for the GS catalysis. The study also includes extensive steady-state kinetic assays, supporting the view that glutamine regulates GS assembly and its catalytic activity. Overall, this is a detailed and comprehensive study. However, I would advise that a few points be addressed and clarified.

      (1) In Figure 2D and Supplementary Figure 7, the extra density observed between the two decamers does not appear to have the defining features of a glutamine. A less defined density may be expected given the nature of the complex, but even though mutagenesis assays were performed to support this assignment, none of these results constitutes direct and conclusive evidence for glutamine binding at this site. I would thus suggest showing the density maps at multiple contour thresholds to allow readers to also better evaluate the various small molecules under turnover conditions that cannot be well fitted based on this density map, helping to provide a more balanced interpretation of the results.

      (2) On the same point regarding the density for the enzyme under turnover conditions, more details should be provided about the symmetry expansion and classification performed, and also show the approximate ratio of reconstructions that include this density. Did you try symmetry expansion followed by focused classification, especially on the interface region?

      (3) The interface between the two decamers of the model needs to be double-checked and reassigned, especially for the residues surrounding the fitted glutamine. For example, the side chain of the Lys residue shown in the attached figure is most likely modeled incorrectly.

      We thank the reviewer for the feedback. As noted above, we will include supplemental figures that show maps at multiple thresholds and sharpening schemes. We noted in the manuscript and above that our interpretation here is based on integrating biochemical evidence alongside the density and will make that even more clear in the revised manuscript. The filaments +/- the putative glutamine density were processed nearly identically, but we will attempt various schemes of focused classification/symmetry expansion in the revision as well. However, we point out that there is extensive averaging there that makes modeling a bit trickier than expected given the global resolution.

    1. Author response:

      Reviewer #1 (Public review):

      The authors tried to quantify the difference between human complex traits by calculating genetic overlap scores between a pair of traits. Sherlock-II was devised to integrate GWAS with eQTL signals. The authors claim that Sherlock-II is superior to the previous version (robustness, accuracy, etc). It appears that their framework provides a reasonable solution to this important question, although the study needs further clarification and improvements.

      (1) Sherlock-II incorporates GWAS and eQTL signals to better quantify genetic signals for a given complex trait. However, this approach is based on the hypothesis that "all GWAS signals confer association to complex trait via eQTL", which is not true (PMID: 37857933). This should be acknowledged (through mentioning in the text) and incorporated into the current setup (through differential analysis - for example, with or without eQTL signals, or with strong colocalization only). 

      The reviewer is correct that in this version of the tool we focused on SNPs with an effect on gene expression, as the majority of the SNPs identified by GWASs are non-coding. In a future improvement, we should also include coding SNPs that change the amino acid sequence of genes. We will discuss this point further in the revised manuscript.

      (2) When incorporating eQTL, why did the authors use the top p-value tissues for eQTL? This approach seems simpler and probably more robust. But many eQTLs are tissue-specific. Therefore, it would also be important to know if eQTLs from appropriate tissues were incorporated instead. 

      This is a simple scheme to incorporate eQTL data from multiple tissues, assuming that the tissue that gives the strongest association is the most relevant, or is the main mediator of the effect of the SNP on the phenotype. This is a reasonable approach given that the tissues of origin for most of the phenotypes are unknown. In a future improvement, we should incorporate eQTL data from the appropriate tissue(s) where these are known.
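
      As a minimal sketch of the tissue-selection scheme described above, and not the Sherlock-II implementation itself, the following Python keeps, for each SNP-gene pair, the tissue with the smallest eQTL p-value. The record layout, the function name `strongest_tissue`, and the example values are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Tuple

# (snp, gene, tissue, p_value); this record layout is an assumption for illustration.
EqtlRecord = Tuple[str, str, str, float]


def strongest_tissue(
    records: Iterable[EqtlRecord],
) -> Dict[Tuple[str, str], Tuple[str, float]]:
    """For each (snp, gene) pair, keep the tissue with the smallest eQTL p-value."""
    best: Dict[Tuple[str, str], Tuple[str, float]] = {}
    for snp, gene, tissue, p in records:
        key = (snp, gene)
        # Replace the stored entry only when this tissue shows a stronger association.
        if key not in best or p < best[key][1]:
            best[key] = (tissue, p)
    return best


# Made-up example values, not real eQTL results.
example = [
    ("rs1", "GENE_A", "liver", 1e-4),
    ("rs1", "GENE_A", "brain", 1e-9),
    ("rs2", "GENE_B", "blood", 3e-6),
]
selected = strongest_tissue(example)
```

      If the relevant tissue for a trait were known, the same function could be applied after filtering the records to that tissue, which corresponds to the future improvement mentioned above.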

      (3) One of the main examples is the novel association between Alzheimer's disease and breast cancer. Although the authors provided a molecular clue underlying the association, it is still hard to comprehend the association easily, as the two diseases are generally known to be exclusive to each other. This is probably because breast cancer GWAS is performed for germline variants and does not consider the contribution of somatic variants. 

      This is due to one of the limitations of the current algorithm: no direction of association is predicted explicitly. It could be that increasing the expression of a gene reduces the risk of one disease but increases the risk of another. Currently we have to analyze the details of the SNPs to infer direction once overlapping genes are found. This needs improvement in the future.

      (4) It would help readers understand the story better if a summary figure of the entire process were provided. The current Figure 1 does not fulfil that role. 

      We plan to incorporate the reviewer's suggestion in the revised manuscript.

      (5) Figure 2 is not very informative. The readers would want to know more quantitative information rather than a heatmap-style display. Is there directionality to the relationship, or is it always unidirectional? 

      We will consider a different presentation in the revised manuscript.

      (6) In Figure 3, readers may want to know more specific information. For example, what gene signals are really driving the hypoxia signal in Alzheimer's disease vs breast cancer? And what SNP signals are driving these gene-level signals? 

      We will add this information in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors introduce a gene-level framework to detect shared genetic architecture between complex traits by integrating GWAS summary statistics with eQTL data via a new algorithm, Sherlock-II, which aggregates signals from multiple (cis/trans) eSNPs to produce gene-phenotype p-values. Shared pathways are identified with Partial-Pearson-Correlation Analysis (PPCA).

      Strengths:

      The authors show the gene-based approach is complementary and often more sensitive than SNP-level methods, and discuss limitations (in terms of no directionality, dependence on eQTL coverage).

      Weaknesses:

      (1) How do the authors explain data where missing tissues or sparse eQTL mapping are available? Would that bias as to which genes/traits can be linked and may produce false negatives or tissue-specific false positives?

      Missing tissues or sparse eQTL data can certainly produce false negatives, as the signals linking the two phenotypes are simply not captured in the data. They are less likely to produce false positives as long as the statistical test is well controlled.

      (2) Aggregating SNP-level signals into gene scores can be confounded by LD; for example, a nearby causal variant for a different gene or non-expression mechanism may drive a gene's score, producing spurious gene-trait links. How do the authors prevent this? 

      When there are multiple SNPs in LD with multiple genes nearby, it is generally difficult to map the causal SNP and the causal gene it affects, and thus there will be spurious gene-trait links. When we calculate the global similarity based on the gene-trait association profiles, we try to control for this by simulating random GWASs that have the same power as the real GWAS and preserve the LD structure. The spurious links will also be present in the simulated data (although they may appear at different loci), which are used to calibrate the statistical significance.
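
      The calibration step described above can be illustrated with a standard empirical p-value, sketched below in Python. This is an assumption about the general form of such a calibration, not a description of the exact procedure used in the manuscript; the function name `empirical_p_value` and the score values are hypothetical.

```python
from typing import Sequence


def empirical_p_value(observed: float, null_scores: Sequence[float]) -> float:
    """Fraction of null scores at least as large as the observed score.

    Adding 1 to the numerator and denominator avoids reporting a p-value of
    zero when no simulated score reaches the observed one.
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


# Made-up similarity scores standing in for results from simulated GWASs.
null_scores = [0.1, 0.3, 0.2, 0.5, 0.4, 0.15, 0.25, 0.35, 0.45]
p = empirical_p_value(0.42, null_scores)
```

      Because the simulated GWASs are assumed to carry the same LD-driven spurious links as the real data, an observed similarity score is judged only against scores that already include that background.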

      (3) How the SNPs are assigned to genes would affect the results, because different choices can change which genes appear shared between traits. The authors can expand on these. 

      We assign SNPs to genes based on their strongest eQTL association in the available data. Improvements can be made if the relevant tissues for a trait are known (see response to Reviewer 1 above).

      (4) Many reported novel trait links remain speculative without functional or orthogonal validation (e.g., colocalization, perturbation data). Thus, the manuscript's claims are inconclusive and speculative. 

      We agree with the reviewer that the reported trait links are speculative, and they should be treated as hypotheses generated from the computational analyses. To truly validate some of these proposed relationships, deeper functional analyses and experimental tests are needed.

      (5) It would be best to run LD-aware colocalization and power-matched simulations to check for robustness. 

      We agree that tighter control for LD and power-matched simulations will be important for testing the robustness of the predictions.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this review, the author covered several aspects of the inflammation response, mainly focusing on the mechanisms controlling leukocyte extravasation and inflammation resolution.

      Strengths:

      This review is based on an impressive number of sources, trying to comprehensively present a very broad and complex topic.

      Weaknesses:

      (1) This reviewer feels that, despite the title, this review is quite broad and not centred on the role of the extracellular matrix.

      (2) The review will benefit from a stronger focus on the specific roles of matrix components and dynamics, with more informative subheadings.

      (3) The macrophage phenotype section doesn't seem well integrated with the rest of the review (and is not linked to the ECM).

      (4) Table 1 is difficult to follow. It could be reformatted to facilitate reading and understanding.

      (5) Figure 2 appears very complex and broad.

      (6) Spelling and grammar should be thoroughly checked to improve the readability.

      This review focuses on the whole extravasation journey of leukocytes and highlights the involvement of the extracellular matrix (ECM) in multiple phases of the process. The ECM may exert its roles either as a collective structure or through individual components. In the revision, for functions involving specific matrix components, we will emphasize those components and incorporate this information into the subheadings, as suggested. The macrophage phenotype sections (Sections 10-11) are included because of the pivotal role of macrophages in deciding tissue fate following inflammation (i.e., to resolve or regenerate the damage incurred, or to sustain inflammation), which is an important aspect of this review. The ECM could modify macrophage phenotypes either directly (Section 10) or indirectly via modulation of tissue stiffness or of other cell types such as fibroblasts (Section 9). However, as other reviewers also pointed out, we acknowledge that Section 11 is not integrated well enough with the rest of the review. We plan to reorganize this part and to emphasize its link to the ECM during the revision for better integration. We will reformat Table 1 for easier comprehension. We will consider restructuring Figure 2, which outlines the various events influencing the tissue decision between resolution and inflammation, perhaps by splitting it into two separate figures, to better focus the message. We will also check the language to improve readability.

      Reviewer #2 (Public review):

      Summary:

      The manuscript is a timely and comprehensive review of how the extracellular matrix (ECM), particularly the vascular basement membrane, regulates leukocyte extravasation, migration, and downstream immune function. It integrates molecular, mechanical, and spatial aspects of ECM biology in the context of inflammation, drawing from recent advances. The framing of ECM as an active instructor of immune cell fate is a conceptual strength.

      Strengths:

      (1) Comprehensive synthesis of ECM functions across leukocyte extravasation and post-transmigration activity.

      (2) Incorporation of recent high-impact findings alongside classical literature.

      (3) Conceptually novel framing of ECM as an active regulator of immune function.

      (4) Effective integration of molecular, mechanical, and spatial perspectives.

      Weaknesses:

      (1) Insufficient narrative linkage between the vascular phase (Sections 2-6) and the in-tissue phase (Sections 7-10).

      (2) Underrepresentation of lymphocyte biology despite mention in early sections.

      (3) The MIKA macrophage identity framework is only loosely tied to ECM mechanisms.

      (4) Limited discussion of translational implications and therapeutic strategies.

      (5) Overly dense figure insets and underdeveloped links between ECM carryover and downstream immune phenotypes.

      (6) Acronyms and some mechanistic details may limit accessibility for a broader readership.

      We will add a transition paragraph between Section 6 and Section 7 to explain how the extravasation process affects downstream leukocyte functions. While lymphocytes follow a similar extravasation principle, their in-tissue activities differ from those of innate leukocytes. We will thus add a discussion of lymphocyte-ECM crosstalk to Section 8 and/or 9 in the revision. We will restructure Section 11 and Figure 3 to better integrate them with the rest of the review. In the current manuscript, we merely describe the capability of the MIKA framework to describe the identity of any tissue macrophage, such that the framework could serve as a roadmap to facilitate identity normalization of pathological macrophages. In the revision, we plan to use the MIKA framework to discuss and demonstrate the linkage between macrophage identities and the expression/production of modulators of the functional ECM effectors described in Sections 8-9. Regarding the limited discussion of translational implications and therapeutic strategies, we will try to enrich this aspect throughout the manuscript where appropriate, in addition to the existing passages (e.g., lines 293-297, 388-391, 460-463, and 512-517). We will also revise the figure structure in general to avoid overly dense information and to improve clarity. We will consider providing a glossary explaining specialized terms to improve accessibility for a broader readership.

      Reviewer #3 (Public review):

      Summary & Strengths:

      This review by Yu-Tung Li sheds new light on the processes involved in leukocyte extravasation, with a focus on the interaction between leukocytes and the extracellular matrix. In doing so, it presents a fresh perspective on the topic of leukocyte extravasation, which has been extensively covered in numerous excellent reviews. Notably, the role of the extracellular matrix in leukocyte extravasation has received relatively little attention until recently, with a few exceptions, such as a study focusing on the central nervous system (J Inflamm 21, 53 (2024) doi.org/10.1186/s12950-024-00426-6) and another on transmigration hotspots (J Cell Sci (2025) 138 (11): jcs263862 doi.org/10.1242/jcs.263862). This review synthesizes the substantial knowledge accumulated over the past two decades in a novel and compelling manner.

      The author dedicates two sections to discussing the relevant barriers, namely, endothelial cell-cell junctions and the basement membrane. The following three paragraphs address how leukocytes interact with and transmigrate through endothelial junctions, the mechanisms supporting extravasation, and how minimal plasma leakage is achieved during this process. The subsequent question of whether the extravasation process affects leukocyte differentiation and properties is original and thought-provoking, having received limited consideration thus far. The consequences of the interaction between leukocytes and the extracellular matrix, particularly regarding efferocytosis, macrophage polarization, and the outcome of inflammation, are explored in the subsequent three chapters. The review concludes by examining tissue-specific states of macrophage identity.

      Weaknesses:

      Firstly, the first ten sections provide a comprehensive overview of the topic, presenting logical and well-formulated arguments that are easily accessible to a general audience. In stark contrast, the final section (Chapter 11) fails to connect coherently with the preceding review and is nearly incomprehensible without prior knowledge of the author's recent publication in Cell. Mol. Life Sci. CMLS 772 82, 14 (2024). This chapter requires significantly more background information for the general reader, including an introduction to the Macrophage Identity Kinetics Archive (MIKA), which is not even introduced in this review, its basis (meta-analysis of published scRNA-seq data), its significance (identification of major populations), and the reasons behind the revision of the proposed macrophage states and their further development. Secondly, while the attempt to integrate a vast amount of information into fewer figures is commendable, it results in figures that resemble a complex puzzle. The author may consider increasing the number of figures and providing additional, larger "zoom-in" panels, particularly for the topics of clot formation at transmigration hotspots and the interaction between ECM/ECM fragments and integrins. Specifically, the color coding (purple for leukocyte α6-integrins, blue for interacting laminins, also blue for EC α6 integrins, and red for interacting 5-1-1 laminins) is confusing, and the structures are small and difficult to recognize.

      We agree with and appreciate the reviewer's specific and helpful suggestions. During the revision, we will provide the requested background description of MIKA to make it more accessible to a general readership. As other reviewers also pointed out, this part (Section 11) is less well integrated with the rest of the review, so we will restructure it by linking tissue macrophage identities under the MIKA framework to the modulation of the functional ECM effectors described in previous sections (Sections 8-9). We acknowledge that the current figure organization might be overly information-dense and will consider breaking the contents down into multiple figures. The size and color-coding issues will also be addressed.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work aims to elucidate the molecular mechanisms affected in hypoxic conditions, causing reduced cortical interneuron migration. They use human assembloids as a migratory assay of subpallial interneurons into cortical organoids and show substantially reduced migration upon 24 hours of hypoxia. Bulk and scRNA-seq show adrenomedullin (ADM) up-regulation, as well as its receptor RAMP2, confirmed at the protein level. Adding ADM to the culture medium after hypoxic conditions rescues the migration deficits, even though the subtype of interneurons affected is not examined. However, the authors demonstrate very clearly that ineffective ADM does not rescue the phenotype, and blocking RAMP2 also interferes with the rescue. The authors are also applauded for using 4 different cell lines and using human fetal cortex slices as an independent method to explore the DLXi1/2GFP-labelled iPSC-derived interneuron migration in this substrate with and without ADM addition (after confirming that also in this system ADM is up-regulated). Finally, the authors demonstrate PKA-CREB signalling mediating the effect of ADM addition, which also leads to up-regulation of GABA receptors. Taken together, this is a very carefully done study on an important subject - how hypoxia affects cortical interneuron migration. In my view, the study is of great interest.

      Strengths:

      The strengths of the study are the novelty and the thorough work using several culture methods and 4 independent lines.

      Weaknesses:

      The main weakness is that other genes regulated upon hypoxia are not confirmed, so readers cannot tell down to which fold-change or statistical cut-off the data are reliable.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Puno and colleagues investigates the impact of hypoxia on cortical interneuron migration and downstream signaling pathways. They establish two models to test hypoxia, cortical forebrain assembloids, and primary human fetal brain tissue. Both of these models provide a robust assay for interneuron migration. In addition, they find that ADM signaling mediates the migration deficits and rescue using exogenous ADM.

      Strengths:

      The findings are novel and very interesting to the neurodevelopmental field, revealing new insights into how cortical interneurons migrate and also establishing exciting models for future studies. The authors use sufficient iPSC lines including both XX and XY, so the analysis is robust. In addition, the RNAseq data with re-oxygenation is a nice control to see what genes are changed specifically due to hypoxia. Further, the overall level of validation of the sequencing data and involvement of ADM signaling is convincing, including the validation of ADM at the protein level. Overall, this is a very nice manuscript.

      Weaknesses:

      I have a few comments and suggestions for the authors. See below.

      Reviewer #3 (Public review):

      Summary:

      The authors aimed to test whether hypoxia disrupts the migration of human cortical interneurons, a process long suspected to underlie brain injury in preterm infants but previously inaccessible for direct study. Using human forebrain assembloids and ex vivo developing brain tissue, they visualized and quantified interneuron migration under hypoxic conditions, identified molecular components of the response, and explored the effect of pharmacological intervention (specifically ADM) on restoring the migration deficits.

      Strengths:

      The major strength of this study lies in its use of human forebrain assembloids and ex vivo prenatal brain tissue, which provide a direct system to study interneuron migration under hypoxic conditions. The authors combine multiple approaches: long-term live imaging to directly visualize interneuron migration, bulk and single-cell transcriptomics to identify hypoxia-induced molecular responses, pharmacological rescue experiments with ADM to establish therapeutic potential, and mechanistic assays implicating the cAMP/PKA/pCREB pathway and GABA receptor expression in mediating the effect. Together, this rigorous and multifaceted strategy convincingly demonstrates that hypoxia disrupts interneuron migration and that ADM can restore this defect through defined molecular mechanisms.

      Overall, the authors achieve their stated aims, and the results strongly support their conclusions. The work has a significant impact by providing the first direct evidence of hypoxia-induced interneuron migration deficits in the human context, while also nominating a candidate therapeutic avenue. Beyond the specific findings, the methodological platform - particularly the combination of assembloids and live imaging - will be broadly useful to the community for probing neurodevelopmental processes in health and disease.

      Weaknesses:

      The main weakness of the study lies in the extent to which forebrain assembloids recapitulate in vivo conditions, as the migration of interneurons from hSO to hCO does not fully reflect the native environment or migratory context of these cells. Nevertheless, this limitation is tempered by the fact that the work provides the first direct observation of human interneuron migration under hypoxia, representing a major advance for the field. In addition, while the transcriptomic analyses are valuable and highlight promising candidates, more in-depth exploration will be needed to fully elucidate the molecular mechanisms governing neuronal migration and maturation under hypoxic conditions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should examine if all cortical interneurons are affected by ADM or only subtypes (Parvalbumin/Somatostatin).

      We thank the reviewer for raising this important question. In our study, we utilized the Dlx1/2b::eGFP reporter to broadly label cortical interneurons; however, this system does not distinguish specific interneuron subtypes. To address this, in the revised version of the manuscript we will use the single-cell RNA sequencing data and immunostainings to provide this information. Based on previous analyses from Birey et al (Cell Stem Cell, 2022), we expect interneurons within assembloids to express mostly calbindin (CALB2) and somatostatin (SST) at this in vitro stage of development; parvalbumin subtype appears later based on data from Birey et al (Nature, 2017) and more recently from Varela et al, (bioRxiv, 2025).

      In parallel, we will analyze available scRNA-seq data from developing human primary brain tissue of a similar age to that used in the manuscript, and check whether the interneuron subtypes present are similar to those within assembloids.

      (2) The authors should test more candidates from their bulk RNA-seq data with different fold changes for regulation after hypoxia, to allow the reader to judge at which cut-off the DEGs may be reproducible. This would make this database much more valuable for the field of hypoxia research.

      We appreciate the reviewer’s thoughtful suggestion. In addition to the bulk RNA-seq analysis, we validated several upregulated hypoxia-responsive genes with varying fold changes by qPCR; these include PDK1, PFKP, and VEGFA (Figure S1).

      We do agree that an in-depth investigation of specific cut-offs would be interesting; however, this could be the focus of a different manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) Can the authors comment on the possibility of inflammatory response pathways being activated by hypoxia? Has this been shown before? While not the focus of the manuscript, it could be discussed in the Discussion as an interesting finding and potential involvement of other cells in the Hypoxic response.

      We thank the reviewer for this important comment about inflammation. Indeed, hypoxia has been shown to activate inflammatory response pathways. Various studies have found that HIF-1α can interact with NF-κB signaling, leading to the upregulation of pro-inflammatory cytokines such as IL-1β, IL-6, and TNF-α (Rius et al., Cell, 2008; Hagberg et al., Nat Rev Neurol, 2015).

      In our transcriptomics data (Figure 2D), and to the reviewer’s point, we identified enrichment of inflammatory signaling responses following hypoxic exposure. Since hSOs at the time of analysis do contain astrocytes, we think these glia contribute to the observed pro-inflammatory changes. Based on these results, and because ADM is known to have strong anti-inflammatory properties, the effects of ADM on hypoxic astrocytes should be investigated in future studies focused on hypoxia-induced inflammation. In the revision, we will address this comment in the Discussion section and cite the appropriate papers.

      (2) Could the authors comment on the mechanism at play here with respect to ADM and binding to RAMP2 receptors - is this a potential autocrine loop, or is the source of ADM from other cell types besides inhibitory neurons? Given the scRNA-seq data, what cell-to-cell mechanisms can be at play? Since different cells express ADM, there could be different mechanisms in place in ventral vs dorsal areas.

      Based on our scRNA-seq data in hSOs showing significant upregulation of ADM expression in astrocytes and progenitors, we speculate that the primary mechanism is likely to involve paracrine interactions. However, we cannot exclude autocrine mechanisms with the included experiments. Dissecting these interactions in a cell-type specific manner could be an important focus for future ADM-related studies.

      To address the question about the possible different mechanisms in ventral versus dorsal areas, in the revision we will plot and include in the figures the data about the cell-type expression of ADM and its receptors in hCOs.

      (3) For data from Figure 6 - while the ELISA assays are informative to determine which pathways (PKA, AKT, ERK) are active, there is no positive control to indicate these assays are "working" - therefore, if possible, western blot analysis from assembloid tissue could be used (perhaps using the same lysates from Figure 3) as an alternative to validate changes at the protein level (however, this might prove difficult); further to this, is P-CREB activated at the protein level using WB?

      We thank the reviewer for this comment and the observation. Although we did not include a traditional positive control in these ELISA assays, several lines of evidence indicate that the measurements are reliable. First, the standard curves behaved as expected, and all sample values fell within the assay’s dynamic range. Second, technical replicates showed low variability, and the observed changes across experimental conditions (e.g., hypoxia vs. control) were consistent with the expected biological responses based on previous literature. We agree that including western blot validation would strengthen the findings, and we will note this for our future studies focused on CREB and ADM.

      (4) Could the authors comment further on the mechanism and what biological pathways and potential events are downstream of ADM binding to RAMP2 in inhibitory neurons? What functional impact would this have linked to the CREB pathway proposed? While the link to GABA receptors is proposed, CREB has many targets beyond this.

      We appreciate the reviewer’s insightful question. Currently, not much is known about the molecular pathways and downstream cellular events triggered by ADM binding to RAMP2 in inhibitory neurons, or in brain cells in general. Our data provide the first information about the cell-type-specific expression of ADM in baseline and hypoxic conditions, which is one of the key novelties of our study.

      While the signaling landscape of ADM in interneurons is largely unexplored, several studies in other (non-brain) cell types have demonstrated that ADM binding to RAMP2 can activate downstream cascades such as the cAMP/PKA/CREB pathway, PI3K/AKT, and ERK/MAPK, all of which are also known to be critical regulators of neuronal development and survival. These previously published data along with our CREB-targeted findings in hypoxic interneurons, suggest ADM–RAMP2 signaling could influence multiple aspects of interneuron biology, but these remain to be evaluated in future studies.

      We agree with the reviewer that CREB has a wide range of transcriptional targets. We decided to focus on GABA as a target of CREB for two main reasons: (i) GABA signaling has previously been shown to play an important role in the migration of cortical interneurons, and (ii) a previous study by Birey et al. (Cell Stem Cell, 2022) demonstrated that CREB pathway activity is essential for regulating interneuron migration in assembloid models of Timothy syndrome, providing further evidence that dysregulation of CREB activity disrupts migration dynamics.

      While our study provides a first step toward uncovering the mechanisms of interneuron migration protection by ADM, we fully acknowledge that future work will be needed to delineate the full spectrum of ADM–RAMP2 downstream signaling events in inhibitory neurons and other brain cells.

      (5) Does hypoxia cause any changes to inhibitory neurogenesis (earlier stages than migration?) - this might already be known, but was not discussed.

      We appreciate this question from the reviewer; however, this was not something that we focused on in this manuscript due to the already large amount of data included. A separate study focusing on neurogenesis defects and the molecular mechanisms of injury for that specific developmental process would be an important next step.

      (6) In the Discussion section, it might be worth detailing to the readers what the functional impact of delayed/reduced migration of inhibitory neurons into the cortex might result in, in terms of functional consequences for neural circuit development.

      We thank the Reviewer for the suggestion of detailing the functional impact of reduced inhibitory neuron migration. We will revise the manuscript by incorporating a paragraph about this in the Discussion section.

      Reviewer #3 (Recommendations for the authors):

      Most of the evidence presented is convincing in supporting the conclusions, and I have only minor suggestions for improvement:

      (1) The bulk RNA-seq was performed in hSOs only, which may not fully capture the phenotypes of migrating or migrated interneurons. It would be valuable, if feasible, to sort migrated cells from hSO-hCO assembloids and specifically examine their molecular mediators.

      We thank the reviewer for this suggestion. While it is likely that the cellular environment will have some influence on a subset of the molecular changes, based on all the data from the manuscript and our specific target, the RNA-sequencing on hCOs was sufficient to capture essential changes like ADM upregulation. The in-depth exploration on differential responses of migrated versus non-migrated interneurons to hypoxia could be the focus of a different project.

      (2) In Figure 3, it is striking that cell-type heterogeneity dominates over hypoxia vs. control conditions. A joint embedding of hSO and hCO cells could provide further insight into molecular differences between migrated and non-migrated interneurons.

      We thank the reviewer for this observation and opportunity to clarify. Since we manually separated the assembloids before the analyses, we processed these samples separately. That is why they separate like this. In the revision, we will add data about ADM expression and its receptors’ expression in the hCOs.

      (3) It would be helpful to expand the discussion on how closely the migration observed in hSO-hCO assembloids reflects in vivo conditions, and what environmental aspects are absent from this model. This would better frame the interpretation and translational relevance of the findings.

      We thank the Reviewer for bringing up this important point. Although the assembloid model offers the unique advantage of allowing direct investigation of the migration patterns of hypoxic interneurons, we agree that it does not fully recapitulate the in vivo environment. While multiple aspects cannot be recapitulated in vitro at this time (e.g., cellular complexity, vasculature, and immune response), we are encouraged by the validation of our main findings in ex vivo developing human brain tissue, which strongly supports the validity of our findings for in vivo conditions.

      We will expand our discussion to include more details and the need to validate these findings using in vivo models, while also acknowledging that different species (e.g. rodents versus non-human primates versus humans) might have different responses to hypoxia.

      (4) The authors suggest that hypoxia is also associated with delayed interneuron maturation, yet the bulk RNA-seq data primarily reveal stress and hypoxia-related genes. A more detailed discussion of why genes linked to interneuron maturation and function were not strongly affected would clarify this point.

      We thank the Reviewer for the opportunity to clarify.

      The RNA-seq was performed during the acute stages of hypoxia/reoxygenation. We think a maturation phenotype might be difficult to capture at this point and would require analysis at later stages of in vitro assembloid maturation.

      Our speculation about a possible maturation defect is based on previous developmental biology studies showing that failure of interneurons to reach their final cortical location within a specified developmental window impairs their integration within the neuronal network, and thus leads to maturation defects and possible elimination by apoptosis.

      Since preterm infants suffer from countless hypoxic events over multiple months, we suggest these repetitive events are likely to induce cumulative delays in migration, inability of interneurons to reach their target in time, followed by abnormal integration within the excitatory network, and eventual elimination of some of these interneurons through apoptosis. However, the direct demonstration of this effect following a hypoxic insult would require prolonged in vivo experiments in rodents to follow the migration, network integration and apoptosis of interneurons; to our knowledge this experimental design is not technically feasible at this time.

      (5) Relatedly, while the focus on interneuron migration is well justified, acknowledging how hypoxia might also impact other aspects of cortical development (e.g., progenitor proliferation, neuronal maturation, or circuit integration) would place the findings in a broader developmental framework and strengthen their relevance.

      We appreciate the Reviewer’s suggestion to discuss the role of hypoxia on other processes during cortical development. In the revised manuscript, we will include citations about the effects of hypoxia on interneuron proliferation, maturation and circuit integration as available, and also expand to other cell types known to be affected.

      (6) Very minor: in Figure S3C and D, it was not stated what the colors mean (grey: control, yellow: hypoxia)

      Thank you for pointing out this error; we will correct it in our revision.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public review):

      The only aspect that would benefit from further clarification is a more detailed discussion of aging-associated ECM changes in the context of prior literature. 

      Thank you. Please refer to the new section (Lines 604-617)

      Reviewer #3 (Public review):

      (1) It would be useful to explain why GATA4 was chosen over HIF1a, which was the most differentially expressed. 

      Thank you. Please refer to Lines 530-537.  

      “Of note, Hypoxia-Inducible Factor 1α (HIF1α) was the most differentially expressed gene predicted to regulate chondrocyte aging. The connection between HIF1α and aging has been previously reported.[32] Furthermore, additional studies have investigated HIF1 in association with OA and assessed its use as a therapeutic target.[33,34] Therefore, we decided to focus on GATA4, which was less studied in chondrocytes but highly associated with cellular senescence, an aging hallmark. However, our selection did not dampen the importance of HIF1α and other molecules listed in Figure 1D in chondrocyte aging. They can be further studied in the future using the same strategy employed in the current work.”

      (2) In Figure 5, it would be useful to demonstrate the non-surgical or naive limbs to help contextualize OARSI scores and knee hyperalgesia changes. 

      In the current study, we focused on the DMM control and DMM Gata4 virus groups, so we did not include a sham control group. We recognize that this is a limitation of the study.

      (3) While there appear to be GATA4 small-molecule inhibitors in various stages of development that could be used to assess the effects in age-related OA, those experiments are out of scope for the current study.  

      We agree that the results are still preliminary, which is why we placed them in the supplementary materials. However, we feel that the result is informative: it supports the potential of GATA4 as a therapeutic target and may inspire the development of more specific inhibitors. Therefore, we would still like to keep the results in the current study.

    1. Author response:

      Reviewer #1 (Public review):

      (1) The strength of the relationship between the different transcriptional parameters and the mean expression output is displayed visually in Figures 5 and 7, but is not formally quantified. Given that the tau_off times seem more correlated to mean activity for some enhancers (e.g., rho) than others (e.g., sna SE), the quantification might be useful.

      We re-plotted Figures 5 and 7 to present the correlations between the studied burst parameters. As the reviewer suggested, quantifying these correlations allows us to better assess the relationship between the cell-averaged tau_off and the cell-averaged fluorescence signal for some of the selected enhancers. As a result of these findings, we decided to change our message: instead of claiming that the burst statistics are homogeneous over the embryo domain, we now claim that these statistics have weak but significant correlations with the cell-averaged mean gene fluorescence.

      (2) There are some mechanistic details that are not discussed in depth. For example, the authors observe that the accumulation and degradation of the MS2 signal have similar slopes. However, given that the accumulation represents the transcription of MS2 loops, while the degradation represents diffusion of nascent transcripts away from the site of transcription, there is no mechanistic expectation for this. The degradation of signal seems likely to be a property of the mRNA itself, which shouldn't vary between cells or enhancer reporters, but the accumulation rate may be cell- or enhancer-specific. Similarly, the activity time depends both on the time of transcription onset and the time of transcription cessation. These two processes may be controlled by different transcription factor properties or levels and may be interesting to disentangle.

      The accumulation slope represents the rate of nascent transcript production, which depends on transcription initiation frequency and RNA polymerase elongation rate. While transcription initiation rates can vary between enhancers, our results show that the loading rates are relatively comparable across different enhancer sequences (Figure 5D). Instead, the primary difference observed was in activity time and burst frequency, consistent with previous findings that enhancers predominantly modulate burst frequency (Fukaya et al., 2016). The degradation slope represents the diffusion of completed transcripts away from the transcription site, which should be an intrinsic property of the mRNA molecule and therefore independent of the regulatory sequences driving transcription.

      (3) There are previous analyses of the eve stripe dynamics, which the authors cite, but do not compare the results of their work to the previous work in depth.

      The goal of this manuscript is to compare transcriptional bursting properties across different enhancers, rather than to provide an in-depth analysis of eve stripe dynamics specifically. We analyzed four transgenic constructs with different enhancers alongside an endogenous eve construct, focusing on comparative bursting parameters rather than detailed eve expression patterns. Additionally, the previously published eve stripe dynamics data came from BAC constructs, whereas our data comes from the endogenous eve locus. This methodological difference makes direct comparison of stripe dynamics less straightforward and less relevant to our central research question about enhancer-driven bursting variability.

      Reviewer #2 (Public review):

      (1) The manuscript does not clearly delineate how this analysis extends beyond the prior landmark study (citation #40: Fukaya et al., 2016). While the current manuscript offers new modeling and statistics, more explicit clarification of what is novel in terms of biological conclusions and methodological advancement would help position the work.

      The prior study (Fukaya et al., 2016) characterized transcriptional bursting qualitatively, focusing on average burst properties per nucleus without systematic mathematical modeling or statistical analysis of burst-to-burst variability. While they demonstrated that enhancer strength correlates with burst frequency, no quantitative framework was developed to dissect the molecular mechanisms underlying these differences or to connect burst dynamics to spatial gene expression patterns.

      In the present work: (1) We developed an explicit mathematical model with rigorous inference algorithms to quantify transcriptional states from fluorescence trajectories; (2) We performed comprehensive statistical analysis of burst timing distributions, revealing that inter-burst intervals follow exponential distributions while burst durations are hypo-exponentially distributed; (3) Most importantly, we discovered that burst kinetics (τON, τOFF) remain remarkably consistent across different genes and spatial locations, while spatial expression gradients arise primarily through modulation of activity time - the temporal window during which bursting occurs. This mechanistic insight reveals that enhancers regulate spatial patterning not by changing intrinsic burst properties, but by controlling the duration of transcriptionally permissive periods.
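
      As an illustration of the distributional distinction in point (2), the following is a minimal simulation sketch, not the inference code used in the manuscript. It relies on a standard property: an exponential distribution has a coefficient of variation (CV) of 1, whereas a hypo-exponential distribution (a sum of independent exponential steps with different rates) has a CV below 1. The rates, sample size, seed, and variable names are arbitrary assumptions chosen only for the example.

```python
import random
from statistics import mean, pstdev


def coefficient_of_variation(samples):
    """Standard deviation divided by the mean."""
    return pstdev(samples) / mean(samples)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 20000               # assumed sample size, for illustration only

# Single-step waiting time: exponential with an assumed rate of 1.0.
single_step = [rng.expovariate(1.0) for _ in range(n)]

# Two sequential steps with assumed rates 1.0 and 2.0: hypo-exponential.
two_step = [rng.expovariate(1.0) + rng.expovariate(2.0) for _ in range(n)]

cv_single = coefficient_of_variation(single_step)
cv_two = coefficient_of_variation(two_step)

print(f"CV, single exponential step: {cv_single:.2f}")
print(f"CV, two sequential steps:    {cv_two:.2f}")
```

      For the assumed rates, theory gives a CV of sqrt(1.25)/1.5, roughly 0.75, for the two-step case, so a CV clearly below 1 is one simple indication that a duration involves more than one rate-limiting step.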

      (2) While the methods are explained in detail in the Supplementary Information, the manuscript would benefit from including a diagrammatic model and explicitly clarifying whether the model is descriptive or predictive in scope.

      We plan to prepare the diagrammatic model in the formal response. 

      (3) The interpretation that fluorescence decay reflects RNA degradation could be confounded by polymerase runoff or transcript diffusion from the transcription site. These potential limitations are not thoroughly discussed. (Write few lines in the discussion)

      This concern, related to the interpretation of the predictive model, will be addressed in future work. The decay in the fluorescence signal can be biologically related to transcription termination, polymerase detachment, and diffusion. A key limitation of the approach is that the model is phenomenological and does not capture these processes, which could be addressed with a more mechanistic model.

      (4) The so-called loading rate is used as an empirical parameter in fitting fluorescence traces, but is not convincingly linked to distinct biological processes. The manuscript would benefit from a more precise definition or reframing of this term.

      We modified the language of our definition of loading rate as follows: “Loading rate is defined as the rate of increase of fluorescence signal following promoter activation. This quantity is a proxy measurement for the rate of RNA Polymerase II transcription initiation.” The full transcription process involves multiple mechanisms, including chromatin dynamics, 3D enhancer-promoter interactions, transcription factor binding, RNA polymerase pausing, and interactions between developmental promoter motifs and associated proteins. We did not have access to specific measurements of these mechanisms and therefore cannot provide a solid biological meaning for the model behind the inference algorithm. However, the fact that our results are reproducible across biological replicates supports the robustness of our method at predicting the promoter state in the studied datasets. In the formal response, we will compare the performance of our method with other available ones.

      Reviewer #3 (Public review):

      (1) The algorithm is not benchmarked against previously used algorithms in the field to infer ON and OFF times, for example, those based on Hidden Markov models. A comparison would help strengthen the support for this algorithm (if it really works well) or show at which point one must be careful when interpreting this data.

      We are implementing a benchmarking protocol to compare our results with the proposed and already published models. We expect to present this comparison in the formal response.

      (2) More broadly, the novelty of the findings and how those fit within the knowledge of the field is not super clear. A better account of previous findings that have already quantified ON, OFF times and so on, and how the current findings fit within those, would help better appreciate the significance of the work.

      To clarify the new findings, we modified the title from “Regulation of Transcriptional Bursting and Spatial Patterning in Early Drosophila Embryo Development” to “Temporal Duration of Gene Activity is the main Regulator of Spatial Expression Patterns in Early Drosophila Embryos”.

      In short, (1) We developed an explicit mathematical model with rigorous inference algorithms to quantify transcriptional states from fluorescence trajectories; (2) We performed comprehensive statistical analysis of burst timing distributions, revealing that inter-burst intervals follow exponential distributions while burst durations are hypo-exponentially distributed; (3) Most importantly, we discovered that burst kinetics (τON, τOFF) remain remarkably consistent across different genes and spatial locations, while spatial expression gradients arise primarily through modulation of activity time - the temporal window during which bursting occurs. This mechanistic insight reveals that enhancers regulate spatial patterning not by changing intrinsic burst properties, but by controlling the duration of transcriptionally permissive periods.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):  

      Summary:  

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths:  

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: (1) a large set of behavioral attributes, (2) with inter-individual variability, that are (3) stable over time. A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings and extends the experiments from temporal stability to examining correlation of locomotion features between different contexts.  

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      Weaknesses:  

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. Why were five or so parameters selected from the full set? How were these selected? Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset?  

      The correlation analysis is used to establish stability between assays. For temporal re-testing, "stability" is certainly the appropriate word, but between contexts it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency".  

      The parameters are considered one-by-one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability', along with the analyses of single-parameter variability stability.

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23°C and 32°C are correlated by 0.263, which corresponds to an R² of 0.069, i.e. just 7% of the 32°C variance is predictable by the 23°C variance. Is it fair to say that 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?
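
      The arithmetic behind these two examples can be checked with a short sketch. It only squares the correlation coefficients quoted above; the function name and dictionary labels are illustrative and are not taken from the manuscript.

```python
def variance_explained(r: float) -> float:
    """Coefficient of determination (R^2) for a Pearson correlation r."""
    return r * r


# Correlation coefficients quoted in the review text above.
quoted = {
    "% time walked, 23°C vs 32°C": 0.263,
    "vector strength, Buridan vs Y-maze": -0.197,
}

for label, r in quoted.items():
    r2 = variance_explained(r)
    print(f"{label}: r = {r:+.3f}, R^2 = {r2:.3f}, unexplained = {1 - r2:.1%}")
```

      The sign of r drops out on squaring, so a negative correlation of -0.197 leaves the same share of variance unexplained as a positive one of the same magnitude.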

      The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?
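      The in-group rank transform suggested here could look roughly like the following. This is a minimal sketch under stated assumptions: the data are invented, the helper names are hypothetical, and correlating within-condition ranks is equivalent to a Spearman rank correlation rather than anything the manuscript is known to use.

```python
from math import exp, sqrt

def average_ranks(values):
    """Rank values within one group (1-based); ties receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def rank_correlation(x, y):
    """Correlate within-condition ranks instead of raw values."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented example: condition B is a shifted, non-linear but monotone
# function of condition A, so the rank order of individuals is unchanged.
cond_a = [0.1, 0.4, 0.5, 0.9, 1.3]
cond_b = [exp(3 * v) + 10 for v in cond_a]
print(pearson(cond_a, cond_b), rank_correlation(cond_a, cond_b))
```

      In this toy case the raw Pearson coefficient falls below 1 even though every individual keeps its rank, whereas the rank-based coefficient does not.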

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general, and with regard to these specific parameters? Is increased walking speed at higher temperature necessarily due to increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      Using the current single-correlation analysis approach, the aims would benefit from re-wording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The study presents a bounty of new technology to study visually guided behaviors. The GitHub link to the software was not available. To verify successful transfer of open-hardware and open-software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      The study discusses a number of interesting, stimulating ideas about interindividual variability and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of interindividual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms.  

      Comments on revisions:  

      I want to express my appreciation for the authors' responsiveness to the reviewer feedback. They appear to have addressed my previous concerns through various modifications including GLM analysis, however, some areas still require clarification for the benefit of an audience that includes geneticists.  

      (1) GLM Analysis Explanation (Figure 9)  

      While the authors state that their new GLM results support their original conclusions, the explanation of these results in the text is insufficient. Specifically:

      The interpretation of coefficients and their statistical significance needs more detailed explanation. The audience includes geneticists and other nonstatistical people, so the GLM should be explained in terms of the criteria or quantities used to assess how well the results conform with the hypothesis, and to what extent they diverge.

      The criteria used to judge how well the GLM results support their hypothesis are not clearly stated.

      The relationship between the GLM findings and their original correlation-based conclusions needs better integration and connection, leading the reader through your reasoning.

      We thank the reviewer for highlighting this important point. We have revised the Results section in the revised manuscript to include a more detailed explanation of the GLM analysis. Specifically, we now clarify the interpretation of the model coefficients, including their direction and statistical significance, in relation to the hypothesized effects. We also outline the criteria we used to assess how well the GLM supports our original correlation-based conclusions, namely whether the sign and significance of the coefficients align with the expected relationships derived from our prior analysis. Finally, we explicitly describe how the GLM results confirm or extend the patterns observed in the correlation-based analysis, to guide readers through our reasoning and the integration of both approaches.

      (2) Documentation of Changes  

      One struggle with the revised manuscript is that no "tracked changes" version was included, so it is hard to know exactly what was done. Without access to the previous version of the manuscript, it is difficult to fully assess the extent of revisions made. The authors should provide a more comprehensive summary of the specific changes implemented, particularly regarding:

      We thank the reviewer for bringing this to our attention. We were equally confused to learn that the tracked-changes version was not visible, despite having submitted one to eLife as part of our revision. 

      Upon contacting the editorial office, they confirmed that we did submit a tracked-changes version, but clarified that it did not contain embedded figures (as they were added manually to the clean version). The editorial response said in detail: “Regarding the tracked-changes file: it appears the version with markup lacked figures, while the figure-complete PDF had markup removed, which likely caused the confusion mentioned by the reviewers.” We hope this answer from eLife clarifies the reviewers’ concern.

      (3) Statistical Method Selection  

      The authors mention using "ridge regression to mitigate collinearity among predictors" but do not adequately justify this choice over other approaches. They should explain:

      Why ridge regression was selected as the optimal method  

      How the regularization parameter (λ) was determined  

      How this choice affects the interpretation of environmental parameters' influence on individuality

      We appreciate the reviewer’s thoughtful question regarding our choice of statistical method. In response, we have expanded the Methods section in the revised manuscript to provide a more detailed justification for the use of a GLM, including ridge regression. Specifically, we explain that ridge regression was selected to address collinearity and to control for overfitting.

      We now also describe how the regularization parameter (λ) was selected: we used 5-fold cross-validation over a log-spaced grid (10⁻⁶ to 10⁶) to identify the optimal value that minimized the mean squared error (MSE).

      Finally, we clarify in both the Methods and Results sections how this modeling choice affects the interpretation of our findings. 
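      The λ selection described in this response can be sketched as follows. This is a minimal illustration, not the authors' implementation: the synthetic data, the interleaved fold assignment, the omission of an intercept, the one-value-per-decade grid spacing, and all function names are assumptions made here for brevity.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ridge_fit(X, y, lam):
    """Closed-form ridge solution of (X'X + lam*I) w = X'y (no intercept)."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def cv_mse(X, y, lam, k=5):
    """Mean squared prediction error over k interleaved cross-validation folds."""
    n = len(y)
    errors = []
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        w = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        errors += [(y[i] - sum(wj * xj for wj, xj in zip(w, X[i]))) ** 2
                   for i in test]
    return sum(errors) / len(errors)

def select_lambda(X, y, grid, k=5):
    """Return the grid value with the lowest cross-validated MSE."""
    return min(grid, key=lambda lam: cv_mse(X, y, lam, k))

# Log-spaced grid from 1e-6 to 1e6, one value per decade (assumed spacing)
grid = [10.0 ** e for e in range(-6, 7)]

# Synthetic data for illustration only
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(50)]
true_w = [1.0, -2.0, 0.5]
y = [sum(w * x for w, x in zip(true_w, row)) + rng.gauss(0, 0.1) for row in X]
best_lam = select_lambda(X, y, grid)
print("selected lambda:", best_lam)
```

      Larger values of λ shrink all coefficients toward zero, which is why the chosen value matters when coefficient magnitudes are interpreted as the relative influence of environmental parameters.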

      Reviewer #2 (Public review):  

      Summary:  

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths:  

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great, and I'm sure other folks will be interested in using and adapting to their own needs.

      Weaknesses/Limitations:  

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting, temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, or a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank-order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context.  

      I think the authors are missing an opportunity to use much more robust statistical methods. It appears as though the authors used Pearson correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The problem is that Pearson correlation coefficients can be anti-conservative and additionally, the authors have thus had to perform many, many tests to correlate behaviors across the different trials/scenarios. I don't see any evidence that the authors are controlling for multiple testing, which I think would also help. Alternatively, though, the paper would be a lot stronger, and my guess is, much more streamlined if the authors employ hierarchical mixed models to analyse these data, which are the standard analytical tools in the study of individual behavioral variation. In this way, the authors could partition the behavioral variance into its among- and within-individual components and quantify repeatability of different behaviors across trials/scenarios simultaneously. This would remove the need to estimate 3 different correlations for day 1 & day 2, day 1 & 3, day 2 & 3 (or stripe 0 & stripe 1, etc.) and instead just report a single repeatability for e.g. the time spent walking among the different stripe patterns (e.g. Figure 3). Additionally, the authors could then use multivariate models where the response variables are all the behaviors combined and the authors could estimate the among-individual covariance in these behaviors. I see that the authors state they include generalized linear mixed models in their updated MS, but I struggled a bit to understand exactly how these models were fit. What exactly was the response? What exactly were the predictors? (I just don't understand what Line 404 means: "a GLM was trained using the environmental parameters as predictors (0 when the parameter was not changed, 1 if it was) and the resulting individual rank differences as the response".) So were different models run for each scenario? For different behaviors? Across scenarios? What exactly? I just harp on this because I'm actually really interested in these data and think that updating these methods can really help clarify the results and make the main messages much clearer!
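      To make the idea of a single repeatability estimate concrete, here is a simplified sketch. It uses the classical one-way ANOVA estimator for a balanced design as a stand-in for a full hierarchical mixed model; the data and the function name are invented for illustration and do not come from the manuscript.

```python
def repeatability(data):
    """ANOVA-based repeatability (intraclass correlation) for a balanced design.

    data: one list per individual, each holding k repeated measurements.
    Returns V_among / (V_among + V_within); the estimate can be negative
    when individuals differ less than expected from measurement noise.
    """
    n = len(data)      # number of individuals
    k = len(data[0])   # repeated measurements per individual
    grand = sum(sum(row) for row in data) / (n * k)
    means = [sum(row) / k for row in data]
    ms_among = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((v - m) ** 2
                    for row, m in zip(data, means) for v in row) / (n * (k - 1))
    var_among = (ms_among - ms_within) / k
    return var_among / (var_among + ms_within)

# Invented example: three flies measured on three days
walking_fraction = [
    [1.0, 1.2, 0.9],
    [2.1, 1.9, 2.0],
    [3.0, 3.2, 2.9],
]
print(f"repeatability = {repeatability(walking_fraction):.3f}")
```

      One value per behavior replaces the three pairwise day-to-day correlations. A mixed model would additionally allow fixed effects for context and unbalanced sampling, which this shortcut does not handle.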

      I appreciate that the authors now included their sample sizes in the main body of text (as opposed to the supplement) but I think that it would still help if the authors included a brief overview of their design at the start of the methods. It is still unclear to me how many rigs each individual fly was run through? Were the same individuals measured in multiple different rigs/scenarios? Or just one?

      I really think a variance partitioning modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation. I also genuinely think that this will improve the impact and reach of this paper as they'll be using methods that are standard in the study of individual behavioral variation.

      Reviewer #3 (Public review):  

      This manuscript is a continuation of past work by the last author where they looked at stochasticity in developmental processes leading to inter-individual behavioural differences. In that work, the focus was on a specific behaviour under specific conditions while probing the neural basis of the variability. In this work, the authors set out to describe in detail how stable individuality of animal behaviours is in the context of various external and internal influences. They identify a few behaviours to monitor (read outs of attention, exploration, and 'anxiety'); some external stimuli (temperature, contrast, nature of visual cues, and spatial environment); and two internal states (walking and flying).

      They then use high-throughput behavioural arenas - most of which they have built and made plans available for others to replicate - to quantify and compare combinations of these behaviours, stimuli, and internal states. This detailed analysis reveals that:

      (1) Many individualistic behaviours remain stable over the course of many days.  

      (2) That some of these (walking speed) remain stable over changing visual cues. Others (walking speed and centrophobicity) remain stable at different temperatures.

      (3) All the behaviours they tested fail to remain stable over spatially varying environment (arena shape).

      (4) and only angular velocity (a read out of attention) remains stable across varying internal states (walking and flying)

      Thus, the authors conclude that there is a hierarchy in the influence of external stimuli and internal states on the stability of individual behaviours.

      The manuscript is a technical feat with the authors having built many new high-throughput assays. The number of animals is large and many variables have been tested - different types of behavioural paradigms, flying vs walking, varying visual stimuli, different temperature among others.  

      Comments on revisions:  

      The authors have addressed my previous concerns.  

      We thank the reviewer for the positive feedback and are glad our revisions have satisfactorily addressed the previous concerns. We appreciate the thoughtful input that helped us improve the clarity and rigor of the manuscript.

      Reviewer #1 (Recommendations for the authors):  

      Comment on Revised Manuscript  

      Recommendations for Improvement  

      (1) Expand the Results section for Figure 9 with a more detailed interpretation of the GLM coefficients and their biological significance

      (2) Provide explicit criteria (or at least explain in detail) for how the GLM results confirm or undermine their original hypothesis about environmental context hierarchy

      While the claims are interesting, the additional statistical analysis appears promising. However, clearer explanation of these new results would strengthen the paper and ensure that readers from diverse backgrounds can fully understand how the evidence supports the authors' conclusions about individuality across environmental contexts. 

      We thank the reviewer for these constructive suggestions. In response to these suggestions, we have expanded both the Methods and Results sections to provide a more detailed explanation of the GLM coefficients, including their interpretation and how they relate to our original correlation-based findings.

      We now clarify how the direction, magnitude, and statistical significance of specific coefficients reflect the influence of different environmental factors on the persistence of individual behavioral traits. To make this accessible to readers from diverse backgrounds, we explicitly outline the criteria we used to evaluate whether the GLM results support our hypothesis about the hierarchical influence of environmental context, namely, whether the structure and strength of effects align with the patterns predicted from our prior correlation analysis.

      These additions improve clarity and help readers understand how the new statistical results reinforce our conclusions about the context-dependence of behavioral individuality.

      Reviewer #2 (Recommendations for the authors):  

      Thanks for the revision of the paper! I updated my review to try and provide a little more guidance by what I mean about updating your analyses. I really think this is a super cool data set and I genuinely wish this were MY dataset so that way I could really dig into it to partition the variance. These variance partitioning methods are standard in my particular subfield (study of individual behavioral variation in ecology and evolution) and so I think employing them is 1) going to offer a MUCH more elegant and holistic view of the behavioral variation (e.g. you can report a single repeatability estimate for each behavior rather than 3 different correlations) and 2) improve the impact and readership for your paper as now you'll be using methods that a whole community of researchers are very familiar with. It's just a suggestion, but I hope you consider it!

      We sincerely thank the reviewer for the insightful and encouraging feedback and for introducing us to this modeling approach. In response to this suggestion, we have incorporated a hierarchical linear mixed-effects model into our analysis (now presented in Figure 10), accompanied by a new supplementary table (Table T3). We also updated the Methods, Results, and Discussion sections to describe the rationale, implementation, and implications of the mixed-model analysis.

      We agree with the reviewer that this approach provides a more elegant way to quantify behavioral variation and individual consistency across contexts. In particular, the ability to estimate repeatability directly aligns well with the core questions of our study. It facilitates improved communication of our findings to ecology, evolution, and behavior researchers. We greatly appreciate the suggestion; it has significantly strengthened both the analytical framework and the interpretability of the manuscript.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Mahajan et al. introduce two innovative macroscopic measures, intrachromosomal gene correlation length (𝓁∗) and transition energy barrier, to investigate chromatin structural dynamics associated with aging and age-related syndromes such as Hutchinson-Gilford Progeria Syndrome (HGPS) and Werner Syndrome (WRN). The authors propose a compelling systems-level approach that complements traditional biomarker-driven analyses, offering a more holistic and quantitative framework to assess genome-wide dysregulation. The concept of 𝓁∗ as a spatial correlation metric to capture chromatin disorganization is novel and well-motivated. The use of autocorrelation on distance-binned gene expression adds depth to the interpretation of chromatin state shifts. The energy landscape framework for gene state transitions is an elegant abstraction, with the notion of "irreversibility" providing a thermodynamic interpretation of transcriptional dysregulation. The application to multiple datasets (Fleischer, Line-1) and pathological states adds robustness to the analysis. The consistency of chromosome 6 (and to some extent chromosomes 16 and X) emerging as hotspots aligns well with known histone cluster localization and disease-relevant pathways. The manuscript does an excellent job of integrating transcriptomic trends with known epigenetic hallmarks of aging, and the proposed metrics can be used in place of traditional techniques like PCA in capturing structural transcriptome features. However, a direct correlation of the present analysis with ATAC-seq/Hi-C data would be more informative.

      (1) In the manuscript, the authors mention "While it may be intuitive to assume that highly expressed genes originate from euchromatin, this cannot be conclusively stated as a complete representation of euchromatin genes, nor can LAT be definitively linked to heterochromatin". What percentage of LAT can be linked to heterochromatin? What is the distribution of LAT and HAT in the euchromatin?

      Thank you for this insightful question. In the revision we will add chromatin state annotations using ChromHMM to identify overlap between HAT/LAT and corresponding chromatin state. This should provide the specific percentages and distributions you requested.

      We would like to take this opportunity to clarify that, based on the plots in Fig. S1 and the differential gene expression results, HAT is most likely a subset of euchromatin and LAT may contain both euchromatin and heterochromatin. The HAT/LAT cutoff occurs around the knee point in the log-log plot (Figure S1), where the linear portion indicates scale-invariant behavior with similar relative changes across expression ranks. The non-linear portion represents departure from power-law scaling, where low-expression genes exhibit sharper decline than expected. This suggests potential biological mechanisms such as chromatin silencing, detection limits, or technical artifacts related to sequencing depth.

      We will provide detailed chromatin state analysis in the revision. For reference, HAT gene lists per chromosome are available in our GitHub repository at: https://github.com/altoslabs/papers-2025-rnaseq-chrom-aging/tree/main/data/Preprocessed_data under /<dataset>/chromosome_{}/data_hi.

      (2) In Figure 2, the authors observe "that the signal from the HAT class is the stronger between two and the signal from the LAT class, being mostly uniform, can be constituted as background noise." Is this biologically relevant? Are low-abundance transcripts constitutively expressed? The authors should discuss this in the Results section.

      We apologize for the confusion arising from the usage of the term “background noise”. We agree that the distinction between high-abundance transcripts (HATs) and low-abundance transcripts (LATs) deserves more explicit discussion in the Results.

      Our intention is to say that HATs have a higher signal-to-noise ratio (SNR) than LATs, as follows from the power-law plot in Fig. S1. Within the context of the problem we are solving, the HAT class provides a strong, robust signal that is consistent across chromosomes, whereas the LAT class exhibits a lower SNR and a more uniform, background-like distribution; this is not intended as a general biological statement. The experimental result that led to this statement is presented in Fig. S3. This does not imply that low-abundance transcripts lack biological relevance, but rather that they contribute less to the spatial organization patterns we measure.

      (3) The authors make a very interesting observation from Figure 3: that ASO-treated LINE-1 appears to be more effective in restoring HGPS cell lines closer to wild-type compared to WRN. This can be explained by the difference in the basal activity of L1 elements in the HGPS vs WRN cell types. The authors should comment on this.

      We thank the reviewer for this incisive biological observation. While the differential effectiveness of ASO-treated LINE-1 in HGPS versus WRN cell lines is indeed an interesting phenomenon that may relate to basal L1 activity differences, this biological mechanism falls outside the scope of our current study.

      Our paper focuses on demonstrating that the 𝓁∗ metric can sensitively detect chromatin structural changes that have been independently validated. We utilize the Della Valle et al. (2022) dataset specifically because it provides experimentally confirmed chromatin structural differences (progeroid vs wild-type vs ASO-treated progeroid), allowing us to validate that 𝓁∗ correlates with these established changes.

      For detailed discussion of the biological mechanisms underlying differential LINE-1 ASO effectiveness between progeroid syndromes, we would direct readers to Della Valle et al. (2022) and related LINE-1 biology literature. Our contribution lies in demonstrating that 𝓁∗ can capture these chromatin organizational changes with enhanced sensitivity compared to traditional expression-based approaches. We are reluctant, without further experimentation, to venture into over-interpreting these results from a biology perspective.  

      (4) The authors report that "from the results on Fleischer dataset is the magnitude of the difference in similarity distance is more pronounced in 𝓁∗ than in gene expression." Does this mean that the alterations in gene distance and chromatin organization do not result in gene expression change during aging?

      Thank you for this important clarification request. This observation, illustrated in Figure 3, highlights two key points: (1) 𝓁∗ shows similar trends to PCA analysis, and (2) 𝓁∗ demonstrates higher sensitivity than traditional gene expression analysis.

      This enhanced sensitivity enables better discrimination between aging states, particularly in the Fleischer dataset representing natural aging where changes are more gradual. The higher sensitivity stems from 𝓁∗'s ability to capture transcriptional spatial organization through spatial autocorrelation, which can detect subtle organizational changes that may precede or accompany expression changes rather than replacing them.

      We will clarify in the revision that chromatin organizational changes and gene expression changes are complementary rather than mutually exclusive phenomena during aging.

      (5) "In Fleischer dataset, as evident in Figure 4a, although changes in the heterochromatin are not identical for all chromosomes shown by the different degrees of variation of 𝓁∗ in each age group." The authors should present a comprehensive map of each chromosome change in gene distance to better explain the above statement.

      Thank you for the feedback. If we understand your comment correctly, we need to provide a chromosome-wise distribution for Fig3c. We will update the paper and the supplementary.

      (6) While trends in 𝓁∗ are discussed at both global and chromosome-specific levels, stronger statistical testing (e.g., permutation tests, bootstrapping) would lend greater confidence, especially when differences between age groups or treatment states are modest.

      Thank you for the helpful suggestion. In the revision, we will incorporate permutation-based significance testing by shuffling the gene annotation and count table to generate a null distribution for our 𝓁∗ calculation. This will allow us to more rigorously assess whether the observed differences across age groups or treatment states deviate from chance expectations and thereby lend greater statistical confidence to our findings.
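      The general shape of such a permutation test can be sketched as follows. The manuscript's 𝓁∗ calculation is not reproduced here; a lag-1 autocorrelation along gene order is used purely as a hypothetical stand-in statistic, and the expression vector is invented.

```python
import random

def lag1_autocorr(values):
    """Lag-1 autocorrelation along gene order (stand-in statistic, not l*)."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in values)
    return num / den

def permutation_p_value(values, statistic, n_perm=1000, seed=0):
    """One-sided permutation test: shuffle gene positions to build the null."""
    rng = random.Random(seed)
    observed = statistic(values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # destroys positional structure, keeps values
        if statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero
    return observed, (hits + 1) / (n_perm + 1)

# Invented expression profile: blocks of 10 neighbouring genes share a level
expression = [float(i // 10) for i in range(100)]
observed, p_value = permutation_p_value(expression, lag1_autocorr)
print(f"observed = {observed:.3f}, p = {p_value:.4f}")
```

      Swapping `lag1_autocorr` for the actual 𝓁∗ computation would give the null distribution described in the response, assuming 𝓁∗ can be recomputed on each shuffled gene order.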

      (7) While the transition energy barrier is an insightful conceptual addition, further clarification on the mathematical formulation and its physical assumptions (e.g., energy normalization, symmetry conditions) would improve interpretability. Also, in between Figures 7 and 8, the authors first compare the energy barrier of Chromosome 1 and then for all other chromosomes.

      What is the rationale for only analyzing chromosome 1? How many HAT or LAT are present there?

      Regarding the focus on chromosome 1: we initially presented chromosome 1 as a representative example, but we will include the energy landscape analysis for all chromosomes in the supplementary materials.

      We use the same HATs that were extracted for the 𝓁∗ calculation for the energy landscape as well. The HAT details are available in the GitHub repository linked in our response to comment (1).

      The normalization of the energy barrier ensures comparability across chromosomes of different sizes and across samples with different absolute expression scales. Specifically, we normalize with respect to the total area under the two-dimensional energy landscape while using the thermal energy (k_B T) as a scaling factor to place transition energy barriers on the scale of thermal fluctuations. This is formally expressed as in Eq. (1). 

      The physical consequences of symmetry in the energy landscape are discussed in lines 472-491 of the manuscript, where we also introduce the concept of irreversibility. In brief, the chromatin energy landscape (Figure 8) is constructed by quantifying the energy contributions of genes that are upregulated (lower triangular matrix) and downregulated (upper triangular matrix) between two states. If the integrated energy contributions of upregulated and downregulated genes are equal, the landscape is symmetric, representing a thermodynamically reversible process, for example, nucleosome repositioning between euchromatic and heterochromatic regions without net gain or loss of nucleosomes. However, in cases where epigenetic modifications alter nucleosome density (e.g., disease states that reduce nucleosome numbers), the integrated energies are unequal, reflecting an irreversible energy cost. In this case, restoring chromatin requires additional energy input (e.g., to replace “missing” nucleosomes), which manifests as asymmetry in the landscape.

      Reviewer #2 (Public review):

      The authors report that intra-chromosomal gene correlation length (spatial correlations in gene expressions along the chromosome) serves as a proxy of chromatin structure and hence gene expression. They further explore changes in these metrics with aging. These are interesting and important findings. However, there are fundamental problems at this time.

      (1) The basic method lacks validation. There is no validation of the method by approaches that directly measure chromatin structure, for example ATAC-seq, ChIP-seq, or CUT n RUN.

      We appreciate the reviewer’s point that direct measurements such as ATAC-seq and ChIP-seq remain the gold standard for characterizing chromatin structure. Our method is designed to complement, not replace, these approaches by leveraging RNA-seq data to detect large-scale transcriptional patterns that correlate with chromatin dynamics.

      We agree that integrating datasets with paired RNA-seq and chromatin accessibility assays would strengthen the manuscript and plan to include one such dataset in the revision.

      Based on this feedback, we will also take the opportunity during revision to clarify and soften certain statements. Specifically, we will reposition ℓ∗ as a sensitive, computational proxy for detecting transcriptional signatures that are suggestive of chromatin structural changes. In other words, ℓ∗ provides an indirect window into chromatin dynamics through transcriptional spatial organization, allowing detection of patterns that may precede or accompany structural changes. Direct assays such as ATAC-seq or ChIP-seq remain essential for confirming the underlying physical modifications. To make this scope clear, we will revise the title to: “Macroscopic RNA-seq Analysis to Detect Transcriptional Patterns Associated with Chromatin State Changes,” and adjust the main text.  

      We would like to take this opportunity to clarify why our initial version focused on the Della Valle and Fleischer datasets rather than including new paired datasets with direct chromatin measurements. The primary objective of our paper is to introduce two macroscopic RNA-seq–based measures, ℓ∗ and the energy landscape, that are designed to detect transcriptional signatures suggestive of chromatin structural changes in the context of aging and age-related diseases. These measures explicitly model transcriptional spatial organization and provide a sensitive, scalable way to analyze RNA-seq data in domains where direct chromatin assays may not be readily available.

      The datasets we used (Della Valle et al., Fleischer et al.) have been rigorously validated and independently demonstrated differences in chromatin structure between conditions. Our goal was to show that ℓ∗ and the energy landscape align with and extend these established findings, offering a more sensitive measure of transcriptional spatial organization. Specifically, in the Della Valle dataset, chromatin structural differences between progeroid and healthy donors — and their partial rescue by LINE-1 ASO treatment — were experimentally confirmed, providing a strong foundation for testing whether our metrics reflect these known changes. Similarly, the Fleischer dataset captures natural, in vivo aging, which has also been linked to chromatin alterations in prior studies.

      Thus, our approach builds on this well-established biological context rather than attempting to re-demonstrate these chromatin differences from scratch. Finally, we emphasize that our current focus is aging and age-related diseases. While the framework could potentially be applied to other chromatin modification contexts, we have not tested it outside this domain and do not claim general applicability at this stage.

      (2) There is no validation by interventions that directly probe chromatin structure, such as HDAC inhibitors. The authors employ datasets with knockdown of LINE-1 for validation. However, this is not a specific chromatin intervention.

      We refer the reviewer to our response to (1), which includes the rationale behind the selection of the Della Valle (LINE-1) and Fleischer datasets. We would also like to note that, while the focus of Della Valle et al. was LINE-1 ASO treatment to show rescue of progeroid samples, the dataset also contains data for untreated as well as healthy samples. Importantly, untreated progeroid samples show distinctly different chromatin structure compared to healthy samples, with substantial differences detectable by both PCA and our ℓ∗ metric.

      Our ℓ∗ method provides additional interpretability by capturing transcriptional spatial organization, resulting in shorter correlation lengths for healthy patients and longer lengths for progeroid patients.

      As mentioned in our response to (1), we will also try to add an additional dataset with paired RNA-seq and one of ATAC-seq, ChIP-seq, or CUT&RUN in the revision.

      (3) There is no statistical analysis, e.g., in Figures 4 and 5.

      We have provided statistical analysis for Fig 4 (lines 237-241). We will do a similar analysis for Fig. 5. 

      (4) The authors state, "in Figure 4a changes in the heterochromatin are not identical for all chromosomes shown...." I do not see the data for individual chromosomes.

      The data for individual chromosomes are available in supplementary Fig. S11, referenced at line 425. We will make this cross-reference clearer in the main text and consider whether some of this chromosome-specific information should be elevated to the main figures for better accessibility.

      (5) In comparisons of WT vs HGPS NT or HGPS SCR (Figure S6), is this a fair comparison? The WT and HGPS are presumably from different human donors, so they have genetic and epigenetic differences unrelated to HGPS.

      Figure S6 demonstrates that ℓ∗ analysis identifies chromosome 6 as most affected, consistent with differential gene expression patterns.

      Regarding donor differences in WT vs HGPS comparisons, we defer to the experimental design of Della Valle et al., which follows standard practices in progeroid research. Our review of the literature indicates that progeroid studies typically use either parent/child samples or different donor comparisons (as individuals cannot simultaneously represent both WT and HGPS states).

      Importantly, the LINE-1 ASO treatment comparisons use the same cell lines, eliminating donor variability concerns. This experimental design allows us to validate that ℓ∗ can detect rescue effects within genetically identical samples, supporting the method's sensitivity to chromatin structural changes.

      Reviewing Editor Comments:

      You'll note that both reviewers were very thoughtful in their comments, and in principle are supportive and excited by the work. However, their evaluation of the strength of evidence diverged substantially. I'm inclined to suggest that finding a way to support the novel method with an alternative approach would greatly improve the impact of this work. I encourage you to consider a revision that provides such data, in the context of technology currently available to the field.

      We sincerely thank the editor for their thoughtful and encouraging assessment of our work. We are grateful for their recognition of the novelty of our macroscopic measures (ℓ∗ and the transition energy barrier) and their potential to provide a systems-level understanding of chromatin structural dynamics in aging and age-related syndromes. In response to the editor’s suggestion for direct validation with chromatin accessibility data, we plan to integrate an additional dataset containing paired RNA-seq and ATAC-seq or related measurements in our revision. This will help strengthen the link between our RNA-seq–based metrics and direct chromatin assays. We have also clarified and softened the manuscript text to ensure it is clear that ℓ∗ serves as a complementary, computational proxy, not a replacement, for direct experimental approaches. Very specifically, to make this scope clear, we will revise the title to: “Macroscopic RNA-seq Analysis to Detect Transcriptional Patterns Associated with Chromatin State Changes,” and adjust the main text. We thank the editor for the feedback. We have provided additional details in response to specific comments made by the reviewers.

    1. Author response:

      We thank the reviewers for their valuable feedback. We will prepare a revision of the manuscript based on these suggestions and comments. We are sure these revisions will improve the paper.

      The only major point we wish to clarify is that this is the first and only manuscript describing the toolbox; it is not a version update. Although it shares a similar name with its 2015 MATLAB predecessor (Nili et al., PLoS Comput Biol), rsatoolbox was designed from scratch, and the two have no code or structural overlap beyond implementing some similar methods.

      Developed publicly since 2019, rsatoolbox reflects a decade of research in RSA methodology across multiple labs and incorporates new dissimilarity metrics, RDM comparators, inferential procedures, and visualization methods. Importantly, although we cite several papers describing methods implemented in the toolbox, this is the first manuscript to present the toolbox as a whole, its design principles, and the unified analytical framework it offers.

      We apologize for the leftover placeholder and the non-working links. The links do work for us in the PDF, and we will fix the placeholder as soon as possible.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This is an interesting and timely computational study using molecular dynamics simulation as well as quantum mechanical calculation to address why tyrosine (Y), as part of an intrinsically disordered protein (IDP) sequence, has been observed experimentally to be stronger than phenylalanine (F) as a promoter for biomolecular phase separation. Notably, the authors identified the aqueous nature of the condensate environment and the corresponding dielectric and hydrogen bonding effects as a key to understanding the experimentally observed difference. This principle is illustrated by the difference in computed transfer free energy of Y- and F-containing pentapeptides into a solvent with various degrees of polarity. The elucidation offered by this work is important. The computation appears to be carefully executed, the results are valuable, and the discussion is generally insightful. However, there is room for improvement in some parts of the presentation in terms of accuracy and clarity, including, e.g., the logic of the narrative should be clarified with additional information (and possibly additional computation), and the current effort should be better placed in the context of prior relevant theoretical and experimental works on cation-π interactions in biomolecules and dielectric properties of biomolecular condensates. Accordingly, this manuscript should be revised to address the following, with added discussion as well as inclusion of references mentioned below.

      We are grateful for the referee’s assessment of our work and insightful suggestions, which we address point by point below.

      (1) Page 2, line 61: "Coarse-grained simulation models have failed to account for the greater propensity of arginine to promote phase separation in Ddx4 variants with Arg to Lys mutations (Das et al., 2020)". As it stands, this statement is not accurate, because the cited reference to Das et al. showed that although some coarse-grained models, namely the HPS model of Dignon et al., 2018 PLoS Comput did not capture the Arg to Lys trend, the KH model described in the same Dignon et al. paper was demonstrated by Das et al. (2020) to be capable of mimicking the greater propensity of Arg to promote phase separation than Lys. Accordingly, a possible minimal change that would correct the inaccuracy of this statement in the manuscript would be to add the word "Some" in front of "coarse-grained simulation models ...", i.e., it should read "Some coarse-grained simulation models have failed ...". In fact, a subsequent work [Wessén et al., J Phys Chem B 126: 9222-9245 (2022)] that applied the Mpipi interaction parameters (Joseph et al., 2021, already cited in the manuscript) showed that Mpipi is capable of capturing the rank ordering of phase separation propensity of Ddx4 variants, including a charge scrambled variant as well as both the Arg to Lys and the Phe to Ala variants (see Figure 11a of the above-cited Wessén et al. 2022 reference). The authors may wish to qualify their statements in the introduction to take note of these prior results. For example, they may consider adding a note immediately after the next sentence in the manuscript "However, by replacing the hydrophobicity scales ... (Das et al., 2020)" to refer to these subsequent findings in 2021-2022.

      We agree with the referee that the wording used in the original version was inaccurate. We did not want to expand too much on the previous results on Lys/Arg, to avoid overwhelming our readers with background information that was not directly relevant to the aromatic residues Phe and Tyr. We have now introduced some of the missing details in the hope that this will provide a more accurate account of what has been achieved with different versions of coarse-grained models. In the revised version, we say the following:

      Das and co-workers attempted to explain arginine’s greater propensity to phase separate in Ddx4 variants using coarse-grained simulations with two different energy functions (Das et al., 2020). The model was first parametrized using a hydrophobicity scale, aimed to capture the “stickiness” of different amino acids (Dignon et al., 2018), but this did not recapitulate the correct rank order in the stability of the simulated condensates (Das et al., 2020). By replacing the hydrophobicity scale with interaction energies from amino acid contact matrices —derived from a statistical analysis of the PDB (Dignon et al., 2018; Miyazawa and Jernigan, 1996; Kim and Hummer, 2008)— they recovered the correct trends (Das et al., 2020). A key to the greater propensity for LLPS in the case of Arg may derive from the pseudo-aromaticity of this residue, which results in a greater stabilization relative to the more purely cationic character of Lys (Gobbi and Frenking, 1993; Wang et al., 2018; Hong et al., 2022).

      (2) Page 8, lines 285-290 (as well as the preceding discussion under the same subheading & Figure 4): "These findings suggest that ... is not primarily driven by differences in protein-protein interaction patterns ..." The authors' logic in terms of physical explanation is somewhat problematic here. In this regard, "Protein-protein interaction patterns" appear to be a straw man, so to speak. Indeed, who (reference?) has argued that the difference in the capability of Y and F in promoting phase separation should be reflected in the pairwise amino acid interaction pattern in a condensate that contains either only Y (and G, S) and only F (and G, S) but not both Y and F? Also, this paragraph in the manuscript seems to suggest that the authors' observation of similar contact patterns in the GSY and GSF condensates is "counterintuitive" given the difference in Y-Y and F-F potentials of mean force (Joseph et al., 2021); but there is nothing particularly counterintuitive about that. The two sets of observations are not mutually exclusive. For instance, consider two different homopolymers, one with a significantly stronger monomer-monomer attraction than the other. The condensates for the two different homopolymers will have essentially the same contact pattern but very different stabilities (different critical temperatures), and there is nothing surprising about it. In other words, phase separation propensity is not "driven" by contact pattern in general, it's driven by interaction (free) energy. The relevant issue here is total interaction energy or the critical point of the phase separation. If it is computationally feasible, the authors should attempt to determine the critical temperatures for the GSY condensate versus the GSF condensate to verify that the GSY condensate has a higher critical temperature than the GSF condensate. That would be the most relevant piece of information for the question at hand.

      We are grateful for this very insightful comment by the referee. We have followed this suggestion to address whether, despite similar interaction patterns in GSY and GSF condensates, their stabilities are different. As in our previous work (De Sancho, 2022), we have run replica exchange MD simulations for both condensates and derived their phase diagrams. Our results, shown in the new Figure 5 and supplementary Figs. S6-S7, clearly indicate that the GSY condensate has a lower saturation density than the GSF condensate. This result is consistent with the trends observed in experiments on mutants of the low-complexity domain of hnRNPA1, where the relative amounts of F and Y determine the saturation concentration (Bremer et al., 2022).
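
      The stability comparison requested by the referee is often made by fitting coexistence densities read off the phase diagram. The following is only a minimal sketch, assuming the standard scaling law ρ_dense − ρ_dilute = A (Tc − T)^β with the 3D Ising exponent β = 0.325. The function name and the idea of applying it to the GSY and GSF data are illustrative assumptions, not the analysis reported in the manuscript.

```python
BETA = 0.325  # 3D Ising critical exponent (assumed, not taken from the manuscript)


def estimate_tc(temps, rho_dense, rho_dilute, beta=BETA):
    """Estimate a critical temperature from coexistence densities.

    Linearises (rho_dense - rho_dilute)**(1/beta) = a * (Tc - T), fits the
    straight line y = slope * T + intercept by ordinary least squares,
    and returns Tc = -intercept / slope.
    """
    ys = [(h - l) ** (1.0 / beta) for h, l in zip(rho_dense, rho_dilute)]
    n = len(temps)
    t_mean = sum(temps) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in temps)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(temps, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return -intercept / slope
```

      Under these assumptions, a higher fitted Tc for one condensate would indicate greater stability, complementing the comparison of saturation densities in the low-density branch.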

      (3) Page 9, lines 315-316: "...Our ε [relative permittivity] values ... are surprisingly close to that derived from experiment on Ddx4 condensates (45±13) (Nott et al., 2015)". For accuracy, it should be noted here that the relative permittivity provided in the supplementary information of Nott et al. was not a direct experimental measurement but based on a fit using Flory-Huggins (FH), but FH is not the most appropriate theory for a polymer with long-spatial-range Coulomb interactions. To this reviewer's knowledge, no direct measurement of relative permittivity in biomolecular condensates has been made to date. Explicit-water simulation suggests that the relative permittivity of Ddx4 condensate with protein volume fraction ≈ 0.4 can be ≈ 35-50 (Das et al., PNAS 2020, Fig.7A), which happens to agree with the ε = 45±13 estimate. This information should be useful to include in the authors' manuscript.

      We thank the referee for this useful comment. We are aware that the estimate we mentioned is not direct. We have now clarified this point and added the additional estimate from Das et al. In the new version of the manuscript, we say:

      Our 𝜀 values for the condensates (39 ± 5 for GSY and 47 ± 3 for GSF) are surprisingly close to that derived from experiments on Ddx4 condensates using Flory-Huggins theory (45±13) (Nott et al., 2015) and from atomistic simulations of Ddx4 (∼35−50 at a volume fraction of 𝜙 = 0.4) (Das et al., 2020).

      (4) As for the dielectric environment within biomolecular condensates, coarse-grained simulation has suggested that whereas condensates formed by essentially electrically neutral polymers (as in the authors' model systems) have relative permittivities intermediate between that of bulk water and that of pure protein (ε=2-4, or at most 15), condensates formed by highly charged polymers can have relative permittivity higher than that of bulk water [Wessén et al., J Phys Chem B 125:4337-4358 (2021), Fig.14 of this reference]. In view of the role of aromatic residues (mainly Y and F) in the phase separation of IDPs such as A1-LCD and LAF-1 that contain positively and negatively charged residues (Martin et al., 2020; Schuster et al., 2020, already cited in the manuscript), it should be useful to address briefly how the relationship between the relative phase-separation promotion strength of Y vs F and dielectric environment of the condensate may or may not change with higher relative permittivities.

      We thank the referee for their comment regarding highly charged polymers. However, we have chosen not to address these systems in our manuscript, as they are significantly different from the GSY/GSF peptide condensates under investigation. In polyelectrolyte systems, condensate formation is primarily driven by electrostatic interactions and counterion release, while we highlight the role of transfer free energies. At high dielectric constants (and dielectrics even higher than that of water), the strength of electrostatic interactions will be greatly reduced. In our approach to estimate differences between Y and F, the transfer free energy should plateau at a value of ΔΔG=0 in water. At greater values of ε>80, it becomes difficult to predict whether additional effects might become relevant. As this lies beyond the scope of our current study, we prefer not to speculate further.

      (5) The authors applied the dipole moment fluctuation formula (Eq.2 in the manuscript) to calculate relative permittivity in their model condensates. Does this formula apply only to an isotropic environment? The authors' model condensates were obtained from a "slab" approach (page 4), and thus the simulation box has a rectangular geometry. Did the authors apply Equation 2 to the entire simulation box or only to the central part of the box with the condensate (see, e.g., Figure 3C in the manuscript)? If the latter is the case, is it necessary to use a different dipole moment formula that distinguishes between the "parallel" and "perpendicular" components of the dipole moment (see, e.g., Equation 16 in the above-cited Wessén et al. 2021 paper)? A brief added comment will be useful.

      We have calculated the relative permittivity from dense phases only. These dense phases were sliced from the slab geometry and then re-equilibrated. Long simulations were then run to converge the calculation of the dielectric constant. We have clarified this in the Methods section of the paper. We say:

      For the calculation of the dielectric constant of condensates, we used the simulations of isolated dense phases mentioned above.
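
      For readers unfamiliar with the dipole fluctuation approach, the calculation can be sketched as follows. Eq. 2 of the manuscript is not reproduced in this response, so the sketch assumes the standard isotropic form for conducting boundary conditions, ε = 1 + (⟨M·M⟩ − ⟨M⟩·⟨M⟩) / (3 ε₀ V k_B T). The function name, input layout, and units (e·nm, nm³) are illustrative assumptions.

```python
E_CHARGE = 1.602176634e-19  # elementary charge, C
EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m
K_B = 1.380649e-23          # Boltzmann constant, J/K
NM = 1e-9                   # nanometre in metres


def relative_permittivity(dipoles, volume_nm3, temperature):
    """Relative permittivity from total-dipole fluctuations.

    dipoles: list of (Mx, My, Mz) tuples, one total box dipole per frame, in e*nm.
    volume_nm3: (average) box volume in nm^3.
    temperature: temperature in K.
    Assumes an isotropic sample; a slab would need separate parallel and
    perpendicular components.
    """
    n = len(dipoles)
    mean = [sum(m[i] for m in dipoles) / n for i in range(3)]
    mean_sq = sum(m[0] ** 2 + m[1] ** 2 + m[2] ** 2 for m in dipoles) / n
    fluct = mean_sq - sum(c * c for c in mean)   # <M.M> - <M>.<M>, in (e nm)^2
    fluct_si = fluct * (E_CHARGE * NM) ** 2      # convert to (C m)^2
    volume_si = volume_nm3 * NM ** 3             # convert to m^3
    return 1.0 + fluct_si / (3.0 * EPS0 * volume_si * K_B * temperature)
```

      Because this form averages over all three Cartesian components, it is only appropriate for a homogeneous, isotropic box, which is consistent with the choice to use isolated dense phases rather than the full slab.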

      (6) Concerning the general role of Y and F in the phase separation of biomolecules containing positively charged Arg and Lys residues, the relative strength of cation-π interactions (cation-Y vs cation-F) should be addressed (in view of the generality implied by the title of the manuscript), or at least discussed briefly in the authors' manuscript if a detailed study is beyond the scope of their current effort. It has long been known that in the biomolecular context, cation-Y is slightly stronger than cation-F, whereas cation-tryptophan (W) is significantly stronger than either cation-Y or cation-F [Wu & McMahon, JACS 130:12554-12555 (2008)]. Experimental data from a study of EWS (Ewing sarcoma) transactivation domains indicated that Y is a slightly stronger promoter than F for transcription, whereas W is significantly stronger than either Y or F [Song et al., PLoS Comput Biol 9:e1003239 (2013)]. In view of the subsequent general recognition that "transcription factors activate genes through the phase-separation capacity of their activation domain" [Boija et al., Cell 175:1842-1855.e16 (2018)] which is applicable to EWS in particular [Johnson et al., JACS 146:8071-8085 (2024)], the experimental data in Song et al. 2013 (see Figure 3A of this reference) suggests that cation-Y interactions are stronger than cation-F interactions in promoting phase separation, thus generalizing the authors' observations (which focus primarily on Y-Y, Y-F and F-F interactions) to most situations in which cation-Y and cation-F interactions are relevant to biomolecular condensation.

      We thank our referee for this insightful comment. While we restrict our analysis to aromatic pairs in this work, the observed crossover will certainly affect other pairs where tyrosine or phenylalanine are involved. We now comment on this point in the discussions section of the revised manuscript. This topic will be explored in detail in a follow-up manuscript we are currently completing. We say:

      We note that, although we have not included in our analysis positively charged residues that form cation-π interactions with aromatics, the observed crossover will also be relevant to Arg/Lys contacts with Phe and Tyr. Following the rationale of our findings, within condensates, cation-Tyr interactions are expected to promote phase separation more strongly than cation-Phe pairs.

      (7) Page 9: The observation of weaker effective F-F (and a few other nonpolar-nonpolar) interactions in a largely aqueous environment (as in an IDP condensate) than in a nonpolar environment (as in the core of a folded protein) is intimately related to (and expected from) the long-recognized distinction between "bulk" and "pair" as well as size dependence of hydrophobic effects that have been addressed in the context of protein folding [Wood & Thompson, PNAS 87:8921-8927 (1990); Shimizu & Chan, JACS 123:2083-2084 (2001); Proteins 49:560-566 (2002)]. It will be useful to add a brief pointer in the current manuscript to this body of relevant resources in protein science.

      We thank the referee for bringing this body of work to our attention. In the revised version of our work, we briefly mention how it relates to our results. We also note that the suggested references have pointed to another of the limitations of our study, that of chain connectivity, addressed in the work by Shimizu and Chan. While we were well aware of these limitations, we had not mentioned them in our manuscript. Concerning the distinction between pair and bulk hydrophobicities, we include the following in the concluding lines of our work:

      The observed context dependence has deep roots in the concepts of “pair” and “bulk” hydrophobicity (Wood and Thompson, 1990; Shimizu and Chan, 2002). While pair hydrophobicity is connected to dimerisation equilibria (i.e. the second step in Figure 2B), bulk hydrophobicity is related to transfer processes (the first step). Our work stresses the importance of considering both the pair contribution that dominates at high solvation, and the transfer free energy contribution, which overwhelms the interaction strength at low dielectrics.

      Reviewer #2 (Public review):

      Summary:

      In this preprint, De Sancho and López use alchemical molecular dynamics simulations and quantum mechanical calculations to elucidate the origin of the observed preference of Tyr over Phe in phase separation. The paper is well written, and the simulations conducted are rigorous and provide good insight into the origin of the differences between the two aromatic amino acids considered.

      We thank the referee for his/her positive assessment of our work. Below, we address all the questions raised one by one.

      Strengths:

      The study addresses a fundamental discrepancy in the field of phase separation where the predicted ranking of aromatic amino acids observed experimentally is different from their anticipated rankings when considering contact statistics of folded proteins. While the hypothesis that the difference in the microenvironment of the condensed phase and hydrophobic core of folded proteins underlies the different observations, this study provides a quantification of this effect. Further, the demonstration of the crossover between Phe and Tyr as a function of the dielectric is interesting and provides further support for the hypothesis that the differing microenvironments within the condensed phase and the core of folded proteins is the origin of the difference between contact statistics and experimental observations in phase separation literature. The simulations performed in this work systematically investigate several possible explanations and therefore provide depth to the paper.

      Weaknesses:

      While the study is quite comprehensive and the paper well written, there are a few instances that would benefit from additional details. In the methods section, it is unclear as to whether the GGXGG peptides upon which the alchemical transforms are conducted are positioned restrained within the condensed/dilute phase or not. If they are not, how would the position of the peptides within the condensate alter the calculated free energies reported? 

      The peptides are not restrained in our simulations. Although the GGXGG peptide can therefore, given sufficient time, diffuse out of the peptide condensate, we did not observe any escape event in the trajectories we used to generate starting points for switching. Hence, the peptide environment captured in our calculations reflects, on average, the protein-protein and protein-solvent interactions inside the model condensate. We believe this is the right way of performing the calculation of transfer free energy differences into the condensate. We have clarified this point when we describe the equilibrium simulation results in the revised manuscript. We say:

      Also, the peptide that experiences the transformation, which is not restrained, must remain buried within the condensate for all the snapshots that we use as initial frames, to avoid averaging the work in the dilute and dense phases.

      On the referee’s second point of whether there would be differences if the peptide visited the dilute phase, the answer is that, indeed, there would be. We expect that the behaviour of the peptide would approach ΔΔG=0, considering the low protein concentration in the dilute phase. For mixed trajectories with sampling in both dilute and dense phases, our expectation would be a bimodal distribution in the free energy estimates from switching (see e.g. Fig. 8 in DOI:10.1021/acs.jpcb.0c10263). Because we are exclusively interested in the transfer free energies into the condensate, we do not pursue such calculations in this work.
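
      To illustrate why mixing starting frames from both phases matters, recall that in non-equilibrium switching each starting frame yields one work value and the free energy is estimated from the resulting work distribution. The sketch below uses the Jarzynski exponential average, ΔG = −kT ln⟨exp(−W/kT)⟩, chosen here purely for illustration; this response does not state which estimator the manuscript uses, and the function name and units (kJ/mol) are assumptions.

```python
import math

KT_300K = 2.494  # k_B * T * N_A at 300 K in kJ/mol (rounded)


def jarzynski_free_energy(work_values, kT=KT_300K):
    """Return -kT * ln< exp(-W / kT) > over a set of switching work values.

    A log-sum-exp shift keeps the exponentials numerically stable.
    """
    scaled = [-w / kT for w in work_values]
    shift = max(scaled)
    mean_exp = sum(math.exp(s - shift) for s in scaled) / len(scaled)
    return -kT * (shift + math.log(mean_exp))
```

      Because the exponential average is dominated by the lowest work values, pooling frames from two distinct environments would bias the estimate toward one of them rather than describing transfer into the condensate alone.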

      It would also be interesting to see what the variation in the transfer of free energy is across multiple independent replicates of the transform to assess the convergence of the simulations. 

      Upon submission of our manuscript, we were confident that the results we had obtained would pass the test of statistical significance. We had, after all, done many more simulations than those reported, plus the comparable values of ΔΔG_Transfer for both GSY and GSF pointed in the right direction. However, we acknowledge that the more thorough test of running replicates recommended by the referee is important, considering the slow diffusion within the Tyr peptide condensates due to its stickiness. Also, the non-equilibrium switching method had not been tested before for dense phases like the ones considered here.

      We have hence followed our referee's suggestion and done three different replicates, 1 μs each, of the equilibrium runs starting from independent slab configurations, for both the GSY and GSF condensates (see the new supporting figures Fig. S1, S2 and S5). We now report the errors from the three replicates as the standard error of the mean (bootstrapping errors remain for the rest of the solvents). Our results are entirely consistent with the values reported originally, confirming the validity of our estimates.
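
      The two kinds of error estimate mentioned above can be sketched as follows. This is a generic illustration using only the Python standard library; the function names, the number of resamples, and any input values are assumptions rather than details of the manuscript's analysis.

```python
import math
import random
import statistics


def standard_error_of_mean(values):
    """Sample standard deviation divided by sqrt(n), e.g. over three replicates."""
    return statistics.stdev(values) / math.sqrt(len(values))


def bootstrap_error(values, n_resamples=1000, seed=0):
    """Standard deviation of the mean over resamples drawn with replacement."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    ]
    return statistics.stdev(means)
```

      With only three replicates the standard error is itself noisy, so it is best read as a consistency check between independent runs rather than as a tight confidence interval.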

      Additionally, since the authors use a slab for the calculation of these free energies, are the transfer free energies from the dilute phase to the interface significantly different from those calculated from the dilute phase to the interior of the condensate? 

      We thank the referee for this valuable comment, as it has pointed us in the direction of a rapidly increasing body of work on condensate interfaces, for example, as mediators of aggregation, that we may consider for future study with the same methodology. However, as discussed above, we have not considered this possibility in our work, as we decided to focus on the condensate environment, rather than its interface.

      The authors mention that the contact statistics of Phe and Tyr do not show significant difference and thereby conclude that the more favorable transfer of Tyr primarily originates from the dielectric of the condensate. However, the calculation of contacts neglects the differences in the strength of interactions involving Phe vs. Tyr. Though the authors consider the calculation of energy contact formation later in the manuscript, the scope of these interactions is quite limited (Phe-Phe, Tyr-Tyr, Tyr-Amide, Phe-Amide), which is not sufficient to make a universal conclusion regarding the underlying driving forces. A more appropriate statement would be that in the context of the minimal peptide investigated the driving force seems to be the difference in dielectric. However, it is worth mentioning that the authors do a good job of mentioning some of these caveats in the discussion section.

      We thank the referee for this important comment. Indeed, the similar contact statistics and interaction patterns that we reported originally do not necessarily imply identical interaction energies. In other words, similar statistics and patterns can still result in different stabilities for the Phe and Tyr condensates if the energetics are different. Hence, we cannot conclude that the GSF and GSY condensate environments are equivalent.

      To address this point, we have run new simulations for the revised version of our paper, using the temperature-replica exchange method, as before. From the new datasets, we derive the phase diagrams for both the GSF and GSY condensates (see the new Fig. 5). We find that the tyrosine-containing condensate is more stable than that of phenylalanine, as can be inferred from the lower saturation density in the low-density branch of the phase diagram. In consequence, despite the similar contact statistics, the energetics differ, making the saturation density of GSY slightly lower than that of GSF. This result is consistent with experimental data by Bremer et al. (Nat. Chem. 2022).

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors address the paradox of how tyrosine can act as a stronger sticker for phase separation than phenylalanine, despite phenylalanine being higher on the hydrophobicity scale and exhibiting more prominent pairwise contact statistics in folded protein structures compared to tyrosine.

      We are grateful for the referee’s favourable opinion on the paper. Below, we address all of the issues raised.

      Strengths:

      This is a fascinating problem for the protein science community with special relevance for the biophysical condensate community. Using atomistic simulations of simple model peptides and condensates as well as quantum calculations, the authors provide an explanation that relies on the dielectric constant of the medium and the hydration level that either tyrosine or phenylalanine can achieve in highly hydrophobic vs. hydrophilic media. The authors find that as the dielectric constant decreases, phenylalanine becomes a stronger sticker than tyrosine. The conclusions of the paper seem to be solid, it is well-written and it also recognises the limitations of the study. Overall, the paper represents an important contribution to the field.

      Weaknesses:

      How can the authors ensure that a condensate of GSY or GSF peptides is a representative environment of a protein condensate? First, the composition in terms of amino acids is highly limited, second the effect of peptide/protein length compared to real protein sequences is also an issue, and third, the water concentration within these condensates is really low as compared to real experimental condensates. Hence, how can we rely on the extracted conclusions from these condensates to be representative for real protein sequences with a much more complex composition and structural behaviour?

      We agree with the main weakness identified by the referee. In fact, all these limitations had already been stated in our original submission. Our ternary peptide condensates are just a minimal model system that bears reasonable analogies with condensates, but definitely is not identical to true LCR condensates. The analogies between peptide and protein condensates are, however, worth restating: 

      (1) The limited composition of the peptide condensates is inspired by LCR sequences (see Fig. 4 in Martin & Mittag, 2018).

      (2) The equilibrium phase diagram, showing a UCST, is consistent with that of LCRs from Ddx4 or hnRNPA1.

      (3) The dynamical behaviour is intermediate between liquid and solid (De Sancho, 2022). 

      (4) The contact patterns are comparable to those observed for FUS and LAF1 (Zheng et al, 2020).

      The third issue pointed out by the referee requires particular attention. Indeed, the water content in the model condensates is low (~200 mg/mL for GSY) relative to the experiment (e.g. ~600 mg/mL for FUS and LAF-1 from simulations). Considering that both interaction patterns and solvation contribute to the favorability of Tyr relative to Phe, we speculate that a greater degree of solvation in the true protein condensates will further reinforce the trends we observe.

      In any case, in the revised version of the manuscript, we have made an effort to emphasise the limitations of our results, some of which we plan to address in future work.

      Reviewer #3 (Recommendations for the authors):

      (1) The fact that protein density is so high within GSY or GSF peptide condensates may significantly alter the conclusions of the paper. Can the authors show that for condensates in which the protein density is ~0.2-0.3 g/cm3, the same conclusions hold? Could the authors use a different peptide sequence that establishes a more realistic protein concentration/density inside the condensate?

      Unfortunately, recent work with a variety of peptide sequences suggests that finding peptides in the density range proposed by the referee may be very challenging. For example, Pettit and his co-workers have extensively studied the behaviour of GGXGG peptides. In a recent work, using the CHARMM36m force field and TIP3P water, they report densities of ~1.2-1.3 g/mL for capped pentapeptide condensates (Workman et al, Biophys. J. 2024; DOI: 10.1016/j.bpj.2024.05.009). Brown and Potoyan have recently run simulations of zwitterionic GXG tripeptides with the Amber99sb-ILDNQ force field and TIP3P water, starting with a homogeneous distribution in cubic simulation boxes (Biophys. J. 2024, DOI: 10.1016/j.bpj.2023.12.027). In a box with an initial concentration of 0.25 g/mL, upon phase separation, the peptide ends up occupying what would seem to be ~1/3 of the box, although we could not find exact numbers. This would imply densities of ~0.75 g/mL in the dense phase, with the additional problem of many charges. Finally, Joseph and her co-workers have recently simulated a set of hexapeptide condensates with varied compositions using a combination of atomistic and coarse-grained simulations. For the atomistic simulations, the Amber03ws force field and TIP4P water were used (see BioRxiv reference 10.1101/2025.03.04.641530). They have found values of the protein density in the dense phase ranging between 0.8 and 1.2 g/mL. The consistency in the range of densities reported in these studies suggests that short peptides, at least up to 7 residues long, tend to form quite dense condensates, akin to those investigated in our work. While the examples mentioned do not comprehensively span the full range of peptide lengths, sequences, and force fields, they nonetheless support the general behaviour we observe. A systematic exploration of all these variables would require an extensive search in parameter space, which we believe falls outside the scope of the present study.
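
      The ~0.75 g/mL estimate quoted above for the GXG tripeptides follows from simple mass conservation. The short derivation below is only a sketch: the symbols are introduced here for illustration, and it assumes that essentially all of the peptide ends up in the dense phase and that the dense phase fills a fraction f ≈ 1/3 of the box volume.

```latex
% \rho_0 : initial (homogeneous) peptide concentration in the box
% f      : fraction of the box volume occupied by the dense phase (assumed ~1/3)
\rho_{\mathrm{dense}}
  \approx \frac{m_{\mathrm{peptide}}}{V_{\mathrm{dense}}}
  = \frac{\rho_{0}\, V_{\mathrm{box}}}{f\, V_{\mathrm{box}}}
  = \frac{\rho_{0}}{f}
  = \frac{0.25~\mathrm{g/mL}}{1/3}
  = 0.75~\mathrm{g/mL}
```

      Any peptide remaining in the dilute phase would make the true dense-phase value somewhat lower than this upper estimate.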

      (2) Do the conclusions hold for phase-separating systems that mostly rely on electrostatic interactions to undergo LLPS, like protein-RNA complex coacervates? In other words, could the authors try the same calculations for a binary mixture composed of polyR-polyE, or polyK-polyE?

      This is an excellent idea that we may attempt in future work, but the remit of the current work is aromatic amino acids Phe and Tyr only. Hence, we do not include calculations or discussion on polyR-polyE systems in our revised manuscript.

      (3) One of the major approximations made by the authors is the length of the peptides within the condensates, which is not realistic, or their density. Specifically, could they double or triple the length of these peptides while maintaining their composition so it can be quantified the impact of sequence length in the transfer of free energies?

      We thank the referee for this comment and agree with the main point, which was stated as a limitation in our original submission. The suggested calculations anticipate research that we are planning but will not include in the current work. One of the advantages of our model systems is that the small size of the peptides allows for small simulation boxes and relatively rapid sampling. Longer peptide sequences would require conformational sampling beyond our current capabilities, if done systematically. An example of these limitations is the amount of data that we had to discard from the new simulations we report, which amounts to up to 200 ns of our replica exchange runs in smaller simulation boxes (i.e. >19 μs in total for the 48 replicas of the two condensates!). As stated in the answer to point 1, we have found in the literature work on peptides in the range of 1-7 residues with consistent densities. Additionally, a recent report that applied equilibrium alchemical transformations to tetrapeptide condensates points to the transfer free energy as a driving force for condensate formation, which further supports the observations from our work.

      Minor issues:

      (1) The caption of Figure 3B is not clear. It can only be understood what is depicted there once you read the main text a couple of times. I encourage the authors to clarify the caption.

      We have rewritten the caption for greater clarity. Now it reads as follows:

      Time evolution of the density profiles calculated across the longest dimension of the simulation box (L) in the coexistence simulations. In blue we show the density of all the peptides, and in dark red that of the F/Y residue in the GGXGG peptide.

      (2) Why was the RDF from Figure 5A cut at such a short distance? Can the authors expand the figure to clearly show that it has converged?

      In the updated Figure 5 (now Fig. 6), we have extended the g(r) up to r=1.75 nm so that it clearly plateaus at a value of 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Thank you very much for your recognition of our work and for pointing out the shortcomings. We have made revisions one by one and provided corresponding explanations regarding the issues you raised.

      Weaknesses:

      One of the main EEG results is based on the weighted phase lag index (wPLI) between oscillations in the alpha and theta bands. In my opinion, this is problematic, as wPLI measures the locking of oscillations at the same frequency. It quantifies how reliably the phase difference stays the same over time. If these oscillations have different frequencies, the phase difference cannot remain consistent. Even worse, modeling data show that even very small fluctuations in frequency between signals make wPLI artificially small (Cohen, 2015).

      Thank you for raising the question regarding the application of wPLI between the alpha and theta bands, which indeed deserves further explanation. In our study, we followed previous literature in using wPLI to measure cross-frequency coupling strength, as this index reflects the stability of phase differences. We have also considered your point that the phase differences between oscillations of different frequencies can hardly remain consistent. However, in this study the presentation times of the two memory items were the same, so any such limitation applies equally to both items. Moreover, the wPLI values of the two items alternately dominated over time, and this pattern is consistent with the regularity of the behavioral data, which seems hard to explain as a mere coincidence.

      The corresponding discussion has been added to the revised part of the paper:“the present study referenced previous research by using the wPLI index as a measure of cross-frequency coupling strength31,64-66 (this index quantifies the stability of phase differences), yet the phases of different oscillations inherently change over time. However, this is fair to the two memory items in the present study, as their presentation times were balanced. The study found that the wPLI values of the two items alternately dominated over time, consistent with the pattern of behavioral data, which is hardly explicable by coincidence”
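
      To make the reviewer's concern concrete, the following is a minimal sketch of the wPLI estimator applied to synthetic signals. It is not the analysis pipeline used in the study: the signals, the sampling rate, the phase lag, and the jitter level are all hypothetical, and the index is averaged over time samples rather than over trials for brevity.

```python
import numpy as np

def wpli(x, y):
    """Weighted phase lag index of two complex (analytic) signals.

    wPLI = |mean(Im(S_xy))| / mean(|Im(S_xy)|), with S_xy = x * conj(y).
    Here the averages run over time samples (an illustrative simplification).
    """
    imag = np.imag(x * np.conj(y))
    denom = np.mean(np.abs(imag))
    return 0.0 if denom == 0 else float(np.abs(np.mean(imag)) / denom)

fs = 1000.0                          # hypothetical sampling rate (Hz)
t = np.arange(0.0, 2.0, 1.0 / fs)    # 2 s of data
rng = np.random.default_rng(0)
jitter = 0.3 * rng.standard_normal(t.size)  # hypothetical phase noise (rad)

# Two 10 Hz signals with a stable pi/4 lag (plus jitter), and one 5 Hz signal.
alpha_a = np.exp(1j * (2 * np.pi * 10 * t))
alpha_b = np.exp(1j * (2 * np.pi * 10 * t - np.pi / 4 + jitter))
theta = np.exp(1j * (2 * np.pi * 5 * t))

same_freq = wpli(alpha_a, alpha_b)   # stable phase difference: close to 1
cross_freq = wpli(alpha_a, theta)    # rotating phase difference: close to 0
print(f"same frequency: {same_freq:.3f}, different frequencies: {cross_freq:.3f}")
```

      In this toy case the phase difference between the 10 Hz and 5 Hz signals rotates through complete cycles, so the imaginary part of the cross-spectrum averages out and the index collapses towards zero, which is the behaviour the reviewer describes.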

      Another result from the electrophysiology data shows that the attentional capture effect is positively correlated with the mean amplitude of alpha power. In the presented scatter plot, it seems that this result is driven by one outlier. Unfortunately, Pearson correlation is very sensitive to outliers, and the entire analysis can be driven by an extreme case. I extracted data from the plot and obtained a Pearson correlation of 0.4, similar to what the authors report. However, the Spearman correlation, which is robust against outliers, was only 0.13 (p = 0.57), indicating a non-significant relationship.

      You mentioned that the correlation between the attentional capture effect and the mean amplitude of alpha power in the electrophysiological data might be driven by an outlier, and you compared the Pearson and Spearman correlation coefficients. We fully agree with this point.

      It is true that the small sample size of the current study makes the results vulnerable to extreme data points. We have addressed this in the limitations section of the discussion in the revised manuscript: “the sample size of the current study is small, which may render the results vulnerable to interference from extreme cases”
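
      The sensitivity of the Pearson coefficient to a single extreme point can be illustrated with a short sketch. The data below are invented for illustration and are not taken from the study; the rank-based coefficient is computed with plain NumPy, using average ranks for ties.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two 1-D sequences."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

def average_ranks(a):
    """Ranks starting at 1, with tied values sharing their average rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    ranks = np.empty(a.size)
    ranks[order] = np.arange(1, a.size + 1)
    for value in np.unique(a):
        tied = a == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: ten weakly related points plus one extreme case (40, 40).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 40]
y = [5, 3, 6, 2, 7, 4, 6, 3, 5, 4, 40]

r = pearson(x, y)
rho = spearman(x, y)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```

      With these made-up numbers the single extreme point produces a large Pearson coefficient while the rank-based coefficient stays small, which mirrors the pattern the reviewer reports.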

      The behavioral data are interesting, but in my opinion, they closely replicate Peters and colleagues (2020) using a different paradigm. In that study, participants memorized four spatial positions that formed the endpoints of two objects, and one object was cued. Similarly, reaction times fluctuated at theta frequency, and there was an anti-phase relationship between the two objects. The main novelty of the present study is that this bias can be transferred to an unrelated task. While the current study extends Peters and colleagues' findings to a different task context, the lack of a thorough, direct comparison with Peters et al. limits the clarity of the novel insights provided.

      Thank you very much for your attention to the behavioral data and its relevance to the study by Peters et al. (2020). We have noticed that some results are similar between the two studies, which also indicates the stability of the relevant phenomena.

      However, we would also like to further explain the differences between this study and the study by Peters et al. In the study by Peters et al., participants memorized four spatial positions that formed the endpoints of two objects (one of which was cued), and their results showed that after the two objects disappeared, attention fluctuated at the theta rhythm between their original positions with an inverse correlation. In contrast, the present study explores the manner of memory maintenance indirectly by leveraging the guiding effect of working memory on attention, effectively avoiding the influence of spatial positions.

      The study by Peters et al. directly examined differences in probe positions, clearly demonstrating that attention undergoes rhythmic changes at the two spatial locations and that these changes persist after the objects vanish, but it hardly clarifies the rhythmicity of working memory performance. In contrast, the present study directly investigates such performance using the attention-capture effect of working memory, revealing that when maintaining multiple memory items, their attention-capturing capabilities alternate in dominance, i.e., multiple working memory items alternately become priority templates in a rhythmic manner. This represents a new research perspective and method offered by this study.

      The corresponding discussion has been added to the revised part of the paper:

      “Similar to the present study, Peters et al. had participants memorize four spatial positions forming the endpoints of two objects (one cued), and their results showed that after the two objects disappeared, attention fluctuated at the theta rhythm between their original positions with an inverse correlation; in contrast, the present study explores the manner of memory maintenance indirectly by leveraging the guiding effect of working memory on attention, effectively avoiding the influence of spatial positions—while Peters et al.’s study, which directly examined differences in probe positions, clearly demonstrates that attention undergoes rhythmic changes at the two spatial locations and persists after the objects vanish, it hardly clarifies the rhythmicity of working memory performance, whereas the present study directly investigates such performance using the attention-capture effect of working memory, revealing that when maintaining multiple memory items, their attention-capturing capabilities alternate in dominance, i.e., multiple working memory items alternately become priority templates in a rhythmic manner.”

      Reviewer #2 (Public review):

      The information provided in the current version of the manuscript is not sufficient to assess the scientific significance of the study.

      Thank you very much for pointing out the multiple issues in our manuscript. Due to several revisions of this work, including experimental adjustments, there have been some inconsistencies in details. We appreciate you identifying them one by one. We have made corresponding revisions based on your comments:

      (1) In many cases, the details of the experiments or behavioral tasks described in the main text are not consistent with those provided in the Materials and Methods section. Below, I list only a few of these discrepancies as examples:

      a) For Experiment 1, the Methods section states that the detection stimulus was presented for 2000 ms (lines 494 and 498), but Figure 1 in the main text indicates a duration of 1500 ms.

      We greatly appreciate you catching this inconsistency. We have made unified revisions by referring to the final implemented experimental procedures. Corresponding revisions have been made in the paper:

      b) For Experiment 2, not only is the range of SOAs mentioned in the Methods section inconsistent with that shown in the main text and the corresponding figure, but the task design also differs between sections.

      Thank you for bringing this discrepancy to our attention. We have made unified revisions by referring to the final implemented experimental procedures. The correct SOAs are 233:33:867 ms (i.e., from 233 ms to 867 ms in steps of approximately 33 ms).

      Corresponding revisions have been made in the paper:

      c) For Experiment 3, the main text indicates that EEG recordings were conducted, but in the Methods section, the EEG recording appears to have been part of Experiment 2 (lines 538-540).

      We’re grateful for you noticing this mix-up. In fact, only Experiment 3 is an EEG experiment, and we have made corresponding corrections in the "Methods" section. Corresponding revisions have been made in the paper: “The remaining components after this process were then projected back into the channel space. We extracted data from -500 ms to 2000 ms relative to cue stimulus presentation in Experiment 3.”

      (2) The results described in the text often do not match what is shown in the corresponding figure. For example:

      a) In lines 171-178, the SOAs at which a significant difference was found between the two conditions do not appear to match those shown in Figure 2A.

      Many thanks for spotting this error. The previous results missed one SOA time, namely 33 ms, leading to a 33 ms difference in time. We have corrected it in the revised manuscript.

      Corresponding revisions have been made in the paper: “Specifically, the capture effect of cued items was significantly greater than that of uncued items at SOAs of 267 ms (t(24) = 2.72, p = 0.03, Cohen's d = 1.11), 667 ms (t(24) = 2.37, p = 0.03, Cohen's d = 0.97) and 833 ms (t(24) = 3.53, p = 0.002, Cohen's d = 1.44), while the capture effect of uncued items was significantly greater than that of cued items at SOAs of 333 ms (t(24) = 2.97, p = 0.007, Cohen's d = 1.21), 367 ms (t(24) = 2.14, p = 0.04, Cohen's d = 0.87), 433 ms (t(24) = 2.49, p = 0.02, Cohen's d = 1.02), 467 ms (t(24) = 2.37, p = 0.03, Cohen's d = 0.97) and 567 ms (t(24) = 2.72, p = 0.02, Cohen's d = 1.11).”

      (b) In Figure 4, the figure legend (lines 225-228) does not correspond to the content shown in the figure.

      We appreciate you pointing out this oversight. When adjusting the color scheme during the revision of the manuscript, we neglected to revise the legend, which has now been corrected in the revised manuscript.

      Corresponding revisions have been made in the paper:“Figure 4. The red line represents the average across all participants of the Fourier transforms of the differences in capture effects between left and right memory items at the individual level. The gray area represents values below the group average of medians derived from 1000 permutations, with each permutation involving Fourier transforms for each participant. *: p < 0.05.”

      (c) In Figure 9, not sufficient information is provided within the figure or in the text, making it difficult to understand. Consequently, the results described in the text cannot be clearly linked to the figure.

      Thank you for drawing our attention to this issue. We have revised Figure 9 and its legend in the revised manuscript to make them clearer and easier to understand.

      Corresponding revisions have been made in the paper

      (3) Insufficient information is provided regarding the data analysis procedures, particularly the permutation tests used for the data presented in Figures 2B, 4, and 10. The results shown in these figures are critical for the main conclusions drawn in the manuscript.

      We’re thankful for you highlighting this gap. In the revised manuscript, we have provided a more detailed explanation in the "Methods" section, especially regarding the frequency analysis, to make the description clearer.

      Corresponding revisions have been made in the paper:“As shown in Figure 8, the alpha power (8-14 Hz) induced by cued and uncued items alternated in dominance during the memory retention phase. To quantify this rhythmic alternation, we conducted a spectral analysis following these steps: First, we computed the power difference between cued and uncued items within the 8-14 Hz range during the retention phase. These differences were then downsampled to 100 Hz using a 10 ms window for averaging, generating a one-dimensional time series spanning the 0-2000 ms retention period. This time series was subsequently subjected to amplitude spectrum analysis across frequencies from 1 Hz to 50 Hz using Fourier transformation.

      To assess the statistical significance of the observed spectral features, we employed a permutation test. Specifically, we randomly shuffled the temporal order of the time series of power differences between cued and uncued items—thereby preserving the amplitude distribution of the data while eliminating temporal correlations in the original sequence—and repeated the Fourier transform and spectral analysis for each shuffled time series. This permutation process was replicated 1000 times to generate a null distribution of spectral power values. A frequency component in the original data was considered statistically significant if its power ranked within the top 5% of the corresponding null distribution (p < 0.05).

      We applied the same analytical pipeline to investigate differences in the weighted phase-lag index (wPLI) between the contralateral regions of the two items and the prefrontal cortex during the retention phase. Specifically, wPLI differences (i.e., the difference between the two conditions) were computed, downsampled to 100 Hz using a 10 ms window for averaging to generate a time series spanning 0-2000 ms, and then subjected to amplitude spectrum analysis (1-50 Hz) using Fourier transformation. Significance was assessed via the identical permutation test procedure described above (randomly shuffling the temporal order of the difference time series).”
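
      The spectral analysis and permutation test described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the study: the input is a synthetic 100 Hz difference time series with a hypothetical 6 Hz rhythm, the mean is removed before the Fourier transform, and the p-value uses a conventional +1 correction. Function and variable names are ours.

```python
import numpy as np

def amplitude_spectrum(series, fs):
    """Single-sided amplitude spectrum of a mean-removed time series."""
    series = np.asarray(series, dtype=float)
    series = series - series.mean()          # assumption: remove the DC offset
    amp = np.abs(np.fft.rfft(series)) / series.size
    freqs = np.fft.rfftfreq(series.size, d=1.0 / fs)
    return freqs, amp

def permutation_spectrum_test(series, fs, n_perm=1000, seed=0):
    """Shuffle the temporal order to build a null distribution per frequency."""
    rng = np.random.default_rng(seed)
    freqs, observed = amplitude_spectrum(series, fs)
    null = np.empty((n_perm, observed.size))
    for i in range(n_perm):
        # Shuffling keeps the amplitude distribution but destroys temporal order.
        _, null[i] = amplitude_spectrum(rng.permutation(series), fs)
    # Proportion of null values at least as large as the observed amplitude.
    p_values = (np.sum(null >= observed, axis=0) + 1) / (n_perm + 1)
    return freqs, observed, p_values

# Synthetic difference series: 0-2000 ms sampled at 100 Hz (200 points),
# containing a hypothetical 6 Hz alternation plus noise.
fs = 100.0
t = np.arange(200) / fs
noise = 0.5 * np.random.default_rng(1).standard_normal(t.size)
difference = np.sin(2 * np.pi * 6 * t) + noise

freqs, observed, p_values = permutation_spectrum_test(difference, fs)
peak = int(np.argmax(observed))
print(f"peak at {freqs[peak]:.1f} Hz, p = {p_values[peak]:.4f}")
```

      With a 100 Hz series the highest resolvable frequency is 50 Hz, which matches the upper bound of the 1-50 Hz range quoted above. Because the test is run at every frequency bin, a correction for multiple comparisons may be worth considering.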

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      “Can the authors offer a hypothesis as to how decreased coactivity promotes increased movement velocity.” 

      In our revision we have added an additional metric measuring how spatial coactivity changes during movement onset, the spatial correlation index, which replicates a previous finding that co-activity among proximal neurons is statistically greater surrounding movement onset. We did not find, as outlined in the revision, that mGluR5 manipulations significantly altered this relationship. Our data therefore show, consistent with previous work, that ensembles of dSPNs that are co-active during movement onset, in particular ambulatory movement, are more likely to contain neurons that are closer together and highly active. In contrast, rest ensembles contain neurons that are less active but have more highly correlated activity, across all pairwise distances. Additionally, mGluR5 inhibition, genetic or pharmacological, promotes the activation of rest ensembles but does not affect the properties of movement ensembles. Previous studies (e.g. Klaus A. et al., 2017) have shown that neurons in rest ensembles are, in general, unlikely to also be members of movement ensembles. We therefore hypothesize that corticostriatal synapses onto SPNs of rest ensembles are more likely, during spontaneous behavior, to have reduced synaptic weight due to mGluR5 signaling, potentially due to eCB-mediated inhibition of neurotransmitter release. Therefore, when we inhibit mGluR5 at these synapses, we increase synaptic weight and increase the probability of activation of this coordinated rest ensemble, which suppresses movement. If, on the other hand, the synapses that govern activation of neurons in movement ensembles have a higher weight, they may be unaffected by mGluR5 inhibition.

      The use of the Jaccard similarity index in this study is not intuitive and not fully explained by the methods or the diagram in Figure 1. 

      We have added more detail to the paper to explain the methodology of the Jaccard similarity measure. The advantage of this method is that it specifically captures cells that are jointly active, as opposed to jointly inactive, and is therefore useful for capturing co-activity in our sparsely active Ca<sup>2+</sup> imaging data.
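
      As a minimal illustration of why the Jaccard index suits sparse activity, the sketch below compares it with a simple matching score on binarized event trains. The event trains are hypothetical, and returning 0 when neither cell is ever active is a convention chosen here, not necessarily the one used in the paper.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard index of two binary event trains: |A and B| / |A or B|."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # assumed convention: two silent cells are not "co-active"
    return float(np.logical_and(a, b).sum() / union)

def simple_matching(a, b):
    """Fraction of frames where both trains agree (joint silence counts)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    return float(np.mean(a == b))

# Hypothetical binarized Ca2+ events for two sparsely active cells (1 = event).
cell_1 = [0, 1, 1, 0, 0, 0, 1, 0, 0, 0]
cell_2 = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
silent = [0] * 10

print(jaccard(cell_1, cell_2))          # 2 shared events / 3 frames with any event
print(simple_matching(silent, silent))  # joint silence inflates this score to 1.0
print(jaccard(silent, silent))          # the Jaccard index ignores joint silence
```

      Frames in which both cells are silent do not enter the Jaccard index at all, so two rarely active cells are not scored as similar merely because they are both inactive most of the time.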

      The analysis of a possible 2-AG role in the mGlu5 mediated processes is incomplete. 

      We agree that, as an experiment to outline which endocannabinoids are involved in modulating synaptic strength through mGluR5, this experiment alone is not sufficient.

      However, our main focus in this paper is how manipulations of mGluR5 affect the spatiotemporal dynamics of dSPNs and we chose not to focus on specific mechanisms of endocannabinoid signaling, though these would certainly be interesting to investigate further in vivo.

      It would seem to be a simple experiment to examine effects of the mGlu5 NAM in the dSPN mGlu5 cKO mice. If effects of the two manipulations occluded one another this would certainly support the hypothesis that the drug effects are mediated by receptors expressed in dSPNs. A similar argument can be made for examining effects of the JNJ PAM in the cKO mice. 

      We agree that this experiment would be valuable and would extend the findings presented in the paper; however, it was outside the scope of the current work.

      Reviewer #2 (Public review):

      Pharmacological and genetic manipulations of mGluR5 do not differentially/preferentially modulate the activity of proximal vs distal dSPNs, therefore, it could also be interpreted that mGluR5 is blanketly boosting/suppressing all dSPN activity as opposed to differential proximal/distal spatial relationships. 

      As in the response to reviewer 1 above, we have added additional clarification to the text explaining that our manipulations do not differentially affect the co-activity of proximal vs distal dSPNs; this is also quantified throughout the text using the spatial coordination index. However, we disagree that “it could also be interpreted that mGluR5 is blanketly boosting/suppressing all dSPN activity” as we do not observe statistically significant changes in the event rate following either pharmacological or genetic manipulations of mGluR5. Rather, we consistently observe statistically significant changes in co-activity among neurons, that is, the extent to which the activity of active neurons during either rest or movement is correlated. This is the central finding of our manuscript: inhibiting or potentiating mGluR5 signaling alters behavior not by blanket suppression or enhancement of dSPN activity, as measured using the event rate, but by affecting their ensemble dynamic properties. Co-activity during rest versus ambulatory movement is statistically greater in both proximal and distal cells, and inhibiting mGluR5 increases this co-activity and decreases movement.

      For these analyses of prox vs distal and all others, please include the detail of how many proximal vs distal cells were involved and per subject. 

      We have added a supplemental table that details the number of cells included per subject in all analyses.

      Ln. 151-152: Please provide data concerning how volumes of infectivity differ between injecting AAV vs. coating the lens? If these numbers are very different, this could impact the number of Jaccard pairings and bias results. 

      While viral injection may lead to a larger volume of expression, with this one-photon imaging method only those cells within ~200 microns of the edge of the lens can be resolved. In practice, therefore, any additional volume of infected tissue outside the field of view of the lens would not affect the results, as these neurons are not resolved by the endoscope camera. Accordingly, the average number of cells detected per session is very similar following each approach (mean number of cells per session: 90.93 ± 23.69 with coating, 90.03 ± 29.29 with viral injection).

      Is mGluR5 affecting dSPN activity in other measures beyond co-activity and rate? Does the amplitude of events change?

      We have added supplemental data for figures 2, 3, and 5 demonstrating that manipulations of mGluR5 do not affect the amplitude or length of Ca<sup>2+</sup> events included in the analysis. 

      What is the model of mGluR5 signaling in a resting state vs. movement? What other behaviors are occurring when the mouse is in a low velocity "resting state" (0-0.5 cm/s). If this includes other forms of movement (i.e. rearing, grooming) then the animal really isn't in a resting state. This is not mentioned in the open field behavior section of the methods and should be described (Ln. 486) in addition to greater explanation of what behavior measures were obtained from the video tracking software (only locomotion?)

      It would be very interesting to determine whether, during “rest,” when the animal is not engaged in ambulatory behavior, it may be engaged in some fine motor behavior. However, the resolution of the cameras used to measure locomotor activity in this dataset does not allow us to do this.

      There is large variability in co-activity in proximal dSPNs when animals are "resting" (2j). Could this be explained by different behavior states within your definition of "rest"?

      We agree that if the animal is engaging in fine motor behavior that we cannot resolve with our behavior setup, this could produce some variability in coactivity. However, as shown previously (e.g. Klaus A. et al., 2017), ensembles active when the animal is not moving (our definition of “resting”), regardless of additional fine motor behaviors the animal may be engaged in when not moving, are substantially different from those ensembles that are active when the animal is moving. We therefore expect that this may limit, although potentially not eliminate, variability due to different behavioral states we may have grouped into our “resting” category. Unfortunately, as mentioned above, we are not able to resolve variations in fine motor output in this behavioral data.

      Have you performed IHC, ISH or another measure to validate D1 cell specific cKO?

      The mGluR5<sup>loxP/loxP</sup> mice used in this study were characterized previously by our lab (Xu et al., 2009). We used the same mice here with a different, but also published and characterized, Cre-driver line, Drd1a-Cre Ey262 (Gerfen et al., 2013).

      Why are the "Mean Norm Co-activity" values in 5e so high in this experiment relative to figures 2-4?  

      In experiments where we treated the same animal with vehicle and a drug (i.e., experiments in Figures 2 and 3), we normalized the values for each animal in the drug treatment group to the distal bin of that animal following vehicle treatment. This allowed us to more clearly resolve the changes within each animal due to drug treatment. As the comparisons in Figure 5d–f are between different animals (rather than different treatments of the same animal), we could not perform this normalization procedure.
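
      The within-animal normalization described above can be sketched as follows. The array layout and the co-activity values are hypothetical and serve only to show the operation: every value for an animal is divided by that animal's most distal bin under vehicle.

```python
import numpy as np

# Hypothetical co-activity values. Axis 0 = animal, axis 1 = treatment
# (0 = vehicle, 1 = drug), axis 2 = distance bin (last bin = most distal).
coactivity = np.array([
    [[0.30, 0.20, 0.10], [0.45, 0.30, 0.15]],   # animal 1
    [[0.60, 0.40, 0.20], [0.80, 0.50, 0.30]],   # animal 2
])

def normalize_to_vehicle_distal(values):
    """Divide each animal's values by its own vehicle, distal-bin value."""
    values = np.asarray(values, dtype=float)
    reference = values[:, 0, -1]                 # one reference per animal
    return values / reference[:, None, None]

normalized = normalize_to_vehicle_distal(coactivity)
print(normalized[:, 0, -1])  # each animal's vehicle distal bin becomes 1
```

      This kind of normalization requires a vehicle session from the same animal, which is why it cannot be applied to comparisons between different animals.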

      Reviewer #3 (Public review):

      Some D1 Cre lines have expression in the cortex. Which specific Cre line was used in this study? 

      We used Drd1a-Cre Ey262. This is included in the Methods.

      The text says JNJ treatment .... increased locomotor speed (Figure 3b) and increased the duration but not frequency of movement bouts (Figure 3c, d). However, the statistics of the figure legends say: however the change in mean velocity (3b) is not significant (p=0.060, U=3, Mann-Whitney U test), nor is the mean bout length during vehicle and JNJ (p=0.060, U=3, Mann-Whitney U test) (3d) Comparison of mean number of bouts of each animal during vehicle and JNJ (p=0.403, U=8, Mann-Whitney U test). 

      This has been corrected to indicate that only the change in time spent at rest is statistically significant.

      This effect was most pronounced during periods of rest (Figure 3i, j). The decrease was only in rest? Are the colors in Figure 3J inverted? Therefore, JNJ treatment had effects that were qualitatively the inverse to the effects of fenobam on locomotion and dSPN activity. 

      We have corrected the text to state that, overall, and during periods of rest but not movement, JNJ had effects that were qualitatively the opposite of fenobam.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This work addresses an important question in the field of Drosophila aggression and mating. Prior social isolation is known to increase aggression in males, manifesting as increased lunging, which is suppressed by group housing (GH). However, it is also known that single housed (SH) males, despite their higher attempts to court females, are less successful. Here, Gao et al., develop a modified aggression assay to address this issue by recording aggression in Drosophila males for 2 hours, with a virgin female immobilized by burying its head in the food. They found that while SH males frequently lunge in this assay, GH males switch to higher intensity but very low frequency tussling. Constitutive neuronal silencing and activation experiments implicate cVA sensing Or67d neurons in promoting high frequency lunging, similar to earlier studies, whereas Or47b neurons promote low frequency but higher intensity tussling. Optogenetic activation revealed that three pairs of pC1SS2 neurons increase tussling. Cell-type-specific DsxM manipulations combined with morphological analysis of pC1SS2 neurons and side-by-side tussling quantification link the developmental role of DsxM to the functional output of these aggression-promoting cells. In contrast, although optogenetic activation of P1a neurons in the dark did not increase tussling, thermogenetic activation under visible light drove aggressive tussling. Using a further modified aggression assay, GH males exhibit increased tussling and maintain territorial control, which could contribute to a mating advantage over SH males, although direct measures of reproductive success are still needed.

      Strengths:

      Through a series of clever neurogenetic and behavioral approaches, the authors implicate specific subsets of ORNs and pC1 neurons in promoting distinct forms of aggressive behavior, particularly tussling. They have devised a refined territorial control paradigm, which appears more robust than earlier assays using a food cup (Chen et al., 2002). This new setup is relatively clutter-free and could be amenable to future automation using computer vision approaches. The updated Figure 5, which combines cell-type-specific developmental manipulation of pC1SS2 neurons with behavioral output, provides a link between developmental mechanisms and functional aggression circuits. The manuscript is generally well written, and the claims are largely supported by the data.

      Thank you for the precise summary of the manuscript and acknowledgment of the novelty and significance of the study.

      Weakness:

      Although most concerns have been addressed, the manuscript still lacks a rigorous, objective method for quantifying lunging and tussling. Because scoring appears to have been done manually and a single lunge in a 30 fps video spans only 2-3 frames, the 0.2 s cutoff seems arbitrary, and there are no objective criteria distinguishing reciprocal lunging from tussling. Despite this, the study offers valuable insights into the neural and behavioral mechanisms of Drosophila aggression.

      Thank you for this comment. The duration of each lunge was measured by analyzing the videos frame by frame, from the frame before the initiation of the lunge to the frame after its completion, resulting in an average span of 3–5 frames. Given a frame rate of 30 fps, this corresponds to approximately 0.1–0.17 seconds. We acknowledge that manually quantifying the two types of aggressive behavior has certain limitations, which are now stated in the newly added “Limitations of the Study” section of the revised manuscript.
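      The frame-to-time conversion in this reply is simple arithmetic and can be checked with a short sketch. The function name is hypothetical; the 30 fps frame rate and the 3–5 frame span come from the text above.

```python
def frames_to_seconds(n_frames, fps=30.0):
    """Convert a span measured in video frames to seconds."""
    return n_frames / fps


# Span of a lunge as reported above: 3 to 5 frames at 30 fps.
shortest = frames_to_seconds(3)  # 3 / 30 s
longest = frames_to_seconds(5)   # 5 / 30 s
```

      Rounded to two decimals, the two bounds reproduce the 0.1–0.17 s range quoted in the reply.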

      Reviewer #2 (Public review):

      Summary:

      Gao et al. investigated the change of aggression strategies by the social experience and its biological significance by using Drosophila. Two modes of inter-male aggression in Drosophila are known: lunging, high-frequency but weak mode, and tussling, low-frequency but more vigorous mode. Previous studies have mainly focused on the lunging. In this paper, the authors developed a new behavioral experiment system for observing tussling behavior and found that tussling is enhanced by group rearing, while lunging is suppressed. They then searched for neurons involved in the generation of tussling. Although olfactory receptors named Or67d and Or65a have previously been reported to function in the control of lunging, the authors found that these neurons do not function in the execution of tussling and another olfactory receptor, Or47b, is required for tussling, as shown by the inhibition of neuronal activity and the gene knockdown experiments. Further optogenetic experiments identified a small number of central neurons pC1[SS2] that induce the tussling specifically. These neurons express doublesex (dsx), a sex-determination factor, and knockdown of dsx strongly suppresses the induction of tussling. In order to further explore the ecological significance of the aggression mode change in group-rearing, a new behavioral experiment was performed to examine the territorial control and the mating competition. And finally, the authors found that differences in the social experience (group vs. solitary rearing) and the associated change in aggression strategy are important in these biologically significant competitions. These results add a new perspective to the study of aggression behavior in Drosophila. Furthermore, this study proposes an interesting general model in which the social experience modified behavioral changes play a role in reproductive success.

      Strengths:

      A behavioral experiment system that allows stable observation of tussling, which could not be easily analyzed due to its low-frequency, would be very useful. The experimental setup itself is relatively simple, just the addition of a female to the platform, so it should be applicable to future research. The finding about the relationship between the social experience and the aggression mode change is quite novel. Although the intensity of aggression changes with the social experience was already reported in several papers (Liu et al., 2011 etc), the fact that the behavioral mode itself changes significantly has rarely been addressed, and is extremely interesting. The identification of sensory and central neurons required for the tussling makes appropriate use of the genetic tools and the results are clear. A major strength of this study in neurobiology is the finding that another group of neurons (Or47b-expressing olfactory neurons and pC1[SS2] neurons), distinct from the group of neurons previously thought to be involved in low-intensity aggression (i.e. lunging), function in the tussling behavior. Furthermore, the results showing that the regulation of aggression by pC1[SS2] neurons is based on the function of the dsx gene will bring a new perspective to the field. Further investigation of the detailed circuit analysis is expected to elucidate the neural substrate of the conflict between the two aggression modes. The experimental systems examining the territory control and the reproductive competition in Fig. 6 are novel and have advantages in exploring their biological significance. It is important to note that in addition to showing the effects of age and social experience on territorial and mating behaviors, the authors experimentally demonstrated that altered fighting strategy has effects with respect to these behaviors.

      Thank you for your precise summary of our study and being very positive on the novelty and significance of the study.

      Reviewer #3 (Public review):

      In this revised manuscript, Gao et al. presented a series of well-controlled behavioral data showing that tussling, a form of high-intensity fighting among male fruit flies (Drosophila melanogaster) is enhanced specifically among socially experienced and relatively old males. Moreover, results of behavioral assays led authors to suggest that increased tussling among socially experienced males may increase mating success. They also concluded that tussling is controlled by a class of olfactory sensory neurons and sexually dimorphic central neurons that are distinct from pathways known to control lunges, a common male-type attack behavior.

      A major strength of this work is that it is the first attempt to characterize the behavioral function and neural circuit associated with Drosophila tussling. Many animal species use both low-intensity and high-intensity tactics to resolve conflicts. High-intensity tactics are mostly reserved for escalated fights, which are relatively rare. Because of this, tussling in flies, like high-intensity fights in other animal species, has not been systematically investigated. Previous studies on fly aggressive behavior have often used socially isolated, relatively young flies within a short observation duration. Their discovery that 1) older (14-day-old) flies tend to tussle more often than younger (2- to 7-day-old) flies, 2) group-reared flies tend to tussle more often than socially isolated flies, and 3) flies tend to tussle at a later stage (mostly ~15 minutes after the onset of fighting) is the result of their creativity in looking outside conventional experimental settings. These new findings are key for quantitatively characterizing this interesting yet under-studied behavior.

      Newly presented data have made several conclusions convincing. Detailed descriptions of the methods used to quantify behaviors help readers understand the basis of their claims by improving transparency. However, I remain concerned about the authors' persistent attempt to link high-intensity aggression to reproductive success. The authors' effort to "tone down" the link between the two phenomena remains insufficient. The two are purely correlational. I reiterate this issue because the overall value of the manuscript would not change with or without this claim.

      Thank you for acknowledging the novelty and significance of the study. Regarding the relationship you mentioned between high-intensity aggression and reproductive success, we have further toned down the statements linking them throughout the revised manuscript. We also modified the title to “Social Experience Shapes Fighting Strategies in Drosophila”. In addition, we have now added a ‘Limitations of the Study’ section to clearly state that the link between tussling and reproductive success is a correlation.

      Reviewer #1 (Recommendations for the authors):

      If possible, mention the EM-connectome data showing the minimal interneuronal path from Or47b ORNs to pC1SS2 neurons (even if derived from the female connectome), which can strengthen the model of parallel sensory-central pathways.

      Thank you for this comment. According to data from the EM connectome, connecting Or47b ORNs to pC1d neurons requires at least two intermediate neurons. An example minimal pathway is: ORN_VA1v (L) → AL-AST1 (L) → PLP245 (L) → pC1d (R). We have added this point in the Discussion section of the revised manuscript.
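      The notion of a minimal interneuronal path can be made concrete with a breadth-first search over a directed graph. The sketch below is illustrative only: the toy graph contains just the example pathway named in the reply and is not real connectome data, and the function name is hypothetical.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Return one shortest directed path from source to target, or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Toy graph holding only the example pathway quoted above (assumption:
# a real analysis would load synaptic connections from a connectome dataset).
toy_graph = {
    "ORN_VA1v (L)": ["AL-AST1 (L)"],
    "AL-AST1 (L)": ["PLP245 (L)"],
    "PLP245 (L)": ["pC1d (R)"],
}

path = shortest_path(toy_graph, "ORN_VA1v (L)", "pC1d (R)")
n_intermediate = len(path) - 2  # exclude the source and target neurons
```

      On this toy graph the search returns the four-neuron path from the reply, that is, two intermediate neurons between the ORN and pC1d.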

      I'm not convinced that labeling lunges as "gentle" combat behavior works, either in the abstract or elsewhere. While lunging is indeed a lower-intensity form of aggression compared to tussling, applying anthropomorphic descriptors risks misleading readers.

      Thank you for this comment. We now use “low-intensity” instead of “gentle” to describe lunging.

      In Materials & Methods, please cross-check all figure-panel references after the recent re-numbering (e.g. "Figure 5A6A" etc.).

      Thank you for this comment. We have thoroughly verified the figure panel references in the Materials & Methods section.

      Ensure that Table S1 is clearly cited in the main text where you first describe fly genotypes.

      Thank you for this comment. We have now cited Table S1 in the main text.

      There are multiple grammatical errors and typos throughout the manuscript. Please correct them. Some examples are below, but this is not an exhaustive list:

      Line 98-102 requires rephrasing as the results are already published and not being observed by the authors.

      Thank you for this comment. We have revised the manuscript to “we occasionally observed the high-intensity boxing and tussling behavior in male flies as previously reported (Chen et al., 2002; Nilsen et al., 2004), which….”

      line 116- lower not 'lowed'.

      Corrected.

      line 942 & 945- knock-down males not 'knocking down males'.

      Corrected. Thank you very much for these comments.

      Reviewer #2 (Recommendations for the authors):

      The authors have almost completely answered the major comments I have noted on the ver.1 manuscript: (1) They clearly show changes in fighting strategy in the territory control behavior experiment in Fig. 6-figure supplements. (2) A detailed description of how aggressive behavior is measured. Thus, I am convinced by this revision.

      Thank you for these comments that make the manuscript a better version.

      Furthermore, Fig. 5, which examines the relationship between pC1[SS2] characteristics and the function of dsx, presents novel and very interesting data. I look forward to further developments.

      Thank you. We will continue to explore this part in our future study.

      However, one point still concerns me.

      Line 192: Although the authors describe it as "usage-dependent," the trans-Tango technique is essentially a postsynaptic cell-labeling technique. It is possible that the labeling intensity in postsynaptic cells increases from the change in expression levels of the Or47b gene due to GH. However, there is no difference in the expression level of the Or47b gene labeled by GFP between SH and GH. Therefore, we cannot conclude that the expression of the Or47b gene is increased by rearing conditions.

      The original paper on trans-TANGO (Talay et al., 2017) does not discuss the usage-dependency. A review of trans-synaptic labeling techniques (Ni, Front Neural Circuits. 2021) discusses that the increase in trans-TANGO signaling with aging may be related to synaptic strength, but there is no experimental evidence for this. In my opinion, the results in Figure 3-figure supplement 2 only weakly suggest that the increase in trans-TANGO signaling may be explained by an increase in synaptic strength due to group rearing.

      We appreciate the reviewer’s insightful comment regarding the interpretation of the trans-Tango signal. Indeed, the original trans-Tango study (Talay et al., 2017) does not claim that the method is usage-dependent. The observed increase in trans-Tango labeling with age, as reported in their supplemental figures, may reflect accumulation over time, potentially influenced by synaptic maturation or increased component expression. To avoid overstating our results, we have revised the relevant statement in the manuscript to remove the term "usage-dependent" and now describe the change in trans-Tango signal more cautiously.  

      Reviewer #3 (Recommendations for the authors):

      Below are the cases where their professed attempts to "tone down the statement" appear ignored:

      Lines 27-29:

      "Our findings... suggest how social experience shapes fighting strategies to optimize reproductive success".

      We have now revised the manuscript to “Our findings… suggest that social experience may shape fighting strategies to optimize reproductive success.”

      Lines 85-86:

      "... discover that this infrequent yet intense form of combat is... crucial for territory dominance and mating competition".

      We have now revised the manuscript to “…discover that this infrequent yet intense form of combat is enhanced by social enrichment, while the low-intensity lunging is suppressed by social enrichment.” 

      Lines 335-339:

      "Here, we found that... GH males tend to... increase the high-intensity tussling, which enhances their territorial and mating competition."

      We have removed “which enhances their territorial and mating competition” in the revised manuscript.

      Lines 343-344:

      "... presenting a paradox between social experience, aggression and reproductive success. Our result resolved this paradox..."

      We have now revised the manuscript to “...Our results provide an explanation for this paradox…”

      Lines 355-358:

      "Interestingly, we found that the mating advantage gained through social enrichment can even offset the mating disadvantage associated with aging, further supporting the vital role of shifting fighting strategies in experienced, aged males."

      We have removed “further supporting the vital role of shifting fighting strategies in experienced, aged males” in the revised manuscript.

      Lines 361-362:

      "These results separate the function of the two fighting forms and rectify out understanding of how social experiences regulate aggression and reproductive success."

      We have removed this sentence in the revised manuscript.

      Some may say that a speculative statement is harmless, but I think it indeed is harmful unless it is clearly indicated as a speculation. It is regrettable that authors remain reluctant to change their claim without providing any new supporting evidence. All three reviewers raised the same concern in the first round of review.

      We apologize for not making the speculative nature of the statement clearer in the previous version. In the revised manuscript, we have now explicitly rephrased sentences to only suggest a correlation but not a causal link between tussling and reproductive success.

      I have no choice but to keep my evaluation of the manuscript as "Incomplete" unless the authors thoroughly eliminate any attempt to link these two. This must go beyond changing a few words in the lines listed above.

      Thank you for this comment. In addition to the lines listed above, we carefully checked all statements regarding the correlation between fighting strategies and reproductive success throughout the full text. Furthermore, we have also added a “Limitations of the Study” section to address the shortcomings of this study in the revised manuscript.

      I do not have the same level of concern over the interpretation of Fig. 6A-C, because this is directly linked to aggressive interactions. Even if the socially isolated males do not engage in tussling, it is not a leap to assume that a different fighting tactic of socially experienced males can give them an advantage in defending a territory. To me, this is a sufficient ethological link with the observed behavioral change.

      Thank you for this insightful comment.

      The following are relatively minor, although important, concerns.

      I beg to differ over the authors' definition of "tussling". Supplemental movies S1 and S2 appear to include "tussling" bouts in which two flies lunge at each other in rapid succession, and supplemental movie S3 appears to include bouts of "holding", in which one fly holds the opponent's wings and shakes vigorously. These cases suggest that the definition of "tussling" as opposed to "lunging" has a subjective element. However, I will not dwell on this matter further because it is impossible to be completely objective in behavioral classification, even when using a computational method. The important point is that the definition is applied consistently within the publication, and I have no reason to doubt that this was the case.

      Thank you for this comment. Since the analysis of tussling behavior was conducted manually, it is challenging to achieve complete objectivity. However, we made every effort to apply consistent criteria throughout the analysis. We have added a “Limitations of the Study” section in the revised manuscript to clearly state this caveat. We appreciate your understanding.

      Authors now state that "all tester flies were loaded by cold anesthesia" (lines 432-433). I would like to draw attention to the fact that anesthesia, whether by ice or by CO2, has long been known to affect flies' subsequent behavior (for aggression, see Trannoy S. et al., Learn. Mem. 2015. 22: 64-68). It would be prudent to acknowledge the possibility that this handling method could have contributed to unusually high levels of spontaneous tussling, which has not been reported elsewhere before.

      Thank you for this comment. The increased tussling behavior observed in our study is unlikely to be due to cold anesthesia because, as noted by Trannoy S. et al. (2015), cold anesthesia profoundly reduces locomotion and general aggressiveness in flies. Nevertheless, we acknowledge that the use of cold anesthesia in behavioral experiments may have potential effects on aggression. To minimize this influence, we allowed the flies to recover and adapt for at least 30 minutes before behavioral recording. Moreover, both control and experimental groups were treated in exactly the same manner to ensure consistency.

      It is intriguing that pC1SS2 neurons are dsx+ but fru-. The authors convincingly demonstrated that these neurons are clearly distinct from the P1a neurons, a well-characterized hub for male social behaviors. It is possible that pC1SS2 neurons overlap with previously characterized dsx+ neurons that are important for male aggression (measured by lunges), such as those in Koganezawa et al., Curr. Biol. 2016 and Chiu et al., Cell 2020, a point the authors could have explicitly raised.

      Thank you for this comment. We have added this point to the Discussion section of the revised manuscript, as follows: “That tussling-promoting… aggression (Koganezawa et al., 2016). Moreover, the anatomical features of pC1<sup>SS2</sup> neurons are highly similar to the male-specific aggression-promoting (MAP) neurons identified by another previous study (Chiu et al., 2021).”

      I acknowledge the authors' courage in initiating an investigation into a less characterized, high-intensity fighting behavior. Tussling requires the simultaneous engagement of two flies. Even if there is confusion over the distinction between lunges and tussling, the authors' conclusion that socially experienced flies and socially isolated flies employ distinct fighting strategies is convincing. The concern I raised above is about the interpretation of the data, not about the quality of the data.

      Thank you for your constructive comments to make this manuscript better.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary: 

      This study provides compelling evidence suggesting that ghrelin, a molecule released in the surroundings of the major adult brain neurogenic niche (V-SVZ) by blood vessels with high blood flow, controls the migration of newborn interneurons towards the olfactory bulbs. 

      Strengths:

      This study is a tour de force as it provides a solid set of data obtained by time-lapse recordings in vivo. The data demonstrate that the migration and guidance of newborn neurons rely on factors released by selective types of blood vessels. 

      Weaknesses:

      Some intermediate conclusions are weak and may be reinforced by additional experiments. 

      We thank the reviewer for the thoughtful evaluation and constructive comments outlined in the “Recommendations for The Authors”. In response, we have incorporated additional data, revised relevant figures, and clarified explanations in the revised manuscript.

      Reviewer #2 (Public review)

      Summary: 

      The authors establish a close spatial relationship between RMS neurons and blood vessels. They demonstrated that high blood flow was correlated with migratory speed. In vitro, they demonstrate that Ghrelin functions as a motogen that increases migratory speed through augmentation of actin cup formation. The authors proceed to demonstrate through the knockdown of the Ghrelin receptor that fewer RMS neurons reach the OB.

      They show the opposite is true when the animal is fasted. 

      Strengths: 

      Compelling evidence of close association of RMS neurons with blood vessels (tissue clearing 3D), preferentially arterioles. Good use of 2-photon imaging to demonstrate migratory speed and its correlation with blood flow. In vitro analysis of Ghrelin administration to cultured RMS neurons, actin visualization, Ghsr1KD, is solid and compelling. 

      We sincerely thank the reviewer for the encouraging comments and helpful suggestions. As noted, our original manuscript lacked sufficient in vivo evidence connecting blood flow with ghrelin signaling. To address this, we have added new data and revised the explanations throughout the manuscript as described below.

      Weaknesses: 

      (1) Novelty of findings attenuated due to prior work, especially Li et al., Experimental Neurology 2014. Here, the authors demonstrated that Ghrelin enhances migration in adult-born neurons in the SVZ and RMS.

      We agree with the reviewer that the idea that ghrelin enhances migration of new neurons is not entirely novel. The study by Li et al. (2014) provided critical insights that guided our investigation into ghrelin as a blood-derived factor promoting neuronal migration. However, our study expands on this by demonstrating that ghrelin directly stimulates migration via GHSR1a in cultured new neurons, and we further identified the cellular and cytoskeletal mechanisms involved. Specifically, we showed that ghrelin enhances somal translocation by activating actin dynamics at the rear of the cell soma. We have revised the Results and Discussion sections accordingly to emphasize these novel aspects as follows:

      “A previous study demonstrated that the migration of V-SVZ-derived new neurons was attenuated in ghrelin knockout mice (Li et al., 2014). In our study, we found that the migration of cultured new neurons was enhanced by the application of ghrelin to the culture medium, and this effect was abolished by Ghsr1a knockdown (KD). These findings suggest that ghrelin directly stimulates neuronal migration through its receptor, GHSR1a, on new neurons. A previous study showed that GHSR1a is expressed in various regions of the brain (Zigman et al., 2006). In our experiments, new neuron-specific KD of Ghsr1a indicated that ghrelin signaling acts in a cell-autonomous manner to regulate neuronal migration.” (Discussion, page 13, lines 10–18)

      “Furthermore, we identified the cellular and cytoskeletal mechanisms underlying this effect on migration. The results indicate that ghrelin enhances somal translocation during migration by activating actin cytoskeletal dynamics at the rear of the neuronal soma.” (Discussion, page 13, lines 24–26)

      (2) The evidence for blood delivery of Ghrelin is not very convincing. Fluorescently-labeled Ghrelin appears to be found throughout the brain parenchyma, irrespective of the distance from vessels. It is also not clear from the data whether there is a link between increased blood flow and Ghrelin delivery. 

      We agree that the correlation between blood flow and ghrelin transcytosis is not very convincing in our study. As the reviewer pointed out, Figure 3A gives the impression that fluorescent-labeled ghrelin is uniformly distributed throughout the brain parenchyma. However, high-magnification images newly added in Figure 3 show that some, but not all, vessels have particularly strong fluorescent signals in the parenchymal area adjacent to the abluminal side of vascular endothelial cells, visualized by CD31 immunostaining (Feng et al., 2004) (Figure 3A′, A′′). To quantify these observations, we defined two regions: Area I (perivascular area), within 10 μm of the abluminal surface of CD31-positive endothelium; and Area II (distant area), located 10–20 μm away (Figure 3E). Of note, Area I corresponds to the perivascular region where new neurons are frequently observed (Figure 1).

      Importantly, we found strong ghrelin signals in vascular endothelial cells of endomucin-negative high-flow vessels (Figure 3C, D). This suggests that transcytosis of blood-derived ghrelin may occur more frequently in high-flow vessels due to increased endocytosis at the endothelium. To test this, we quantified signal gradients in the extra-vessel regions as fold changes (Area I / Area II), as illustrated in Figure 3E. The proportion of vessel segments with >1.5-fold increases was significantly higher in endomucin-negative vessels than in endomucin-positive ones (Figure 3F). Furthermore, vessels with >2-fold increases were observed exclusively in the endomucin-negative group (6.48% ± 1.18%).
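      The fold-change quantification described above (Area I / Area II per vessel segment, then the proportion of segments above a threshold) can be sketched as follows. The intensity values and function names are hypothetical; only the Area I / Area II ratio and the 1.5-fold and 2-fold thresholds come from the text.

```python
def fold_changes(segments):
    """Return the Area I / Area II signal ratio for each vessel segment.

    segments: list of (area_i_signal, area_ii_signal) pairs, where Area I is
    the perivascular region and Area II the more distant region.
    """
    return [area_i / area_ii for area_i, area_ii in segments]


def proportion_above(segments, threshold):
    """Fraction of segments whose fold change exceeds the threshold."""
    ratios = fold_changes(segments)
    return sum(ratio > threshold for ratio in ratios) / len(ratios)


# Hypothetical mean fluorescence values for four vessel segments (not real data).
example_segments = [(30.0, 10.0), (16.0, 10.0), (12.0, 10.0), (9.0, 10.0)]

share_over_1_5 = proportion_above(example_segments, 1.5)
share_over_2 = proportion_above(example_segments, 2.0)
```

      In an analysis like the one described, these proportions would be computed separately for endomucin-negative and endomucin-positive segments and then compared between the two groups.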

      These data suggest that, in high-flow vessels, blood-derived ghrelin accumulates more in the immediate perivascular region than in areas further away. This supports the possibility that elevated blood flow delivers a larger amount of ghrelin to the vascular endothelium, enhancing its transcytosis into adjacent brain parenchyma. This mechanism may underlie the preferential migration of new neurons along perivascular regions with high blood flow, as shown in Figure 1. We have incorporated these new data in Figure 3 and the corresponding explanations into the Results, Figure legend, and Methods.

      (3) The in vivo link between Ghsr1KD and migratory speed is not established. Given the strong work to open the study on blood flow and migratory speed and the in vitro evidence that migratory speed is augmented by Ghrelin, the paper would be much stronger with direct measurement of migration speed upon Ghsr1KD. Indeed, blood flow should also be measured in this experiment since it would address concerns in 2. If blood flow and ghrelin delivery are linked, one would expect that Ghsr1KD neurons would not exhibit increased migratory speed when associated with slow or fast blood flow vessels. 

      In Figure 3, we showed that ghrelin transcytosis occurs preferentially in high-flow vessels, suggesting a role for ghrelin in mediating the effects of blood flow on neuronal migration. However, whether this dependence is solely attributable to ghrelin signaling remains unclear. 

      To address this, we tested whether Ghsr1a-KD modifies the impact of reduced blood flow on neuronal migration by combining Ghsr1a-KD with bilateral common carotid artery stenosis (BCAS), a chronic cerebral hypoperfusion model (Figure S9A). We found that BCAS decreased the percentage of Ghsr1a-KD new neurons reaching the OB, similar to the effect seen in control neurons (Figure S9B, see also Figure 2A–C). This suggests that blood flow influences neuronal migration even under Ghsr1a-KD conditions.

      Furthermore, we analyzed the distribution of Ghsr1a-KD neurons with respect to vessel flow characteristics. Even under Ghsr1a-KD, a higher proportion of new neurons were located in the area of endomucin-negative (high-flow) vessels compared with endomucin-positive (low-flow) vessels (Figure S9C), indicating that Ghsr1a-KD does not abolish the preferential association of migrating neurons with high-flow vessels. These findings suggest that although ghrelin signaling contributes to blood flow-dependent migration, it is not the sole factor. Other blood-derived signals may also mediate this effect. We have included these new data in Figure S9 and updated the corresponding sections in the Results.

      Reviewer #1 (Recommendations for the authors) :

      Major 

      Page 6, Line 13. Please provide in the Results section some explanation of how the photothrombotic clot is induced.

      We added the following explanation to the Results section to clarify the method used to induce photothrombotic clot formation.

      “For clot formation, a restricted area of selected vessels was irradiated by a two-photon laser immediately after intravenous injection of rose bengal.” (Results, Page 7, lines 27–28)

      Page 6, Line 18. The authors use the marmoset as an additional experimental model. Here, V-SVZ-derived newborn neurons migrate in other brain regions as compared to rodents. Please provide a clear rationale for moving from rodents to "common marmosets" as an experiment model. And why use marmosets only for this set of experiments? 

      We clarified the rationale for using common marmosets in addition to mice as follows:

      “Because blood vessel-guided neuronal migration in the adult brain is a conserved phenomenon across species (Kishimoto et al., 2011; Akter et al., 2021; Shvedov et al., 2024), we hypothesized that blood flow may also influence neuronal migration in other brain regions of primates. The neocortex, which supports higher-order brain functions and has undergone evolutionary expansion in primates, was selected as a target region. In common marmosets, but not in mice, V-SVZ-derived new neurons migrate toward the neocortex and ventral striatum (Akter et al., 2021) (Supplemental Movies S4 and S5).” (Results, Page 6, lines 19–25)

      Figure 2B. The experimental setup is possibly problematic as the lentiviral tracing measurement does not take into consideration the rate of neurogenesis or newborn neuron survival. Can authors assess the rate of proliferation and survival in the VSVZ/RMS upon BCAS to decipher whether the reduced number of cells observed in the OB only results from migration changes? (comparable remark stands for Figure 5) 

      To evaluate whether the reduction in the number of new neurons observed in the OB after BCAS (Figure 2B, C) is due solely to impaired migration, we assessed cell proliferation and survival in the V-SVZ and RMS. Specifically, we quantified the density of Ki67+ proliferating cells and cleaved caspase-3+ apoptotic cells in the sham and BCAS groups. BCAS significantly decreased cell proliferation and increased cell death in both the V-SVZ and RMS (Figure S4), suggesting that reduced neurogenesis and/or survival may contribute to the decreased neuronal distribution in the OB. 

      Although we cannot exclude the possibility that changes in cell proliferation or survival contributed to this effect, our photothrombotic clot formation experiments are better suited to directly examine how acute reduction in blood flow affects neuronal migration. These experiments allowed us to measure the migration speed of new neurons shortly after inducing localized blood flow inhibition. We found that clot formation significantly reduced the migration speed of new neurons (Figure 2E, H), indicating that blood flow changes directly impair neuronal migration in the adult brain. 

      We have included these new data in Figure S4 and updated the corresponding text in the Results, Discussion, Figure legend, and Methods as follows:

      Figure 3. About ghrelin signaling. It is unclear whether its transcytosis occurs in endomucin-negative because of the high bloodstream flow. How can this be explained? What happens upon BCAS, is there still a close relation between ghrelin transcytosis, blood flow, and neuron migration? 

      As correctly noted, our initial explanation and data did not provide sufficient evidence that higher blood flow delivers a larger amount of ghrelin into the brain parenchyma. We found that some vessels had particularly strong fluorescent signals in the parenchymal area adjacent to the abluminal surface of vascular endothelial cells, as visualized by CD31 immunostaining (Feng et al., 2004) (Figure 3A′, A′′). On the basis of our observation that strong fluorescent signals were detected in vascular endothelial cells of endomucin-negative (high-flow) vessels (Figure 3C, D), we hypothesized that ghrelin transcytosis may occur more frequently in high-flow vessels due to increased endocytosis at the vessel endothelium. 

      To test this hypothesis, we quantified signal gradients in the extra-vessel regions by calculating fold changes in fluorescent intensity between two zones: Area I (0–10 μm from the abluminal surface of the endothelium) and Area II (10–20 μm away), as illustrated in Figure 3E. Area I corresponds to the perivascular region where new neurons are frequently found (Figure 1). We found that the proportion of vessel segments with >1.5-fold signal increase in Area I relative to Area II was significantly higher in endomucin-negative vessels than endomucin-positive ones (Figure 3F). Furthermore, vessel segments with >2-fold increases were observed exclusively in the endomucin-negative group (6.48% ± 1.18%). These results support the idea that higher blood flow increases the amount of ghrelin that reaches the luminal surface of vascular endothelial cells, thereby increasing the possibility of ghrelin transcytosis into the brain parenchyma.
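      The fold-change quantification described above can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the function names, the use of a single mean intensity per zone, and the example numbers are all assumptions made for demonstration.

```python
from typing import Sequence


def fold_change(area1_intensity: float, area2_intensity: float) -> float:
    """Signal in Area I (0-10 um from the endothelium) relative to Area II (10-20 um)."""
    if area2_intensity <= 0:
        raise ValueError("Area II intensity must be positive")
    return area1_intensity / area2_intensity


def fraction_above(fold_changes: Sequence[float], threshold: float) -> float:
    """Fraction of vessel segments whose fold change exceeds the threshold."""
    if not fold_changes:
        raise ValueError("no vessel segments supplied")
    return sum(fc > threshold for fc in fold_changes) / len(fold_changes)


# Hypothetical mean intensities per vessel segment: (Area I, Area II).
# These values are made up for illustration and are not data from the study.
segments = [(120.0, 100.0), (180.0, 100.0), (210.0, 100.0), (90.0, 100.0)]
fold_changes = [fold_change(a1, a2) for a1, a2 in segments]

print(fraction_above(fold_changes, 1.5))  # share of segments with >1.5-fold increase
print(fraction_above(fold_changes, 2.0))  # share of segments with >2-fold increase
```

      Computing the fraction separately for endomucin-negative and endomucin-positive segments would give the two proportions compared in the response.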

      We also examined whether blood flow inhibition, induced by BCAS or photothrombotic clot formation, affects the relationship between ghrelin transcytosis, blood flow, and neuronal migration. The above results suggest that blood flow reduction may decrease ghrelin transcytosis, thereby contributing to impaired neuronal migration. To further explore this, we analyzed the distribution of new neurons around high- versus low-flow vessels under BCAS conditions. In the BCAS group, we still observed a higher density of new neurons in the region of high-flow (endomucin-negative) vessels than in the region of low-flow (endomucin-positive) vessels (Figure S9C). This suggests that even under reduced blood flow, neuronal migration preferentially occurs near high-flow vessels. Taken together, these results suggest that ghrelin transcytosis, blood flow, and neuronal migration are connected, and that this relationship persists under conditions of blood flow reduction.

      Figure 4. Is ghrelin controlling both individual Dcx+ neuron migration as well as chain migration (cells moving more together)? This should be assessed and clarified. 

      How is ghrelin controlling actin dynamics in newborn migrating neurons? Since somal translocation speed and somal stride length are both modulated by ghrelin, this factor may also control MT remodeling, could that be checked? 

      We have revised the manuscript to better explain the role of ghrelin in both modes of neuronal migration, chain and individual. Initially, we demonstrated that ghrelin enhances the migration of new neurons in V-SVZ culture (Figure 4A, B), where these neurons migrate outward as chains, indicating that ghrelin facilitates chain migration. In subsequent in vitro experiments (Figure 4C–M), we showed that ghrelin also enhances the migration of individual neurons. To examine this in vivo, we injected Ghsr1a-KD and control lentiviruses into two different anatomical regions: the V-SVZ, where chain migration originates, and the OB core, where new neurons migrate individually. These experiments enabled us to assess the role of ghrelin signaling in each mode of migration independently. We found that ghrelin enhanced both chain migration in the RMS and individual migration in the OB. These results indicate that ghrelin signaling facilitates both forms of neuronal migration. We added the following text in the Results section:

      “To assess the direct effect of ghrelin on neuronal migration, we applied recombinant ghrelin to V-SVZ cultures, in which new neurons emerge and migrate as chains (Figure 4A). Ghrelin significantly increased the migration distance of these neurons (Figure 4B), indicating enhanced chain migration. We then used super-resolution time-lapse imaging to examine individually migrating neurons with or without knockdown (KD) of growth hormone secretagogue receptor 1a (GHSR1a), a ghrelin receptor expressed in V-SVZ-derived new neurons (Li et al., 2014) (Figure 4C). Ghrelin enhanced the migration speed of control (lacZ-KD) cells, indicating that it also facilitates individual migration (Figure 4D).” (Results, Page 9, lines 5–12)

      “Of the total labeled Dcx+ cells, the percentage of Dcx+ cells reaching the GL was significantly lower in the Ghsr1a-KD group than in the control group (Figure 5B, C), suggesting that ghrelin enhances individual radial migration of new neurons in the OB.” (Results, Page 10, lines 5–8) “These data indicate that ghrelin signaling facilitates both individual migration in the OB and chain migration in the RMS.” (Results, Page 10, lines 17–18)

      We also added discussion on how ghrelin may regulate cytoskeletal dynamics in migrating neurons. Ghrelin signaling has been reported to control actin cytoskeletal remodeling in astrocytoma cells (Dixit et al., 2006), which led us to investigate similar effects in migrating neurons. Rac, a member of the Rho GTPase family, was shown to mediate this actin remodeling in astrocytoma migration, suggesting it may also be involved in ghrelin-induced actin cup formation in new neurons. Furthermore, because somal translocation depends not only on actin but also on microtubule dynamics (Kaneko et al., 2017), it is possible that ghrelin influences both systems. Supporting this idea, ghrelin signaling was shown to modulate microtubule behavior via SFK-dependent phosphorylation of α-tubulin (Slomiany and Slomiany, 2017). These findings suggest that ghrelin may enhance somal translocation through coordinated regulation of both the actin and microtubule systems. We added the following text in the Results and Discussion sections:

      “Ghrelin signaling has been reported to regulate actin cytoskeletal dynamics in astrocytoma cells (Dixit et al., 2006), which led us to examine whether a similar mechanism operates in migrating neurons.” (Results, Page 9, lines 23–25)

      “Further studies are needed to elucidate how ghrelin promotes actin cup formation in migrating neurons. Given that Rac, a Rho family GTPase, mediates actin remodeling downstream of ghrelin in astrocytoma cells (Dixit et al., 2006), it is possible that Rac may also be involved in ghrelin-induced cytoskeletal regulation in new neurons.” (Discussion, Page 13, lines 28–31)

      “In addition to actin remodeling, ghrelin may regulate microtubule dynamics. Ghrelin signaling was shown to modulate microtubules via SFK-dependent phosphorylation of α-tubulin (Slomiany and Slomiany, 2017), raising the possibility that ghrelin promotes somal translocation of new neurons through coordinated regulation of both actin and microtubule networks (Kaneko et al., 2017).” (Discussion, Page 13, line 31–Page 14, line 2)

      It would also be informative to provide immunolabeling of Ghsr1 in the V-SVZ / RMS/ OB to have a clear picture of the expression pattern of this receptor. Newborn neurons migrate along blood vessels, which are surrounded by astrocytes that have also been reported to express Ghsr1, thus could newborn neuron migration change may also arise from activation of Ghsr1 in their surrounding astrocytes? 

      A previous study reported that GHSR1a is expressed in DCX+ new neurons in the RMS and OB, and in V-SVZ neural progenitor cells (Li et al., 2014). To visualize the spatial expression pattern of Ghsr1a, we performed RNAscope in situ hybridization because specific anti-GHSR1a antibodies suitable for immunohistochemistry were not available. Consistent with the previous report, we detected Ghsr1a mRNA in DCX+ new neurons in the V-SVZ, RMS, and OB (Figure S5A), indicating that new neurons directly receive ghrelin signaling.

      Moreover, our KD experiments demonstrated that ghrelin enhanced the migration of new neurons in a cell-autonomous manner via GHSR1a (Figure 4, 5). Nevertheless, a recent study (Stark et al., 2024) showed that GHSR1a was expressed in various cell types, including glutamatergic and GABAergic neurons, suggesting that ghrelin may also exert non-cell-autonomous effects on neuronal migration. Given the presence of diverse cell types, including neurons, microglia, pericytes, and astrocytes, along the migratory route, it remains possible that GHSR1a activation in these neighboring cells contributes to the overall regulation of neuronal migration.

      Figure 5. About the in vivo knockdown of Ghsr1a. The results section (page 9, line 3) mentioned that mice were either injected with one or the other construct but Figure 5 shows coincidence of GFP and dsRed positive cells. Were control and Ghsr1a shRNAs injected together into the same mouse? Could you quantify the number of cells in green (control), red (Ghsr1a KD), and yellow (both)? Won't they mostly be yellow? Have you tried injecting control and Ghsr1a separately? If yes, do you get the same result? Such analysis would be important to separate cell autonomous from noncell autonomous effects. 

      To minimize variability in injection conditions, we initially coinjected control and Ghsr1a-KD lentiviruses into the same mice and analyzed their migration using a paired design. As the reviewer correctly noted, some cells were coinfected and expressed both EmGFP and DsRed (18.7% ± 2.86% of EmGFP+ cells and 10.8% ± 0.533% of DsRed+ cells). To ensure that this overlap did not affect our analysis, we excluded EmGFP+/DsRed+ double-positive cells and focused solely on EmGFP+/DsRed− (control) and EmGFP−/DsRed+ (Ghsr1a-KD) single-positive cells. 

      We agree with the reviewer that coinjection could lead to reciprocal interactions between control and Ghsr1a-KD cells, potentially masking cell-autonomous effects. To address this, we performed an independent experiment in which control and Ghsr1a-KD lentiviruses were injected separately into different mice (Figure S7A), as suggested. Consistent with the results of the coinjection experiment, we found that the Ghsr1a-KD cells showed significantly reduced distribution in the GL compared with that in control cells (Figure S7B). Although we cannot exclude the possibility of a non-cell-autonomous effect of ghrelin, this result supports the conclusion that ghrelin signaling enhances neuronal migration in a cell-autonomous manner. 

      Who is expressing Ghsr1a, newborn neurons, and or their progenitors? The production and survival of newborn V-ZVS cells should be assessed upon knockdown of the ghrelin receptor too. 

      To determine whether the altered distribution of new neurons observed upon Ghsr1a-KD is due to impaired migration rather than decreased cell production or survival, we examined the effects of Ghsr1a-KD on the proliferation and survival of new neurons and their progenitors, which express GHSR1a (Li et al., 2014).

      We compared the proportion of cleaved caspase-3+ cells and Ki67+ cells from the total labeled cells in the V-SVZ and RMS between the control and Ghsr1a-KD groups. There was no significant difference in the proportion of cleaved caspase-3+ cells between the groups (Control: 874 cells from 5 mice; Ghsr1a-KD: 678 cells from 7 mice), suggesting that ghrelin signaling does not affect the survival of new neurons and their progenitors. 

      Similarly, the proportion of Ki67+ cells in the RMS did not differ significantly between the two groups (Figure S8), indicating that Ghsr1a-KD does not impair cell proliferation in the RMS. However, it remains technically difficult to evaluate whether Ghsr1a-KD affects proliferation in the V-SVZ, because lentivirus injection into the V-SVZ may interfere with GHSR1a expression not only in new neurons and neural progenitors, but also in other cell types known to express GHSR1a (Zigman et al., 2006). A previous study reported that ghrelin signaling promoted cell proliferation in the V-SVZ (Li et al., 2014), thus we cannot exclude the possibility that Ghsr1a-KD may affect V-SVZ proliferation.

      To overcome this limitation, we assessed the effects of Ghsr1a-KD on neuronal migration using in vitro KD experiments (Figure 4C–J) and in vivo OB-core lentivirus injections (Figure 5A–C), both of which did not interfere with proliferation in the V-SVZ. These complementary approaches consistently demonstrated that Ghsr1a-KD reduces the migration speed of new neurons. 

      “To determine whether the altered distribution of new neurons after Ghsr1a-KD is due to impaired migration rather than changes in cell production or survival, we assessed the effects of Ghsr1a-KD on the proliferation and survival of new neurons and their progenitors, which express GHSR1a (Li et al., 2014). We quantified the proportion of cleaved caspase-3+ cells and Ki67+ cells from the total labeled cells in the V-SVZ and RMS in both control and Ghsr1a-KD groups. We found no significant difference in cleaved caspase-3+ cell proportions between the groups (Control: 874 cells from 5 mice; Ghsr1a-KD: 678 cells from 7 mice), suggesting that ghrelin signaling does not influence the survival of new neurons and their progenitors. Similarly, the percentage of Ki67+ cells in the RMS was similar between the two groups (Figure S8), indicating that Ghsr1a-KD does not impair cell proliferation in the RMS. However, technical limitations prevented a reliable evaluation of proliferation in the V-SVZ, as lentivirus injection into this region may interfere with GHSR1a expression in not only neural progenitors and new neurons, but also other GHSR1a-expressing cell types (Zigman et al., 2006). Although ghrelin signaling has been reported to promote cell proliferation in the V-SVZ (Li et al., 2014), our complementary in vitro KD experiments (Figure 4C–J) and in vivo OB-core lentivirus injections (Figure 5A–C), which did not affect the V-SVZ, consistently demonstrated that Ghsr1a-KD reduces neuronal migration. Taken together, our results suggest that blood-derived ghrelin enhances neuronal migration in the RMS and OB by stimulating actin cytoskeleton contraction in the cell soma, rather than by altering cell proliferation or survival.” (Results, Page 10, line 19–Page 11, line 4)

      “rat anti-Ki67 (1:500, #14-5698-82, eBioscience); and rabbit anti-cleaved caspase-3 (1:200, #9661, Cell Signaling Technology)” (Methods, Page 48, lines 14–16)

      How much is ghrelin/Ghsr1 signaling conserved in marmosets? 

      How ghrelin signaling is conserved between mice and common marmosets is important to clarify. A previous study reported the existence of a ghrelin homolog in common marmoset, which shares high sequence similarity with that in mice (Takemi et al., 2016). Moreover, the GHSR1a homolog in the common marmoset (https://www.ncbi.nlm.nih.gov/protein/380748978) shares 95.36% amino acid identity with its mouse counterpart. These findings suggest that blood-derived ghrelin may similarly promote neuronal migration in the marmoset brain, as observed in mice. 

      We have added the following text in the Discussion section:

      “Our data showed that new neurons preferentially migrate along arteriole-side vessels rather than venule-side vessels in both mouse and common marmoset brains, suggesting that the mechanism of blood flow-dependent neuronal migration is conserved across rodent and primate species, as well as across brain regions. A previous study identified a ghrelin homolog in the common marmoset with high sequence similarity to the murine version (Takemi et al., 2016). In addition, the marmoset GHSR1a homolog shares 95.36% amino acid identity with that of the mouse (https://www.ncbi.nlm.nih.gov/protein/380748978). These findings suggest that blood-derived ghrelin promotes neuronal migration in the common marmoset brain in a manner similar to that in mice.” (Discussion, Page 15, lines 8–16)

      Page 9. Starvation has been shown to boost ghrelin blood levels. What is the exact protocol used in this experiment and is this indeed increasing Ghrelin release from blood vessels in the V-SVZ? What about Ghsr1 expression level in newborn neurons? 

      We have clarified the calorie restriction (CR) protocol used in our experiments. We adopted a 70% CR protocol, which was previously shown to enhance hippocampal neurogenesis when administered for 14 days (Hornsby et al., 2016). In our study, the daily food intake under ad libitum (AL) conditions was first measured, and CR mice were then fed 70% of that amount for 5 consecutive days (see Figure 5I and Figure S10A). 

      To assess whether CR enhances ghrelin transcytosis into the brain parenchyma, we performed ELISA to quantify ghrelin levels in the OB and RMS. However, ghrelin concentrations were below the detection limit in both groups, precluding a direct comparison.

      We also considered whether CR modulates the expression level of the ghrelin receptor GHSR1a. A recent study reported that fasting increased GHSR1a expression in the OB (Stark et al., 2024), raising the possibility that CR may exert a similar effect. To test this, we performed in situ hybridization and quantified Ghsr1a mRNA puncta in Dcx+ cells in the OB. No significant difference was found between the AL and CR groups (Figure S5B), suggesting that CR does not alter GHSR1a expression levels in new neurons. 

      Although we cannot exclude the possibility that CR increases GHSR1a expression in other OB cell types, our combined CR and Ghsr1a-KD experiments strongly support a cell-autonomous contribution of ghrelin signaling to the enhanced neuronal migration observed under CR conditions. Corresponding data and text have been added to Figure S5 and the Results, Discussion, and the Figure legend sections as follows:

      Minor 

      Page 4 

      Line 19 In Supplemental movies 1 and 2, it is unclear where to see the GFP+ new neurons interact with BV. Can you add arrows as an indication for the readers? It will be better to add the anatomy term for orientation, caudal, or rostral in the video. (The same for Supplemental movies 3, 4, and 5).  

      To clarify the regions of interest in Supplemental Movies 1 and 2, where neuron–vessel interactions in the RMS are highlighted, we added dotted lines indicating the RMS boundaries. In addition, we created a new movie (Supplemental Movie S1′) showing a high-magnification view of Supplemental Movie S1, in which arrows mark EGFP+ new neurons interacting with blood vessels. We also added orientation indicators (e.g., caudal and rostral) and arrows to highlight new neuron–vessel interactions in Supplemental Movies S1–S5. 

      The following descriptions have been added to the Figure legends:

      “Supplemental Movie S1′ 

      High-magnification view extracted from Supplemental Movie S1. Arrows indicate EGFP+ cells interacting with blood vessels.” (Figure legend, Page 46, lines 6–8)

      “Arrows indicate EGFP+ cells interacting with blood vessels.” (Figure legend, Supplemental Movie S3, Page 46, lines 16–17)

      “Arrows indicate Dcx+ cells interacting with blood vessels.” (Figure legend, Supplemental Movies S4 and S5, Page 46, lines 21–22, 26–27)

      Blood vessels are labeled in the Supplemental movies 2 and 3 by employing Flt1-DsRed transgenic mice instead of RITC-Dex-GMA. However, Flt1-DsRed transgenic mice are not mentioned in the results section.

      We have now included an explanation regarding the use of Flt1-DsRed mice, in which vascular endothelial cells were labeled with DsRed.

      “To visualize blood vessels, we also used Flt1-DsRed transgenic mice, in which vascular endothelial cells were specifically labeled with DsRed (Matsumoto et al., 2012). Using Dcx-EGFP/Flt1-DsRed double transgenic mice, we observed close spatial relationships between new neurons and blood vessels (Supplemental Movies S2 and S3).” (Results, Page 4, lines 22–26)

      Figure 5. Can you indicate (in the figure legend and the result section) the stage of the adult brain used for this experiment? 

      We used 6- to 12-week-old adult male mice in all experiments in this study. To specify this, we have added the age of animals to both the Results and the relevant Figure legends as follows:

      “Therefore, we first studied blood vessel-guided neuronal migration in the RMS and OB using three-dimensional imaging in 6- to 12-week-old adult mice, which enabled analysis of the in vivo spatial relationship between new neurons and blood vessels.” (Results, Page 4, lines 14–16)

      “Figure 1 New neurons migrate along blood vessels with abundant flow in the adult brain.” (Figure legend, Page 25, line 4)

      “(B, C) Three-dimensional reconstructed images of a new neuron (green) and blood vessels (red) in the rostral migratory stream (RMS) (B) and glomerular layer (GL) (C) of 6- to 12-week-old adult mice.” (Figure legend, Page 25, lines 6–8)

      “(E) Transmission electron microscopy image of a new neuron (green) in close contact with a blood vessel (red) in the GL of a 6- to 12-week-old adult mouse.” (Figure legend, Page 26, lines 4–5)

      “(F) Time-lapse images of a migrating neuron (indicated by asterisks) in the GL of a 6- to 12-week-old Dcx-EGFP mouse.” (Figure legend, Page 26, lines 6–7)

      “Figure 3 Ghrelin is delivered from the bloodstream to the RMS and OB in the adult brain (A) Representative images of the OB and cortex of a fluorescent ghrelin-infused mouse (6 to 12 weeks old).” (Figure legend, Page 30, lines 1–3)

      “Lentivirus injection into the OB core (A) and the VSVZ (D) was performed in 6- to 12-week-old adult mice.” (Figure legend, Page 33, lines 3–4)

      Reviewer #2 (Recommendations for author):

      Major:

      Ghsr1KD and blood flow 2-photon experiments to directly measure migratory speed. Could also do the same with fasting with or without Ghsr1KD.  

      We thank the reviewer for the valuable suggestion to strengthen our study. As pointed out in the Public Review, we agree that direct in vivo measurement of neuronal migration speed under Ghsr1a-KD conditions is important to clarify the link between ghrelin signaling and blood flow. 

      Two-photon imaging is the most suitable method for this purpose. Although we attempted two-photon imaging of Ghsr1a-KD new neurons, the number of virus-infected cells observed in vivo was too low to yield reliable data. Therefore, we chose an alternative strategy, combining Ghsr1a-KD with blood flow reduction using the BCAS model (Figure S9A), in which migration speed can be quantified based on the percentage of labeled cells reaching the OB. As stated in the Public Review response, BCAS significantly decreased the migration speed of Ghsr1a-KD new neurons (Figure S9B), indicating that Ghsr1a-KD does not abolish the influence of blood flow reduction. These findings suggest that ghrelin signaling is involved in, but is not essential for, blood flow-dependent neuronal migration.

      As suggested by the reviewer, direct observation of migration dynamics (e.g., somal translocation, leading process extension, stationary and migratory phases) is needed, especially in calorie restriction experiments. Although our data indicate that ghrelin signaling is required for fasting-induced increases in migration speed of new neurons, calorie restriction could also change concentrations of other factors in blood (Bonnet et al., 2020; Wu et al., 2024; Alogaiel et al., 2025), which may independently affect behavior of migrating neurons. Given that ghrelin is not the sole factor contributing to blood flow-dependent neuronal migration, other circulating factors could affect behavior of migrating neurons in a different manner during fasting. In vivo two-photon imaging would be a powerful approach to determine whether fasting-induced neuronal migration is caused by upregulated somal translocation speed, which would further support a role for ghrelin in this process.

      We have added the following text in the Discussion:

      “Although our data indicate that ghrelin signaling is essential for fasting-induced acceleration of neuronal migration, calorie restriction may also alter the concentrations of other circulating factors (Bonnet et al., 2020; Wu et al., 2024; Alogaiel et al., 2025), which could independently influence the behavior of migrating neurons.” (Discussion, Page 14, lines 25–29)

      Minor: 

      (1) Show fluorescent Ghrelin in Figure 3 for all brain areas measured in Figure 1 (GL, EPL, GCL, and RMS) for direct comparison.

      To allow for direct comparison across brain regions, we added a new Supplemental figure showing the distribution of fluorescently labeled ghrelin in the OB, including the GL, EPL, GCL and RMS. This comprehensive view highlights ghrelin localization relative to vasculature and migrating neurons in the regions analyzed in Figure 1.

      (2) Figure 1, panel I is presented in a confusing manner. High blood flow points to 0 degrees, low blood flow to 180 degrees. It implies (unintentionally, I am sure) that low blood flow results in migration away from OB. Maybe plot separately?

      We agree that the original presentation of Figure 1I could be misinterpreted as referring to anatomical orientation (i.e., toward or away from the OB). To avoid confusion, we revised the figure to categorize new neuron–vessel interactions into four groups according to (1) the angle between the migration direction and vessel axis (small or large), and (2) whether the new neuron is migrating toward or away from the direction of higher blood flow. This new presentation avoids implying a fixed anatomical direction and better reflects the relationship between local blood flow and neuronal migration behavior. The revised figure is presented as Supplemental Figure S1.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      We deeply appreciate the reviewers' comments on our manuscript. Following the revisions, our manuscript has been improved thanks to their insightful remarks. We have made all the required changes.

      Weaknesses:

      The authors have still not addressed the inconsistent/missing description for sample size, the appropriate number of * for each figure panel, and the statistical tests used.

      Descriptions of sample size, specific P values, and the statistical tests used have been added to the main text, figures, and figure legends.

      The authors assign 5% oxygen as hypoxia. This is not the case as the in vivo environment is close to this value. 5% is normoxia. Clinical IVF/embryo culture occurs at 5% O2. Please adjust your narrative around this.

      In our manuscript, we define “normoxia” as the standard atmospheric oxygen level in tissue culture incubators, which is about 20–21% oxygen. We define hypoxia as 5% oxygen, taking into consideration the standard oxygen levels used in IVF clinics. Physiological oxygen in the mouse varies from ~1.5% to 8% (Alva et al 2022). Considering that these levels of oxygen are the standard levels in tissue culture practices, a paragraph has been added to the Discussion and Materials and Methods for further clarification.

      Reviewer #2 (Public review):

      Weakness:

      Given that this is a study on the induction of aneuploidy, it would be meaningful to assess aneuploidy immediately after induction, and then again before implantation. This is also applicable to the competition experiments on page 7/8. What is shown is the competitiveness of treated cells. Because the publication centers around aneuploidy, inclusion of such data in the main figure at all relevant points would strengthen it. There is some evaluation of karyotypes only in the supplemental - why? Would be good not to rely on a single assay that the authors appear to not give much importance.

      This is an excellent point. However, because aneuploidies arise stochastically when embryos are treated with AZ3146 and reversine (Bolton et al 2016), every treatment is likely to generate different levels of aneuploidy. Because of this, and because of the technical limitations of single-cell genomic DNA sequencing at the blastocyst stage, we were unable to determine the karyotype of all cells after different conditions. Nevertheless, Regin et al 2024 (eLife) showed similar results on the overall transcriptome changes of different dosages of aneuploidy: high dosage embryos overexpress p53, like reversine-treated embryos; meanwhile, low dosage embryos overexpress the hypoxic pathway, including HIF1A, similar to embryos treated with AZ3146.

      Reviewer #1 (Recommendations for the authors):

      Corrections required before final publishing:

      Please ensure that the number of asterisks is in alignment with standard convention (* <0.05; ** <0.01; *** <0.001; **** <0.0001). If you want to describe an exact P value, it should be presented as P = 0.0004. Line 108: *** is <0.0004. Line 263: * P<0.0044

      Same issue appears in lines 697, 711, 722, 753, 685

      Specific values have been added in the figures and modified in the text. 
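
      The asterisk convention quoted by the reviewer can be sketched as a small helper. This is only an illustration of the stated thresholds, not code from the manuscript; the function names and the "ns" label for non-significant values are assumptions introduced here.

```python
def significance_stars(p: float) -> str:
    """Map a P value to asterisks using the thresholds quoted by the reviewer."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P value must lie in [0, 1]")
    # Check the strictest threshold first so the first match wins.
    thresholds = [(0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")]
    for cutoff, stars in thresholds:
        if p < cutoff:
            return stars
    return "ns"  # hypothetical label for P >= 0.05


def format_exact_p(p: float) -> str:
    """Report an exact P value in the form 'P = 0.0004'."""
    return f"P = {p:.4g}"
```

      Under these thresholds, an exact value of 0.0004 maps to three asterisks and 0.0044 maps to two, which is why exact values are better reported as "P = ..." than paired with an inequality.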

      Line 199: "...viable E9.5 embryos" missing "Figure S1D"

      Modified in manuscript

      Line 120: "...decidua" please add "Figure S1C"

      Modified in manuscript

      Line 126-127: Please add a description for the results (morula) in Fig 1D, e.g., "It appears that γH2AX persists from 8-cell to morula when treated with Reversine but not AZ3146"

      At the morula stage, the levels of γH2A.X in reversine- and AZ3146-treated embryos are similar (Fig. 1E). However, at the blastocyst stage, high levels of γH2A.X are maintained in reversine-treated embryos and reduced in AZ3146-treated embryos, suggesting some level of DNA repair between the morula-to-blastocyst stages (Fig. S2A). In contrast, in hypoxia, the levels of γH2A.X are low in the three treatments at the morula stages, suggesting that DNA repair can be enhanced under hypoxic conditions. Similar results have been reported in somatic cells (Marti et al., 2021; Pietrzak et al., 2018).

      Line 213: PARP1 levels were also similar under all conditions; but Fig. 3E, top right shows PARP1 was significantly lower with Reversine treatment; also please correct me if I am wrong, but does the phrase "all conditions" cross reference γH2AX and PARP1 between Fig 3 and Fig 1 to show the impact of hypoxia? Because from my understanding Fig 1 was done in 20% oxygen, but Fig 3 was done in 5% oxygen – hypoxia.

      This is correct. The manuscript has been modified for clarification.

      Line 264: extra forward dash? "Reversine/AZ3146/ aggregation"

      Modified in manuscript

      Line 644: you don't have a control for IDF treatment, so how did you differentiate between impact of aneuploid drugs vs IDF treatment alone? Would the impact observed be due to compounding effect of aneuploidy drugs + IDF?

      This is a great observation. We previously demonstrated that IDF-1174 treatments in embryos do not affect pre-implantation development (Fig. S3).

      Line 681: change their behaviour is a vague statement. Be specific.

      Modified in manuscript

      Line 676 missing bracket "E)"

      Modified in manuscript

      Line 680: "...significantly on" should be "for"

      Modified in manuscript

      Line 682-685: "...hypoxia favours the survival of reversine-induced aneuploid cells." Does it? The statement before this says in Rev/AZ chimeras, AZ blastomeres contribute similarly to reversine-blastomeres to the TE and PE but significantly increase contributions to the EPI. Wouldn't this mean hypoxia favours survival of AZ aneuploid cells in EPI?

      In normoxic conditions, AZ3146-treated cells in Rev/AZ chimeras contributed mostly to the EPI and TE but not the PE. In contrast, in normoxic conditions, Rev-treated cells contributed similarly to all the lineages. This result seems to be due to a better survival of Rev-treated cells under normoxic conditions (Fig. 4D-E).

      Line 720: (b) shows blastocyst staining from what group? DMSO? Rev/AZ? Or are the 3 blastocysts shown here, 3 separate examples of Reversine-treated blastocysts? Would require labelling Fig S2B, and adding a short description in the corresponding figure legend

      Figure (B) shows the expression pattern of PARP1 at the blastocyst stage. Modified in manuscript

      Figure 2, Figure S3 and Figure S6: were these experiments performed at 5% or 20% O2, please add detail.

      Modified in manuscript

      Reviewer #2 (Recommendations for the authors):

      Lines 45-46 understanding of reduction of aneuploidy should mention/discuss the paper of attrition/selection, of the kind by the Brivanlou lab for instance, or others. As well as allocation to specific lineages, including the authors' work.

      A section in the discussion has been added in response to this recommendation. Comparison between models is debatable.

      The response does not clarify whether other papers were cited instead, or the authors own work that has shown preferential allocation to TE.

    1. Author response:

      There was a common theme across the reviews to provide a more cautious interpretation and to consider the key question of whether peer reviewers who include citations are being purely self-serving or are highlighting important missing context. I will include a suggested new text analysis to cover this and will expand the discussion on this key question. Reviewers highlighted some confusion around the sample sizes for the different analyses, and I will clarify all sample sizes in the next version.

    1. Author response:

      Reviewer #1 (Public Review):

      This study presents an exploration of PPGL tumour bulk transcriptomics and identifies three clusters of samples (labeled as subtypes C1-C3). Each subtype is then investigated for the presence of somatic mutations, metabolism-associated pathways and inflammation correlates, and disease progression. The proposed subtype descriptions are presented as an exploratory study. The proposed potential biomarkers from this subtype are suitably caveated and will require further validation in PPGL cohorts together with a mechanistic study.

      The first section uses WGCNA (a method to identify clusters of samples based on gene expression correlations) to discover three transcriptome-based clusters of PPGL tumours. The second section inspects a previously published snRNAseq dataset, and labels some of the published cells as subtypes C1, C2, C3 (Methods could be clarified here), among other cells labelled as immune cell types. Further details about how the previously reported single-nuclei were assigned to the newly described subtypes C1-C3 require clarification.

      Thank you for your valuable suggestion. In response to the reviewer’s request for further clarification on “how previously published single-nuclei data were assigned to the newly defined C1-C3 subtypes,” we have provided additional methodological details in the revised manuscript (lines 103-109). Specifically, we aggregated the single-nucleus RNA-seq data to the sample level by summing gene counts across nuclei to generate pseudo-bulk expression profiles. These profiles were then normalized for library size, log-transformed (log1p), and z-scaled across samples. Using gene set scores derived from our earlier WGCNA analysis of PPGLs, we defined transcriptional subtypes within the Magnus cohort (Supplementary Figure. 1C). We further analyzed the single-nucleus data by classifying malignant (chromaffin) nuclei as C1, C2, or C3 based on their subtype scores, while non-malignant nuclei (including immune, stromal, endothelial, and others) were annotated using canonical cell-type markers (Figure. 4A).
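
      The pseudo-bulk steps described above (summing counts per sample, library-size normalization, log1p, z-scaling across samples) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the scaling factor `target_sum`, and the use of a simple mean of z-scores as the gene set score are all hypothetical choices made here.

```python
import numpy as np


def pseudobulk_profiles(counts, sample_ids, target_sum=1e6):
    """Aggregate a nuclei x genes count matrix into z-scaled sample-level profiles.

    counts: raw counts, one row per nucleus, one column per gene.
    sample_ids: sample label for each nucleus.
    target_sum: library-size scaling factor (an assumed value).
    """
    counts = np.asarray(counts, dtype=float)
    sample_ids = np.asarray(sample_ids)
    samples = np.unique(sample_ids)

    # 1) Sum gene counts over all nuclei that belong to each sample.
    bulk = np.vstack([counts[sample_ids == s].sum(axis=0) for s in samples])

    # 2) Normalize each sample by its library size (assumes no empty samples).
    library_size = bulk.sum(axis=1, keepdims=True)
    normalized = bulk / library_size * target_sum

    # 3) Log-transform with log1p.
    logged = np.log1p(normalized)

    # 4) z-scale each gene across samples; constant genes are set to 0.
    mean = logged.mean(axis=0)
    std = logged.std(axis=0)
    z = np.divide(logged - mean, std, out=np.zeros_like(logged), where=std > 0)
    return samples, z


def gene_set_score(z, gene_indices):
    """Score each sample as the mean z-score over one gene set (assumed scoring rule)."""
    return z[:, gene_indices].mean(axis=1)
```

      A sample would then be assigned to whichever subtype gene set gives it the highest score, although the exact assignment rule used in the manuscript is not stated in this response.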

      The tumour samples are obtained from multiple locations in the body (Figure 1A). It will be important to see further investigation of how the sample origin is distributed among the C1-C3 clusters, and whether there is a sample-origin association with mutational drivers and disease progression.

      Thank you for your valuable suggestion. In the revised manuscript (lines 74-79), Figure. 1A, Table S1 and Supplementary Figure. 1A, we harmonized anatomic site annotations from our PPGL cohort and the TCGA cohort and analyzed the distribution of tumor origin (adrenal vs extra-adrenal) across subtypes. The site composition is essentially uniform across C1-C3—approximately 75% pheochromocytoma (PC) and 25% paraganglioma (PG)—with only minimal variation. Notably, the proportion of extra-adrenal origin (paraganglioma origin) is slightly higher in the C1 subtype (see Supplementary Figure 1A), which aligns with the biological characteristics of tumors from this anatomical site, which typically exhibit more aggressive behavior.

      Reviewer #2 (Public Review):

      A study that furthers the molecular definition of PPGL (where prognosis is variable) and provides a wide range of sub-experiments to back up the findings. One of the key premises of the study is that identification of driver mutations in PPGL is incomplete and that compromises characterisation for prognostic purposes. This is a reasonable starting point on which to base some characterisation based on different methods. The cohort is a reasonable size, and a useful validation cohort in the form of TCGA is used. Whilst it would be resource-intensive (though plausible given the rarity of the tumour type) to perform RNA-seq on all PPGL samples in clinical practice, some potential proxies are proposed.

      We sincerely thank the reviewer for their positive assessment of our study’s rationale. We fully agree that RNA sequencing for all PPGL samples remains resource-intensive in current clinical practice, and its widespread application still faces feasibility challenges. It is precisely for this reason that, after defining transcriptional subtypes, we further focused on identifying and validating practical molecular markers and exploring their detectability at the protein level.

      In this study, we validated key markers such as ANGPT2, PCSK1N, and GPX3 using immunohistochemistry (IHC), demonstrating their ability to effectively distinguish among molecular subtypes (see Figure. 5). This provides a potential tool for the clinical translation of transcriptional subtyping, similar to the transcription factor-based subtyping in small cell lung cancer where IHC enables low-cost and rapid molecular classification.

      It should be noted that the subtyping performance of these markers has so far been preliminarily validated only in our internal cohort of 87 PPGL samples. We agree with the reviewer that larger-scale, multi-center prospective studies are needed in the future to further establish the reliability and prognostic value of these markers in clinical practice.

      The performance of some of the proxy markers for transcriptional subtype is not presented.

      We agree with your comment regarding the need to further evaluate the performance of proxy markers for transcriptional subtyping. In our study, we have in fact taken this point into full consideration. To translate the transcriptional subtypes into a clinically applicable classification tool, we employed a linear regression model to compare the effect values (β values) of candidate marker genes across subtypes (Supplementary Figure. 1D-F). Genes with the most significant β values and statistical differences were selected as representative markers for each subtype.

      Ultimately, we identified ANGPT2, PCSK1N, and GPX3—each significantly overexpressed in subtypes C1, C2, and C3, respectively, and exhibiting the most pronounced β values—as robust marker genes for these subtypes (Figure. 5A and Supplementary Figure. 1D-F). These results support the utility of these markers in subtype classification and have been thoroughly validated in our analysis. 
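
      One way to read the β-value comparison described above is as a one-vs-rest linear regression of each gene's expression on a subtype indicator. The sketch below illustrates that reading only; the one-vs-rest design, the function name, and the input layout are assumptions made here, and no significance testing is included.

```python
import numpy as np


def subtype_betas(expr, labels):
    """Estimate a per-gene effect size (beta) for each subtype.

    expr: samples x genes expression matrix (e.g. normalized expression).
    labels: subtype label for each sample.
    Returns a dict mapping subtype -> array of betas, one per gene.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    betas = {}
    for subtype in np.unique(labels):
        # Indicator is 1 for samples in this subtype and 0 otherwise.
        indicator = (labels == subtype).astype(float)
        design = np.column_stack([np.ones_like(indicator), indicator])
        # Least-squares fit of every gene at once; row 1 holds the slopes.
        coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
        betas[str(subtype)] = coef[1]
    return betas
```

      With a single binary indicator, each β equals the mean expression in the subtype minus the mean expression in all other samples, so the gene with the largest β for a subtype is the one most strongly elevated in it.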

      There is limited prognostic information available.

      Thank you for your valuable suggestion. In this exploratory revision, we present the available prognostic signal in Figure. 5C. Given the current event numbers and follow-up time, we intentionally limited inference. We are continuing longitudinal follow-up of the PPGL cohort and will periodically update and report mature time-to-event analyses in subsequent work.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The present study evaluates the role of visual experience in shaping functional correlations between extrastriate visual cortex and frontal regions. The authors used fMRI to assess "resting-state" temporal correlations in three groups: sighted adults, congenitally blind adults, and neonates. Previous research has already demonstrated differences in functional correlations between visual and frontal regions in sighted compared to early blind individuals. The novel contribution of the current study lies in the inclusion of an infant dataset, which allows for an assessment of the developmental origins of these differences.

      The main results of the study reveal that correlations between prefrontal and visual regions are more prominent in the blind and infant groups, with the blind group exhibiting greater lateralization. Conversely, correlations between visual and somato-motor cortices are more prominent in sighted adults. Based on these data, the authors conclude that visual experience plays an instructive role in shaping these cortical networks. This study provides valuable insights into the impact of visual experience on the development of functional connectivity in the brain.

      Strengths:

      The dissociations in functional correlations observed among the sighted adult, congenitally blind, and neonate groups provide strong support for the study's main conclusion regarding experience-driven changes in functional connectivity profiles between visual and frontal regions.

      In general, the findings in sighted adult and congenitally blind groups replicate previous studies and enhance the confidence in the reliability and robustness of the current results.

      Split-half analysis provides a good measure of robustness in the infant data.

      Weaknesses:

      There is some ambiguity in determining which aspects of these networks are shaped by experience.

      This uncertainty is compounded by notable differences in data acquisition and preprocessing methods, which could result in varying signal quality across groups. Variations in signal quality may, in turn, have an impact on the observed correlation patterns.

      The study's findings could benefit from being situated within a broader debate surrounding the instructive versus permissive roles of experience in the development of visual circuits.

      Reviewer #2 (Public Review):

      Summary:

      Tian et al. explore the developmental origins of cortical reorganization in blindness. Previous work has found that a set of regions in the occipital cortex show different functional responses and patterns of functional correlations in blind vs. sighted adults. In this paper, Tian et al. ask: how does this organization arise over development? Is the "starting state" more like the blind pattern, or more like the adult pattern? Their analyses reveal that the answer depends on the particular networks investigated; some functional connections in infants look more like blind than sighted adults; other functional connections look more like sighted than blind adults; and others fall somewhere in the middle, or show an altogether different pattern in infants compared with both sighted and blind adults.

      Strengths:

      The question raised in this paper is extremely important: what is the starting state in development for visual cortical regions, and how is this organization shaped by experience? This paper is among the first to examine this question, particularly by comparing infants not only with sighted adults but also blind adults, which sheds new light on the role of visual (and cross-modal) experience. Another clear strength lies in the unequivocal nature of many results. Many results have very large effect sizes, critical interactions between regions and groups are tested and found, and infant analyses are replicated in split halves of the data. 

      Weaknesses:

      A central claim is that "infant secondary visual cortices functionally resemble those of blind more than sighted adults" (abstract, last paragraph of intro). I see two potential issues with this claim. First, a minor change: given the approaches used here, no claims should be made about the "function" of these regions, but rather their "functional correlations". Second (and more importantly), the claim that the secondary visual cortex in general resembles blind more than sighted adults is still not fully supported by the data. In fact, this claim is only true for one aspect of secondary visual area functional correlations (i.e., their connectivity to A1/M1/S1 vs. PFC). In other analyses, the infant secondary visual cortex looks more like sighted adults than blind adults (i.e., in within vs. across hemisphere correlations), or shows a different pattern from both sighted and blind adults (i.e., in occipito-frontal subregion functional connectivity). It is not clear from the manuscript why the comparison to PFC vs. non-visual sensory cortex is more theoretically important than hemispheric changes or within-PFC correlations (in fact, if anything, the within-PFC correlations strike me as the most important for understanding the development and reorganization of these secondary visual regions). It seems then that a more accurate conclusion is that the secondary visual cortex shows a mix of instructive effects of vision and reorganizing effects of blindness, albeit to a different extent than the primary visual cortex.

      Relatedly, group differences in overall secondary visual cortex connectivity are particularly striking as visualized in the connectivity matrices shown in Figure S1. In the results (lines 105-112), it is noted that while the infant FC matrix is strongly correlated with both adult groups, the infant group is nonetheless more strongly correlated with the blind than sighted adults. I am concerned that these results might be at least partially explained by distance (i.e., local spread of the bold signal), since a huge portion of the variance in these FC matrices is driven by stronger correlations between regions within the same system (e.g., secondary-secondary visual cortex, frontal-frontal cortex), which are inherently closer together, relative to those between different systems (e.g., visual to frontal cortex). How do results change if only comparisons between secondary visual regions and non-visual regions are included (i.e., just the pairs of regions within the bold black rectangle on the figure), which limits the analysis to long-range connections only? Indeed, looking at the off-diagonal comparisons, it seems that in fact there are three altogether different patterns here in the three groups. Even if the correlation between the infant pattern and blind adult pattern survives, it might be more accurate to claim that infants are different from both adult groups, suggesting both instructive effects of vision and reorganizing effects of blindness. It might help to show the correlation between each group and itself (across independent sets of subjects) to better contextualize the relative strength of correlations between the groups.

      It is not clear that differences between groups should be attributed to visual experience only. For example, despite the title of the paper, the authors note elsewhere that cross-modal experience might also drive changes between groups. Another factor, which I do not see discussed, is possible ongoing experience-independent maturation. The infants scanned are extremely young, only 2 weeks old. Although no effects of age are detected, it is possible that cortex is still undergoing experience-independent maturation at this very early stage of development. For example, consider Figure 2; perhaps V1 connectivity is not established at 2 weeks, but eventually achieves the adult pattern later in infancy or childhood. Further, consider the possibility that this same developmental progression would be found in infants and children born blind. In that case, the blind adult pattern may depend on blindness-related experience only (which may or may not reflect "visual" experience per se). To deal with these issues, the authors should add a discussion of the role of maturation vs. experience and temper claims about the role of visual experience specifically (particularly in the title). 

      The authors measure functional correlations in three very different groups of participants and find three different patterns of functional correlations. Although these three groups differ in critical, theoretically interesting ways (i.e., in age and visual/cross-modal experience), they also differ in many uninteresting ways, including at least the following: sampling rate (TR), scan duration, multi-band acceleration, denoising procedures (CompCor vs. ICA), head motion, ROI registration accuracy, and wakefulness (I assume the infants are asleep).

      Addressing all of these issues is beyond the scope of this paper, but I do feel the authors should acknowledge these confounds and discuss the extent to which they are likely (or not) to explain their results. The authors would strengthen their conclusions with analyses directly comparing data quality between groups (e.g., measures of head motion and split-half reliability would be particularly effective).

      Response #1: We appreciate the reviewer’s comments. In response, we have revised the paper to provide a more balanced summary of the data and clarified in the introduction which signatures the paper focuses on and why. Additionally, we have included several control analyses to account for other plausible explanations for the observed group differences. Specifically, we randomly split the infant dataset into two halves and performed split-half cross-validation. Across all comparisons, the results from the two halves were highly similar, suggesting that the effects are robust (see Supplementary Figures S3 and S4).

      Furthermore, we compared the split-half noise ceiling across the groups (infants, sighted adults, and blind adults) and found no significant differences between them (details in response #6). Finally, we repeated our analysis after excluding infants with a radiology score of 4 or 5, and the results remained consistent, indicating that our findings are not confounded by potential brain anomalies (details in response #2).

      We hope these control analyses help strengthen our conclusions.
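
      The split-half check described in this response can be sketched as follows. This is a minimal illustration, assuming each subject's functional correlations have been vectorized into one row of an array; the function name, the number of random splits, and the use of a Pearson correlation between half-group averages are assumptions made here and may differ from the analysis in the manuscript.

```python
import numpy as np


def split_half_reliability(fc, n_splits=100, seed=0):
    """Average correlation between group-mean patterns of random half-samples.

    fc: subjects x features array (e.g. vectorized ROI-to-ROI correlations).
    n_splits: number of random splits to average over (assumed value).
    seed: seed for the random number generator, for reproducibility.
    """
    fc = np.asarray(fc, dtype=float)
    rng = np.random.default_rng(seed)
    n_subjects = fc.shape[0]
    correlations = []
    for _ in range(n_splits):
        # Randomly assign subjects to two non-overlapping halves.
        order = rng.permutation(n_subjects)
        half_a = fc[order[: n_subjects // 2]].mean(axis=0)
        half_b = fc[order[n_subjects // 2:]].mean(axis=0)
        # Pearson correlation between the two half-group average patterns.
        correlations.append(np.corrcoef(half_a, half_b)[0, 1])
    return float(np.mean(correlations))
```

      Computing this value separately for the infant, sighted adult, and blind adult groups gives a rough per-group reliability estimate that can be compared across groups, in the spirit of the noise-ceiling comparison mentioned above.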

      Reviewer #3 (Public Review):

      Summary:

      This study aimed to investigate whether the differences observed in the organization of visual brain networks between blind and sighted adults result from a reorganization of an early functional architecture due to blindness, or whether the early architecture is immature at birth and requires visual experience to develop functional connections. This question was investigated through the comparison of 3 groups of subjects with resting-state functional MRI (rs-fMRI). Based on convincing analyses, the study suggests that: 1) secondary visual cortices showed higher connectivity to prefrontal cortical regions (PFC) than to non-visual sensory areas (S1/M1 and A1) in sighted infants like in blind adults, in contrast to sighted adults; 2) the V1 connectivity pattern of sighted infants lies between that of sighted adults (stronger functional connectivity with non-visual sensory areas than with PFC) and that of blind adults (stronger functional connectivity with PFC than with non-visual sensory areas); 3) the laterality of the connectivity patterns of sighted infants resembled those of sighted adults more than those of blind adults, but sighted infants showed a less differentiated fronto-occipital connectivity pattern than adults.

      Strengths:

      The question investigated in this article is important for understanding the mechanisms of plasticity during typical and impaired development, and the approach considered, which compares different groups of subjects including neonates/infants and blind adults, is highly original.

      Overall, the analyses considered are solid and well-detailed. The results are quite convincing, even if the interpretation might need to be revised downwards, as factors other than visual experience may play a role in the development of functional connections with the visual system.

      Weaknesses:

      While it is informative to compare the "initial" state (close to birth) and the "final" states in blind and sighted adults to study the impact of post-natal and visual experience, this study does not analyze the chronology of this development and when the specialization of functional connections is completed. This would require investigating when experience-dependent mechanisms are important for the establishment of multiple functional connections within the visual system. This could be achieved by analyzing different developmental periods in the same way, using open databases such as the Baby Connectome Project. Given the early, "condensed" maturation of the visual system after birth, we might expect sighted infants to show connectivity patterns similar to those of adults a few months after birth.

      The rationale for mixing full-term neonates and preterm infants (scanned at term-equivalent age) from the dHCP 3rd release is not understandable since preterms might have a very different development related to prematurity and to post-natal (including visual) experience. Although the authors show that the difference between the connectivity of visual and other sensory regions, and the one of visual and PFC regions, do not depend on age at birth, they do not show that each connectivity pattern is not influenced by prematurity. Simply not considering the preterm infants would have made the analysis much more robust, and the full-term group in itself is already quite large compared with the two adult groups. The current study setting and the analyses performed do not seem to be an adequate and sufficient model to ascertain that "a few weeks of vision after birth is ... insufficient to influence connectivity".

      In a similar way, excluding the few infants with detected brain anomalies (radiological scores higher or equal to 4) would strengthen the group homogeneity by focusing on infants supposed to have a rather typical neurodevelopment. The authors quote all infants as "sighted" but this is not guaranteed as no follow-up is provided.

      Response #2: We appreciate the reviewer’s suggestion. We re-analyzed the infant cohort after excluding all cases with radiological scores ≥4 (n = 39 infants excluded). The revised analysis confirmed that the connectivity patterns reported in the main text remain statistically unchanged (see Supplementary Fig. S11). This demonstrates the robustness of our findings to confounding effects from potential brain anomalies. We have explicitly clarified this in the revised Methods section (page 14, line 391 in the manuscript).

      In our dataset, newborns (average age at scan = 2.79 weeks) have very limited and immature vision. We agree with the reviewer that long-term visual outcomes cannot be guaranteed without follow-up data. The term "sighted infants" was used operationally to distinguish this cohort from congenitally blind populations.

      The post-menstrual age (PMA) at scan of the infants is also not described. The methods indicate that all were scanned at "term-equivalent age" but does this mean that there is some PMA variability between 37 and 41 weeks? Connectivity measures might be influenced by such inter-individual variability in PMA, and this could be evaluated.

      The rationale for presenting results on the connectivity of secondary visual cortices before one of the primary cortices (V1) was not clear to understand. Also, it might be relevant to better justify why only the connectivity of visual regions to non-visual sensory regions (S1-M1, A1) and prefrontal cortex (PFC) was considered in the analyses, and not the ones to other brain regions.

      In relation to the question explored, it might be informative to reposition the study in relation to what others have shown about the developmental chronology of structural and functional long-distance and short-distance connections during pregnancy and the first postnatal months.

      The authors acknowledge the methodological difficulties in defining regions of interest (ROIs) in infants in a similar way as adults. The reliability and the comparability of the ROIs positioning in infants is definitely an issue. Given that brain development is not homogeneous and synchronous across brain regions (in particular with the frontal and parietal lobes showing delayed growth), the newborn brain is not homothetic to the adult brain, which poses major problems for registration. The functional specialization of cortical regions is incomplete at birth. This raises the question of whether the findings of this study would be stable/robust if slightly larger or displaced regions had been considered, to cover with greater certainty the same areas as those considered in adults. And have other cortical parcellation approaches been considered to assess the ROIs robustness (e.g. MCRIB-S for full-terms)?

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      Further consideration should be given to the underlying changes in network architecture that may account for differences in functional correlations across groups. An increase (or decrease) in correlation between two regions could signify an increase (decrease) in connection or communication between those regions. Alternatively, it might reflect an increase in communication or connection with a third region, while the physical connections/interactions between the two original regions remain unchanged. These possibilities lead to distinct mechanistic interpretations. For example, there are substantial changes in connectivity during early visual (e.g. Burkhalter A. 1993, Cerebral Cortex) and visuo-motor development (e.g., Csibra et al. 2000 Neuroreport). It's not clear whether increases in communication within the visual network and improvements in visuo-motor behavior (e.g., Yizhar et al. 2023 Frontiers in Neuroscience) wouldn't produce a qualitatively similar pattern of results.

      Relatedly, the within-network correlation patterns between visual ROIs and frontal ROIs appear markedly different between sighted adults and infants (Supplementary Figure S1). To what extent do the differences in long-range correlations between visual and frontal regions reflect these within-network differences in functional organization?

      Response #3: The reviewer is raising some interesting questions about possible mechanisms and network changes. Resting state studies are indeed always subject to the possibility that some effects are mediated by a third, unobserved region. Prior whole-cortex connectivity analyses have observed primarily changes in occipito-frontal connectivity in blindness, so there is not a clear cortical ‘third region’ candidate (Deen et al., 2015). However, some thalamic effects have also been observed and could contribute to the phenomenon (Bedny et al., 2011). Resting state changes in correlation between two areas do not imply changes in strength of long-range anatomical connectivity. Indeed, in the current case they may well reflect differential functional coupling, rather than strengthening or weakening of anatomical connections. We now discuss this in the Discussion section on page 12, line 301 as follows:

      “Despite these insights, many questions remain regarding the neurobiological mechanisms underlying experience-based functional connectivity changes and their relationship to anatomical development. Long-range anatomical connections between brain regions are already present in infants—even prenatally—though they remain immature (Huang et al., 2009; Kostović et al., 2019, 2021; Takahashi et al., 2012; Vasung, 2017). Functional connectivity changes may stem from local synaptic modifications within these stable structural pathways, consistent with findings that functional connectivity can vary independently of structural connection strength (Fotiadis et al., 2024). Moreover, functional connectivity has been shown to outperform structural connectivity in predicting individual behavioral differences, suggesting that experience-based functional changes may reflect finer-scale synaptic or network-level modulations not captured by macrostructural measures (Ooi et al., 2022). Prior studies also suggest that, even in adults, coordinated sensory-motor experience can lead to enhancement of functional connectivity across sensory-motor systems, indicating that large-scale changes in functional connectivity do not necessarily require corresponding changes in anatomical connectivity (Guerra-Carrillo et al., 2014; Li et al., 2018).”

      It is not clear how changes in correlation patterns among visual areas would produce the connectivity between visual areas and prefrontal areas reported in the current study. Activity in visual areas drives correlations both among visual areas and between visual and prefrontal areas, and the same is true of prefrontal cortices.

      The findings from this study should be more closely linked to the extensive literature surrounding the debate on whether experience plays an instructive or permissive role in visual development (e.g., Crair 1999 Current Opin Neurobiol; Sur et al. 1999 J Neurobiol; Kiorpes 2016 J Neurosci; Stellwagen & Shatz 2002 Neuron; Roy et al. 2020 Nature Communications).

      Response #4: The instructive role suggests that specific experiences or patterns of neural activity directly shape and organize neural circuitry, while the permissive role indicates that such experiences or activity merely enable other factors, such as molecular signals, to influence neural circuit formation (Crair, 1999; Sur et al., 1999). To distinguish whether experience plays an instructive or permissive role, it is essential to manipulate the pattern or information content of neural activity while maintaining a constant overall activity level (Crair, 1999; Roy et al., 2020; Stellwagen & Shatz, 2002). However, both the sighted and blind adult groups have had extensive experience and neural activity in the visual cortices. For the sighted group, activity in the visual cortex is partly driven by bottom-up input from the external environment, through the retina, LGN, and ultimately to the cortex. In contrast, the blind group’s visual cortex activity is partially driven by top-down input from non-visual networks. The precise role of this activity in shaping the observed connectivity patterns remains unclear. Although our study cannot speak to this issue directly, we now link to the relevant literature on page 12, line 320 of the manuscript in the Discussion section as follows:

      “The current findings reveal both effects of vision and effects of blindness on the functional connectivity patterns of the visual cortex. A further open question is whether visual experience plays an instructive or permissive role in shaping neural connectivity patterns. An instructive role suggests that specific sensory experiences or patterns of neural activity directly shape and organize neural circuitry. In contrast, a permissive role implies that sensory experience or neural activity merely facilitates the influence of other factors—such as molecular signals—on the formation and organization of neural circuits (Crair, 1999; Sur et al., 1999). Studies with animals that manipulate the pattern or informational content of neural activity while keeping overall activity levels constant could distinguish between these hypotheses (Crair, 1999; Roy et al., 2020; Stellwagen & Shatz, 2002).”

      The assertion that a few weeks of vision after birth is insufficient to influence connectivity is provocative. Though supported by the study's results, it would benefit from integration with research in animal models showing considerable malleability of networks from early experience (e.g., Akerman et al. 2002 Neuron; Li et al. 2006 Nature Neuroscience; Stacy et al. 2023 J Neuroscience).

      Response #5: We thank the reviewer for their suggestion. The present study found that several weeks of postnatal visual experience is insufficient to significantly alter the long-term connectivity patterns of the visual cortices. Animal studies, by contrast, have shown that acute visual experience, or even exposure to visual stimuli through unopened eyelids, can robustly influence visual system development (Akerman et al., 2002; Li et al., 2008; Van Hooser et al., 2012). We think this discrepancy may be attributed to the substantial differences in developmental timelines between species. The human lifespan is much longer, and so is the human critical period, making it unclear how to map duration from one species to another. We briefly touch upon the time course issue on page 11, line 289 in the Discussion section as follows:

      “The present results reveal the effects of experience on development of functional connectivity between infancy and adulthood, but do not speak to the precise time course of these effects. Infants in the current sample had between 0 and 20 weeks of visual experience. Comparisons across these infants suggest that several weeks of postnatal visual experience is insufficient to produce a sighted-adult connectivity profile. The time course of development could be anywhere between a few months and years and could be tested by examining data from children of different ages.”

      Substantial differences between the groups are evident in several key aspects of the study, including the number of subjects, brain sizes, imaging parameters, and data preprocessing, all of which are likely to have an impact on the overall signal quality. To clarify how these differences might have impacted correlation differences between groups, it would be essential to include information on the noise ceilings for each correlation analysis within each group.

      Response #6: We thank the reviewer for their suggestion. We now report the split-half noise ceiling for the adult and infant groups. For each participant, we first split the rs-fMRI time series into two halves, then calculated the ROI-wise rsFC pattern from the two splits. The split-half noise ceiling was estimated according to Lage-Castellanos et al. (2019). The noise ceilings of the three groups (infants: 0.90 ± 0.056, blind adults: 0.88 ± 0.041, sighted adults: 0.90 ± 0.055) showed no significant difference (one-way ANOVA, F(2,552) = 2.348, p = 0.097). Therefore, we believe that overall signal quality is unlikely to impact our results. We have also added the relevant context in the Methods section on page 16, line 447 as follows:

      “Substantial differences between the groups exist in this study, including the number of subjects, brain sizes, imaging parameters, and data preprocessing, all of which are likely to have an impact on the overall signal quality. To address this concern, we compared the split-half noise ceiling across the groups (infants, sighted adults, and blind adults). For each participant, we first split the rs-fMRI time series into two halves, then calculated the ROI-wise rsFC pattern from the two splits. The split-half noise ceiling was estimated according to Lage-Castellanos et al. (2019). The noise ceilings of the three groups (infants: 0.90 ± 0.056, blind adults: 0.88 ± 0.041, sighted adults: 0.90 ± 0.055) showed no significant difference (one-way ANOVA, F(2,552) = 2.348, p = 0.097). Therefore, overall signal quality is unlikely to impact our results.”
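
      The split-half procedure described above can be illustrated with a minimal sketch. It is not the authors' code and not necessarily the estimator of Lage-Castellanos et al. (2019): it assumes a plain Pearson correlation between the two half-session rsFC patterns, followed by a Spearman-Brown correction, and it runs on simulated data. The function name, array shapes, and simulation parameters are all hypothetical.

```python
import numpy as np

def split_half_reliability(ts):
    """Split-half reliability of an ROI-wise rsFC pattern (illustrative only).

    ts: array of shape (n_timepoints, n_rois) for one participant.
    Returns the Spearman-Brown-corrected correlation between the
    functional connectivity patterns of the first and second half.
    """
    half = ts.shape[0] // 2
    upper = np.triu_indices(ts.shape[1], k=1)    # unique ROI pairs only
    fc_a = np.corrcoef(ts[:half].T)[upper]       # rsFC pattern, first half
    fc_b = np.corrcoef(ts[half:].T)[upper]       # rsFC pattern, second half
    r = np.corrcoef(fc_a, fc_b)[0, 1]            # agreement between halves
    return 2 * r / (1 + r)                       # Spearman-Brown correction (assumed)

# Simulated participant: 10 ROIs driven by 3 shared latent signals plus noise.
rng = np.random.default_rng(0)
n_t, n_roi = 400, 10
mixing = rng.normal(size=(3, n_roi))
ts = rng.normal(size=(n_t, 3)) @ mixing + 0.5 * rng.normal(size=(n_t, n_roi))
reliability = split_half_reliability(ts)
```

      Computing one such value per participant and comparing the per-group distributions (for example with a one-way ANOVA) corresponds to the group comparison reported above.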

      In general, it appears that the infant correlations are stronger compared to the other groups. While this could reflect increased coherence or lack of differentiation, it is also possible that it is simply due to the presence of a non-neuronal global signal. Such a signal has the potential to substantially limit the effective range of functional correlations and comparisons with adults. To address this, it is advisable to conduct control analyses aimed at assessing and potentially removing global signals.

      Response #7: We agree with the reviewer that global signal regression (GSR) may help reduce non-neuronal artifacts, such as motion, cardiac, and respiratory signals, which are known to correlate with the global signal. However, the global signal also contains neural signals from gray matter, and removing it can introduce unwanted artifacts, especially for the current study. First, GSR can reduce the physiological accuracy of functional connectivity (FC); second, GSR may have differential effects across groups, potentially introducing additional artifacts in between-group comparisons, as noted by Murphy and Fox (2017). The CompCor method (Behzadi et al., 2007; Whitfield-Gabrieli & Nieto-Castanon, 2012) can estimate global non-neuronal artifacts, as GSR does. Because it estimates these artifacts from signals within the white matter (WM) and cerebrospinal fluid (CSF) masks, and not from the gray matter (GM), CompCor should introduce minimal unwanted bias into the GM signal.
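
      The idea behind a CompCor-style correction can be sketched as follows. This is a simplified illustration on simulated data, not the preprocessing pipeline used in the study: it assumes that the nuisance components are the leading temporal principal components of the WM/CSF voxel time series and that they are removed from the GM time series by ordinary least squares. The function name, number of components, and simulation parameters are hypothetical.

```python
import numpy as np

def compcor_regress(gm_ts, noise_ts, n_components=5):
    """Remove leading principal components of noise-ROI signals from GM signals.

    gm_ts:    (n_timepoints, n_gm_voxels) gray matter time series.
    noise_ts: (n_timepoints, n_noise_voxels) WM/CSF time series.
    Returns the demeaned GM time series with the noise components projected out.
    """
    noise = noise_ts - noise_ts.mean(axis=0)
    # Left singular vectors are orthonormal temporal components of the noise ROI.
    u, _, _ = np.linalg.svd(noise, full_matrices=False)
    comps = u[:, :n_components]
    gm = gm_ts - gm_ts.mean(axis=0)
    # Least-squares regression reduces to a projection for orthonormal regressors.
    return gm - comps @ (comps.T @ gm)

# Simulation: one shared artifact contaminates both the noise ROI and GM voxels.
rng = np.random.default_rng(1)
n_t = 300
artifact = rng.normal(size=(n_t, 1))
noise_ts = artifact @ rng.normal(size=(1, 50)) + 0.1 * rng.normal(size=(n_t, 50))
gm_ts = artifact @ rng.normal(size=(1, 20)) + rng.normal(size=(n_t, 20))
cleaned = compcor_regress(gm_ts, noise_ts, n_components=1)

# Residual correlation of each cleaned GM voxel with the simulated artifact.
residual_corr = np.array(
    [np.corrcoef(cleaned[:, j], artifact[:, 0])[0, 1] for j in range(cleaned.shape[1])]
)
```

      In this sketch the regressors never include the gray matter signal itself, which is the property the response appeals to when contrasting CompCor with GSR.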

      Was there a difference in correlations for preterm vs term neonates? Recent research has suggested that preterm births can have an impact on functional networks, particularly in frontal cortices, e.g., Tokariev et al. 2019; Li et al. 2021 eLife; Zhang et al. 2022 Frontiers in Neuroscience.

      Response #8: We have compared preterm and term neonates for all the main results, including the connectivity from the secondary visual cortex/V1 to non-visual sensory cortices versus prefrontal cortices, the laterality of occipito-frontal connectivity, and the specialization across different fronto-occipital networks. This information is reported in Page 6 line 169 and Supplementary Figure S7. The connectivities of full-term infants are generally higher than those of preterm infants. However, the connectivity patterns of term and preterm infants are very similar.

      The consistency between the current results and prior work (e.g., Burton et al. 2014) is notable, particularly in the observed greater correlations in prefrontal regions and weaker correlations in somato-motor regions for early blind individuals compared to sighted. However, almost all visual-frontal correlations in both groups were negative in that prior study. Some discussion on why positive correlations were found in the current study could help to clarify.

      Response #9: Many other papers have reported positive correlations similar to those found in our study (e.g., Deen et al., 2015; Kanjlia et al., 2021). In contrast, Burton's study identified predominantly negative visual-frontal correlations. We think this is likely because the global signal was regressed out during preprocessing, a methodological choice that can lead to an increase in negative connections (Murphy & Fox, 2017).
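
      The tendency of global signal regression to produce negative correlations can be seen in a small simulation. The sketch below is illustrative only and does not reproduce the preprocessing of either study: it assumes that the global signal is the mean across a set of simulated regions and that it is removed from each region by least squares. After this step the regional residuals sum to zero at every time point, which forces the average pairwise covariance below zero even when every pair was positively correlated beforehand. All names and parameters are hypothetical.

```python
import numpy as np

def mean_offdiag_corr(ts):
    """Average correlation over all unique region pairs."""
    c = np.corrcoef(ts.T)
    return c[np.triu_indices(c.shape[0], k=1)].mean()

def global_signal_regression(ts):
    """Regress the across-region mean signal out of every region."""
    x = ts - ts.mean(axis=0)
    g = x.mean(axis=1, keepdims=True)        # global signal, shape (T, 1)
    beta = (g.T @ x) / (g.T @ g)             # one least-squares slope per region
    return x - g @ beta

# Eight simulated regions that all share one signal, so every pair is
# positively correlated (about 0.5) before any correction.
rng = np.random.default_rng(2)
shared = rng.normal(size=(500, 1))
ts = shared + rng.normal(size=(500, 8))

before = mean_offdiag_corr(ts)
after = mean_offdiag_corr(global_signal_regression(ts))
```

      Here `before` is clearly positive while `after` is negative, which is consistent with the explanation that predominantly negative visual-frontal correlations can arise from the preprocessing choice.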

      The term "secondary visual areas" used throughout the paper lacks specificity, and its usage in terms of underlying anatomical and functional areas has been inconsistent in the literature. It would be advisable to adopt a more precise characterization based on functional and/or anatomical criteria.

      Response #10: We specified in the article that the occipital ROIs used in the current study are functional areas identified in prior studies of people born blind. Each responds to one of three non-visual tasks (language, math, or executive function) in the congenitally blind population and shows functional connectivity changes in blind adults (Kanjlia et al., 2016, 2021; Lane et al., 2015; see Figure 1). They are referred to collectively as ‘secondary visual areas’ to distinguish them from V1. Anatomically, these three regions cover the majority of the lateral occipital cortex and part of the ventral occipital cortex, providing a good sample of the connectivity profile of higher-order visual areas. In blind individuals, although these regions respond to non-visual tasks, their exact functions are unknown.

      The inclusion of the ventral temporal cortex in the visual ROIs is currently only depicted in Supplementary Figure S7. To enhance the clarity of the areas of interest analyzed, it would be advisable to illustrate the ventral temporal areas in the main text. Were there notable differences in the frontal correlations between the lateral occipital visual areas and ventral temporal areas?

      Response #11: We thank the reviewer for pointing out this issue. We added a statement about the ventral visual cortex when describing the location of the ROIs and added the ventral view of the ROIs to Figure 1. The language-responsive and math-responsive ROIs cover both the lateral and ventral visual cortex, whereas the executive function (response-conflict) regions cover only the lateral visual cortex. We compared the connectivity patterns of these three regions and found no differences (see Supplementary Figure S2).

      The blind group results are characterized as reflecting a reorganization in comparison to sighted adults while the results for sighted adults compared to infants are discussed more as a maturation ("adult pattern isn't default but requires experience to establish"). Both the sighted and blind adult groups showed differences from the infant group, and these differences are attributed to the role of experience. Why use "reorganization" for one result and maturation for another?

      Response #12: We agree with the reviewer that both of the adult groups should be thought of as equal in relation to the infants. In other words, the brain develops under one set of experiential conditions or another. We do not think that the adult sighted pattern reflects maturation. Rather, the sighted adult pattern reflects the combined influence of maturation and visual experience. The adult blind pattern reflects the combined influence of maturation and blindness. We use the term ‘reorganization’ to label differences in the blind adults relative to sighted infants. We do so for the purpose of clarity and to remain consistent with terminology in prior literature. However, we agree with the reviewer that the blind group does not reflect ‘reorganization’ intrinsically any more than the sighted adult group.

      The statement that "visual experience is required to set up long-range functional connectivity" is unclear, especially since the infant and blind groups showed stronger long-range functional correlations with PFC.

      Response #13: We revised this sentence to read “visual experience establishes elements of the sighted-adult long-range connectivity” in the Abstract, line 17.

      The statement that the visual ROIs roughly correspond to "the anatomical location of areas such as V5/MT+, LO, V3a, and V4v" appears imprecise. From Supplementary Figure S7, these areas cover anterior portions of ventral temporal cortex (do these span the anatomical location of putative category-selective areas?) and into the intraparietal sulcus.

      Response #14: Thanks to the reviewer for the clarification. The ventral ROIs cover the middle and part of the anterior portion of the ventral temporal lobe, including the putative category-selective areas. Additionally, the dorsal ROIs extend beyond the occipital lobe to the intraparietal sulcus and superior parietal lobule. We have added a more detailed description of the anatomical location of the ROI in the Methods section Page 17 line 489 as follows:

      “Each functional ROI spans multiple anatomical regions and together the secondary visual ROIs tile large portions of lateral occipital, occipito-temporal, dorsal occipital and occipito-parietal cortices. In sighted people, the secondary visual occipital ROIs include the anatomical locations of functional regions such as motion area V5/MT+, the lateral occipital complex (LO), category specific ventral occipitotemporal cortices and dorsally, V3a and V4v.  The occipital ROI also covers the middle of the ventral temporal lobe. Dorsally, it extended to the intraparietal sulcus and superior parietal lobule.”

      The motivation for assessing correlations with motor and frontal regions was briefly discussed in the introduction. It would be helpful to reiterate this motivation when first introducing the analyses in the results.

      Response #15: Thank you for the thoughtful suggestion. Upon reflection, we chose to substantially revise the Introduction to more clearly and comprehensively explain the rationale for examining the couplings with motor and frontal regions, rather than reiterating it in the Results section. We believe this revised framing provides a stronger foundation for the analyses that follow, while avoiding redundancy across sections. We hope this addresses the reviewer’s concern.

      Reviewer #2 (Recommendations for the authors):

      Congratulations on a well-written paper and an interesting set of results.

      Reviewer #3 (Recommendations for the authors):

      Abstract:

      Mentioning "sighted infants" does not seem adequate.

      Response #16: In our dataset, newborns (average age at scan = 2.79 weeks) have very limited and immature vision. We agree with the reviewer that long-term visual outcomes cannot be guaranteed without follow-up data. The term "sighted infants" was used operationally to distinguish this cohort from congenitally blind populations.

      In sentences after "Specifically...", it was not clear whether the authors referred to V1 connectivity.

      Response #17: We thank the reviewer for this comment. In the revised abstract, we have removed the original "Specifically..." phrasing and clarified the results.

      Introduction

      Talking about the "instructive effects" of vision might be confusing or misleading. Visual experiences like exposure to oral language are part of the normal/spontaneous environment that allows the infant behavioral acquisitions (contrarily with learnings that occur later during development with instruction like for reading).

      Response #18: We appreciate the reviewer’s concern and would like to clarify that the term “instructive effect” is used here as it is in neurodevelopmental studies (Crair, 1999; Sur et al., 1999). In this context, “instructive” refers to activity-dependent mechanisms where patterns of neural activity actively guide the organization of synaptic connectivity, emphasizing that spontaneous or sensory-driven activity (e.g., retinal waves, visual experience) can directly shape circuit refinement, as seen in ocular dominance column formation. In the context of our study, we emphasize that vision plays an instructive role in setting up the balance of connectivity between occipital cortex and non-visual networks.

      For references on the development of connectivity, I would advise citing MRI studies but also studies based on histological approaches (see for example the detailed review by Kostovic et al, NeuroImage 2019).

      Response #19: We thank the reviewer for this suggestion. We have incorporated a discussion on the long-range anatomical connections that emerge as early as infancy, referencing studies that employed diffusion MR imaging and histological methods, as detailed below.

      “Many long-range anatomical connections between brain regions are already established in infants, even before birth, although they are not yet mature (Huang et al., 2009; Kostović et al., 2019, 2021; Takahashi et al., 2012; Vasung, 2017).” (Page 12, line 303 in the manuscript)

      Results

      P7 l170: It might be helpful to be precise that this is "compared with inter-hemispheric connectivity".

      Response #20: We thank the reviewer for this suggestion. To align with our established terminology, we have revised the statement to explicitly contrast within-hemisphere connectivity with between-hemisphere connectivity. The modified text now reads (page 7, line 183 in the manuscript):

      “Compared to sighted adults, blind adults exhibited a stronger dominance of within-hemisphere connectivity over between-hemisphere connectivity. That is, in people born blind, left visual networks are more strongly connected to left PFC, whereas right visual networks are more strongly connected to right PFC.”

      L176-181: It was not clear to me what was the difference between "across" and "between hemisphere connectivity". Would it be informative to test the difference between blind and sighted adults?

      Response #21: We clarify that there is no distinction between the terms “across” and “between hemisphere connectivity”—they refer to the same concept. To ensure consistency, we have revised the text to exclusively use “between hemisphere connectivity” throughout the manuscript. Regarding the comparison between blind and sighted adults, we conducted statistical comparisons between these groups in our analysis, and the results have been incorporated into the revised version (Page 7, line 187 in the manuscript).

      Adding statistics on Figure 3, but also on Figures 1 and 2 might help the reading.

      Response #22: We have added the statistics to Figures 1-4.

      Adding the third comparison in Figure 4 would be possible in my view.

      Response #23: We explored integrating the response-conflict region into Figure 4, but this would require a 3x3 bar chart with pairwise statistical significance markers, which introduced excessive visual complexity that hindered readers’ ability to grasp our intended message. For clarity, we retained the original Figure 4 and provide the complete three-region analysis (including all statistical comparisons) in Supplementary Figure S8.

      Methods

      The authors might have to specify ages at birth, and ages at scan (median + range?).

      Response #24: We have added that information in the Methods section as follows:

      “The average age from birth at scan = 2.79 weeks (SD = 3.77, median = 1.57, range = 0 – 19.71); average gestational age at scan = 41.23 weeks (SD = 1.77, median = 41.29, range = 37 – 45.14); average gestational age at birth = 38.43 weeks (SD = 3.73, median = 39.71, range = 23 – 42.71).” (Page 14, line 379 in the manuscript)

      It might be relevant to comment on the range of available fMRI volumes, and the fact that connectivity measures might then be less robust in infants.

      Response #25: We report the range of fMRI volumes in the Methods section (Page 16, Line 449). Adult participants (blind and sighted) underwent 1–4 scanning sessions, each containing 240 volumes (mean scan duration: 710.4 seconds per participant). For infants, all subjects had 2300 fMRI volumes, and we retained a subset of 1600 continuous volumes per subject with the minimum number of motion outliers. While infant connectivity measures may inherently exhibit lower robustness due to developmental and motion-related factors, our infant cohort’s large sample size (n=475) and stringent motion censoring criteria enhance the reliability of group-level inferences. We have integrated this clarification into the Methods section (Page 16, Line 444) as follows:

      "While infant connectivity estimates may be less robust at the individual level compared to adults due to shorter scan durations and higher motion, our cohort’s large sample size (n=475) and rigorous motion censoring mitigate these limitations for group-level analyses. "

      The mention of dHCP 2nd release should be removed from the paragraph on data availability.

      Response #26: We have removed it.

    1. Author response:

      Response to Comments from reviewer #1

      Many thanks for appreciating that SZN-043 can promote hepatocyte proliferation via the Wnt-signaling pathway.

      (1) The reviewer is concerned with using only CYP1A2 expression as an endpoint to make a conclusion about the effect of SZN-043 on Wnt activity in human ALD samples. The reviewer raises a good point, as the more commonly used Wnt target gene, AXIN2, is not consistently changed in both cohorts. We were at first also surprised by this finding. However, upon closer analysis we found that Wnt target genes expressed mostly in hepatocytes and ductal cells, such as CYP1A2 (Figure 2), CYP2E1, OAT, LGR5, GLUL (Table 1) and ZNRF3, were all down-regulated in ALD samples. Other Wnt target genes expressed in epithelial and mesenchymal liver cell populations, such as AXIN2, CCND1 and NOTUM, are indeed not consistently and significantly changed. Given that SZN-043 is not active on mesenchymal cells, this discrepancy could be best explained by the large increase in mesenchymal cells in ALD tissue samples, thereby confounding the results. We have now clarified this in the discussion. Another method to assess Wnt activity is to measure β-catenin phosphorylation and nuclear transfer. In our hands, this method was found to be better suited for tissue culture than histological sections from in vivo studies. We have also amended the manuscript title to refer to expression of Wnt target genes, rather than Wnt activity.

      (2) We have now added a supplemental figure to show the lack of Ki-67+ human hepatocytes in the cirrhotic tissue samples to confirm the absence of hepatocyte proliferation (Figure S1).

      (3) The differences in amino acid sequence between SZN-043 and its precursor, αASGR1-RSPO2-RAIgG, can be found in the Materials and Methods section. These changes in amino acid sequence improved the biophysical properties of the final clinical candidate, such as oxidation and nonspecific binding. The biochemical analysis of those differences exceeds the scope of the current manuscript. We present here the pharmacokinetic properties of SZN-043 only, as this was the only molecule advanced to clinical trial and used in the studies presented here.

      (4) The reviewer suggests assessing the effect of SZN-043 in Ctnnb1-KO mice to confirm that SZN-043 acts via a canonical Wnt pathway. Indeed, there have been several reports on the ability of R-spondin to act on other pathways besides the Wnt signaling pathway (for a recent review, see Niehrs et al, 2024, Bioessays). However, while this is an interesting suggestion, this line of investigation belongs to MOA studies and exceeds the scope of the current manuscript. An additional manuscript presenting MOA studies for SZN-043 was recently submitted elsewhere. Still, we have added this possibility in the discussion section.

      (5) The reviewer is asking how SZN-043 is affecting liver functions in general. Indeed, we have observed a consistent reduction in the international normalized ratio (INR) of prothrombin time using the thioacetamide (TAA)-induced fibrosis model and previously published those findings (Zhang, 2020). In our hands, TAA is the only liver injury model that significantly increases INR. This increase is modest compared to that observed in clinical patients. Therefore, we do not report INR findings for other models. We have not seen any effects of SZN-043 on hepatocyte differentiation markers such as HNF4A (data not shown) and the hepatocyte-specific ASGR1/2, as shown in Figure 5. Rather, we focused on proliferation as the main potentially beneficial endpoint, to restore the parenchymal mass in injured livers. Finally, consistent with what was reported in the literature, we have observed a transient and reciprocal effect on albumin and alpha-fetoprotein expression during the proliferative phase of liver regeneration. These results are detailed in an additional manuscript presenting MOA studies for SZN-043, which was recently submitted elsewhere.

      (6) We have used females only in the ethanol-induced injury models because there are numerous reports in the literature stating that males are not as susceptible to those injuries.  

      (7) The reviewer questions the relevance of the ethanol-induced injury model used to evaluate SZN-043 efficacy. Indeed, none of the disease models developed to date reproduces the severity and complexity of alcohol-associated liver diseases, although some, such as the ethanol-supplemented Lieber DeCarli diet, are more commonly used than others, which is why this model was selected.

      (8) The reviewer questions the relevance of the fibrosis model used to evaluate SZN-043 efficacy. Indeed, none of the fibrosis models developed to date reproduce the severity and complexity of cirrhosis in human livers. While combining ethanol with CCl4 would lead to more severe fibrotic livers, CCl4 itself is not involved in ALD in humans. Both models are likely to result in similar pericentral fibrosis with central-to-central bridging. In this study, we were mostly interested in addressing the effects of SZN-043 in a tissue affected by fibrotic scars.  

      (9) The sex of CCl4-treated mice is male. We added this information in the methods section.

      (10) A summary of histology and fibrosis assessment data for alcohol-fed mice was added in supplemental Table S3. In our hands, the use of aging mice did not induce the presence of fibrosis, in contrast to published results.  

      (11) The rationale for using 13.5-month-old mice in the alcohol studies and scid mice in the CCl4 studies has been clarified in the results and discussion sections. 

      a. Briefly, aging mice were reported to be more susceptible to ethanol-induced injury than young mice and to include induction of fibrosis. However, we were unable to reproduce the presence of fibrosis reported in the literature.  

      b. Scid mice were used in the CCl4 studies to test whether a stronger response could be observed in the absence of a potential anti-drug antibodies response. While a modest reduction in fibrosis was observed in both B6 and scid mice following the SZN-043 treatment, the effect size did not seem affected by the mouse strain. 

      Response to Comments from reviewer #2

      Many thanks for appreciating the use of multiple disease models to identify SZN-043 as a potential novel drug for liver regeneration.

      (1) The importance of restoring liver regeneration capacity to reduce the need for liver transplantation has been emphasized in the introduction.

      (2) There is continuous damage to the mouse hepatocytes in the FRG mice, due to the Fah mutation. They undergo repair mechanisms favoring the proliferation of human hepatocytes during the production period. Injury models that affect the human hepatocyte population have been developed in these mice. However, the primary goal of this study was to confirm that SZN-043 was efficacious in inducing human hepatocyte proliferation, a feature difficult to reproduce in primary hepatocyte cultures. Given the artefactual nature of the chimeric liver in FRG mice and the high cost of these mice, further studies were not judged to be necessary.

      (3) Corrected

      (4) A figure including DAPI staining has now been included in supplemental Figure S2.

      (5) We have clarified that the 8-week alcohol feeding used in our study design is a modification of the NIAAA model. While some ASGR1 has been reported on the surface of macrophages, additional data from MOA studies strongly suggest that the effect of SZN-043 is mediated via a hepatocyte-specific mechanism (submitted manuscript).

      (6) The reviewer inquired about the potential role of macrophages in promoting an anti-inflammatory state in response to SZN-043. While a direct effect is unlikely, an indirect effect on macrophages in response to SZN-043 is plausible. Wnt activation is known to induce the secretion of hepatokines, such as LECT2, which in turn can influence macrophage activity. This possibility is discussed in the discussion section.

      (7) The potential off-target effects of SZN-043, such as stellate cell activation, are discussed in the discussion section.

      (8) The discussion of the limitations of current models has been included in the discussion section of the manuscript.

      (9) We have now included a discussion of prior RSPO-based therapies, such as OMP-131R10. We explain why the hepatocyte-targeting of RSPO activity minimizes undesired effects.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The work from this paper successfully mapped transcriptional landscape and identified EA-responsive cell types (endothelial, microglia). Data suggest EA modulates BBB via immune pathways and cell communication. However, claims of "BBB opening" are not directly proven (no permeability data).

      (1) No in vivo/in vitro assays confirm BBB permeability changes (e.g., Evans blue leakage, TEER).

      (2) Only male rats were used, ignoring sex-specific BBB differences.

      (3) Pericytes and neurons, critical for the BBB, were not captured, likely due to dissociation artifacts.

      (4) Protein-level validation (Western blot, IHC) absent for key genes (e.g., LY6E, HSP90).

      (5) Fixed stimulation protocol (2/100 Hz, 40 min); no dose-response or temporal analysis.

      We sincerely apologize for the oversight regarding the description of changes in blood-brain barrier permeability. In fact, our team conducted a series of preliminary studies that verified this aspect, and we have provided a more detailed description in the introduction section, in lines 60-71 of the manuscript.

      We are very grateful to the reviewers for pointing out the important and meaningful issue of "gender-specific BBB differences." We will make this a focal point in our future research.

      As for pericytes and neurons, we acknowledge their importance in the function of the blood-brain barrier. However, neurons are absent because our sample processing method involves dissociation. During the dissociation procedure, neuronal axons, which are relatively long, are filtered out during the frequent cell suspension steps and cannot enter the downstream microfluidic system for analysis, so they are not present in our data. Since this experiment is primarily focused on non-neuronal cells, we did not choose to use nucleus extraction for sample processing. As for pericytes, we believe they were not captured because their proportion in our samples is extremely low. Further research may require single-nucleus transcriptomics or the separate isolation of these two cell types for study. Of course, in our current mechanistic studies, we are also fully considering the important roles these two cell types play in BBB function.

      In addition, to validate the results at the protein level, we have recently conducted some experiments. However, as several proteins are currently at a critical stage of further experimental validation, it is not appropriate to present them in the manuscript at this time. Instead, we have uploaded the relevant data as an appendix for your review. This includes a figure of several protein markers we examined, as well as a table of the antibodies used.

      This section is also further elaborated in the introduction and its references.

      Reviewer #2 (Public review):

      Summary:

      This study uses single-cell RNA sequencing to explore how electroacupuncture (EA) stimulation alters the brain's cellular and molecular landscape after blood-brain barrier (BBB) opening. The authors aim to identify changes in gene expression and signaling pathways across brain cell types in response to EA stimulation using single-cell RNA sequencing. This direction holds promise for understanding the consequences of noninvasive methods of BBB opening for therapeutic drug delivery across the BBB.

      (1) The work falls short in its current form. The experimental design lacks a clear justification, and readers are not provided with sufficient background information on the extent, timing, or regional specificity of BBB opening in this EA model. These details, established in prior work, are critical to understanding the rationale behind the current transcriptomic analyses.

      (2) Further, the results are often presented with minimal context or interpretation. There is no model of intercellular or molecular coordination to explain the BBB-opening process, despite the stated goal of identifying such mechanisms. The statement that EA induces a "unique frontal cortex-specific transcriptome signature" is not supported, as no data from other brain regions are presented. Biological interpretation is at times unclear or inaccurate - for instance, attributing astrocyte migration effects to endothelial cell clusters or suggesting microglial tight junction changes without connecting them meaningfully to endothelial function.

      (3) The study does include analyses of receptor-ligand signaling and cell-cell communication, which could be among its most biologically rich outputs. However, these are relegated to supplementary material and not shown in the leading figures. This choice limits the utility of the manuscript as a hypothesis-generating resource.

      (4) Overall, while the dataset may be of interest to BBB researchers and those developing technologies for drug delivery across the BBB, the manuscript in its current form does not yet fulfill its interpretive goals. A more integrated and biologically grounded analysis would be beneficial.

      This section is also further elaborated in the introduction and its references.

      Our current study is actually based on previous findings that electroacupuncture can open the BBB, with a more pronounced effect observed in the frontal lobe (this aspect should be further described in the research background). Building on this foundation, our aim is to delineate the potential biological mechanisms involved. Therefore, we selected frontal lobe tissue as our primary choice for sequencing and have not yet investigated differences across other brain regions, although this may become a focus of future research. Additionally, we recognize that the mechanism underlying BBB opening is complex, and at present, we cannot determine whether it is driven by a single direct factor or by coordinated actions between cells or molecules. As such, our results are presented only briefly for now, and we will carefully consider whether to supplement our findings by incorporating insights from other studies.

      Considering the overall data layout and the length of the article, we ultimately decided not to make any changes to the presentation of the article's data. The images included in the supplementary materials are also thoroughly described and referenced in the manuscript, allowing readers to selectively view any data they are interested in.

      Indeed, our current dataset and analysis tend to present objective data results. We are also conducting a series of validations that may be related to the biology of the blood-brain barrier, and we look forward to sharing and discussing any future research findings with you and everyone.

      Reviewer #1 (Recommendations for the authors):

      (1) Figures 3-7: Label treatment groups (CON vs. EA) consistently in legends.

      (2) Methods: Specify rat strain (Sprague-Dawley) in the abstract.

      (3) Clarify Limitations: Explicitly state that BBB opening is inferred, not proven.

      This section has been revised at lines 743-733, 748, 949, 754-755, and 759-760 of the manuscript.

      Revised at line 31 of the manuscript.

      Thank you for your feedback. Background information on the evidence for BBB opening has been added to the introduction.

      Reviewer #2 (Recommendations for the authors):

      (1) Abstract and Introduction

      • Include specific key findings in the abstract to improve clarity and reader engagement.

      • Expand the introduction to situate this work in the context of other BBB-opening methods (e.g., ultrasound) and the known consequences of BBB disruption.

      • Clarify the rationale for choosing electroacupuncture.

      • Include information (perhaps summarized from previous studies) about the extent, timeline, and functional assessment of BBB opening in this model to help justify the single-cell RNA-seq design.

      (2) Experimental Rationale and Context

      • Reiterate experimental design and rationale in each results section, rather than relying exclusively on the Methods section.

      • Specify the time point of tissue collection relative to the EA intervention.

      • Describe the anatomical sites of acupuncture stimulation and their physiological relevance.

      (3) Data Presentation

      • Replace the human brain cartoon in Figure 1 with an anatomically appropriate rat brain schematic.

      • Reevaluate which data are presented in the main versus supplementary figures. Highlight biologically meaningful results, such as cell-cell communication and ligand-receptor interactions, in the main figures rather than supplementary data.

      (4) Interpretation and Modeling

      • More carefully link transcriptional changes (e.g., Wnt signaling in microglia) to biologically plausible mechanisms of BBB regulation, e.g., microglial signaling to endothelial cells.

      • Clarify whether the presence of granulocytes and T cells might result from a lack of perfusion prior to brain dissection.

      • Consider proposing a model (even speculative) of how EA leads to BBB opening based on observed transcriptional changes.

      First, for the sake of brevity in the abstract, we did not present specific results in this section. Second, since BBB opening via EA is a unique strategy, our previous studies have examined the opening time window and the recovery of the BBB after EA intervention (as mentioned in the introduction). We believe its characteristics differ from those of ultrasound-induced BBB opening and BBB disruption, so we did not conduct comparative discussions, but objectively presented our research findings. In further functional validation experiments, we may consider integrating other opening strategies in our studies. Additionally, the choice of electroacupuncture was based on our previous series of studies, which have already been outlined in the research background. Finally, we did indeed determine the experimental design of this study based on prior research, as described in the background section of the introduction.

      We decided not to make changes to this section in the manuscript after careful consideration. The setup of electroacupuncture intervention and controls has been thoroughly discussed in our previous studies (as referenced in the introduction), so we have not repeated it in this manuscript. Overall, building on all our previous findings, this study focuses primarily on the potential mechanisms of EA intervention. The anatomical sites of acupuncture stimulation and their physiological relevance are another key area of our research, and we are currently conducting a series of related studies. We look forward to sharing these findings with you in the future.

      We have already changed the human brain diagram in Figure 1 to a rat brain diagram, and have replaced Figure 1 in the files with the revised version. However, considering the overall data layout and the length of the article, we ultimately decided not to make changes to the data presentation in the manuscript. The images in the supplementary materials are also thoroughly described and referenced in the manuscript, allowing readers to selectively view the data they are interested in.

      This section has provided us with excellent suggestions for further exploration, although no changes have been made to the manuscript at this time. In the future, we may conduct more detailed transcriptomic studies focusing on sex differences and different brain regions, which will allow for a more comprehensive analysis of the biological mechanisms involved in BBB regulation.

    1. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      The authors analyzed the expression of ATAD2 protein in post-meiotic stages and characterized the localization of various testis-specific proteins in the testis of the Atad2 knockout (KO). By cytological analysis as well as ATAC sequencing, the study showed increased levels of the HIRA histone chaperone, accumulation of histone H3.3 on post-meiotic nuclei, defective chromatin accessibility, and delayed deposition of protamines. Sperm from the Atad2 KO mice show reduced success in in vitro fertilization. The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin.

      We would like to take this opportunity to highlight that the present study builds on our previously published work, which examined the function of ATAD2 in both yeast S. pombe and mouse embryonic stem (ES) cells (Wang et al., 2021). In yeast, using genetic analysis we showed that inactivation of HIRA rescues defective cell growth caused by the absence of ATAD2. This rescue could also be achieved by reducing histone dosage, indicating that the toxicity depends on histone over-dosage, and that HIRA toxicity, in the absence of ATAD2, is linked to this imbalance.

      Furthermore, HIRA ChIP-seq performed in mouse ES cells revealed increased nucleosome-bound HIRA, particularly around transcription start sites (TSS) of active genes, along with the appearance of HIRA-bound nucleosomes within normally nucleosome-free regions (NFRs). These findings pointed to ATAD2 as a major factor responsible for unloading HIRA from nucleosomes. This unloading function may also apply to other histone chaperones, such as FACT (see Wang et al., 2021, Fig. 4C).

      In the present study, our investigations converge on the same ATAD2 function in the context of a physiologically integrated mammalian system: spermatogenesis. Indeed, in the absence of ATAD2, we observed H3.3 accumulation and enhanced H3.3-mediated gene expression. Consistent with this functional model of ATAD2 (unloading chaperones from histone- and non-histone-bound chromatin), we also observed defects in histone-to-protamine replacement.

      Together, the results presented here and in Wang et al. (2021) reveal an underappreciated regulatory layer of histone chaperone activity. Previously, histone chaperones were primarily understood as factors that load histones. Our findings demonstrate that we must also consider a previously unrecognized regulatory mechanism that controls assembled histone-bound chaperones. This key point was clearly captured and emphasized by Reviewer #2 (see below).

      Strengths: 

      The paper describes the role of ATAD2 AAA+ ATPase in the proper localization of sperm-specific chromatin proteins such as protamine, suggesting the importance of the DNA replication-independent histone exchanges with the HIRA-histone H3.3 axis. 

      Weaknesses: 

      (1) Some results lack quantification. 

      We will consider all the data and add appropriate quantifications where necessary.

      (2) The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin. 

      Please see our comments above.

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript by Liakopoulou et al. presents a comprehensive investigation into the role of ATAD2 in regulating chromatin dynamics during spermatogenesis. The authors elegantly demonstrate that ATAD2, via its control of histone chaperone HIRA turnover, ensures proper H3.3 localization, chromatin accessibility, and histone-to-protamine transition in post-meiotic male germ cells. Using a new well-characterized Atad2 KO mouse model, they show that ATAD2 deficiency disrupts HIRA dynamics, leading to aberrant H3.3 deposition, impaired transcriptional regulation, delayed protamine assembly, and defective sperm genome compaction. The study bridges ATAD2's conserved functions in embryonic stem cells and cancer to spermatogenesis, revealing a novel layer of epigenetic regulation critical for male fertility.

      Strengths: 

      The MS provides the first demonstration of ATAD2's essential role in spermatogenesis, linking its expression in haploid spermatids to histone chaperone regulation by connecting ATAD2-dependent chromatin dynamics to gene accessibility (ATAC-seq), H3.3-mediated transcription, and histone eviction. Interestingly and surprisingly, sperm chromatin defects in Atad2 KO mice impair only in vitro fertilization but not natural fertility, suggesting unknown compensatory mechanisms in vivo.

      Weaknesses:

      The MS is robust and there are no major weaknesses.

      Reviewer #3 (Public review): 

      Summary: 

      The authors generated knockout mice for Atad2, a conserved bromodomain-containing factor expressed during spermatogenesis. In Atad2 KO mice, HIRA, a chaperone for histone variant H3.3, was upregulated in round spermatids, accompanied by an apparent increase in H3.3 levels. Furthermore, the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis were partially disrupted in the absence of ATAD2, possibly due to delayed histone removal. Despite these abnormalities, Atad2 KO male mice were able to produce offspring normally. 

      Strengths: 

      The manuscript addresses the biological role of ATAD2 in spermatogenesis using a knockout mouse model, providing a valuable in vivo framework to study chromatin regulation during male germ cell development. The observed redistribution of H3.3 in round spermatids is clearly presented and suggests a previously unappreciated role of ATAD2 in histone variant dynamics. The authors also document defects in the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis, providing phenotypic insight into chromatin transitions in late spermatogenic stages. Overall, the study presents a solid foundation for further mechanistic investigation into ATAD2 function. 

      Weaknesses:

      While the manuscript reports the gross phenotype of Atad2 KO mice, the findings remain largely superficial and do not convincingly demonstrate how ATAD2 deficiency affects chromatin dynamics. Moreover, the phenotype appears too mild to elucidate the functional significance of ATAD2 during spermatogenesis. 

      We respectfully disagree with the statement that our findings are largely superficial. Based on our investigations of this factor over the years, it has become evident that ATAD2 functions as an auxiliary factor that facilitates mechanisms controlling chromatin dynamics (see, for example, Morozumi et al., 2015). These mechanisms can still occur in the absence of ATAD2, but with reduced efficiency, which explains the mild phenotype we observed.

      This function, while not essential, is nonetheless an integral part of the cell’s molecular biology and should be studied and brought to the attention of the broader biological community, just as we study essential factors. Unfortunately, the field has tended to focus primarily on core functional actors, often overlooking auxiliary factors. As a result, our decade-long investigations into the subtle yet important roles of ATAD2 have repeatedly been met with skepticism regarding its functional significance, which has in turn influenced editorial decisions.

      We chose eLife as the venue for this work specifically to avoid such editorial barriers and to emphasize that facilitators of essential functions do exist. They deserve to be investigated, and the underlying molecular regulatory mechanisms must be understood.

      (1) Figures 4-5: The analyses of differential gene expression and chromatin organization should be more comprehensive. First, Venn diagrams comparing the sets of significantly differentially expressed genes between this study and previous work should be shown for each developmental stage. Second, given the established role of H3.3 in MSCI, the effect of Atad2 knockout on sex chromosome gene expression should be analyzed. Third, integrated analysis of RNA-seq and ATAC-seq data is needed to evaluate how ATAD2 loss affects gene expression. Finally, H3.3 ChIP-seq should be performed to directly assess changes in H3.3 distribution following Atad2 knockout.  

      (1) In the revised version, we will include Venn diagrams to illustrate the overlap in significantly differentially expressed genes between this study and previous work. However, we believe that the GSEAs presented here provide stronger evidence, as they indicate the statistical significance of this overlap (p-values). In our case, we observed p < 0.01 (**) and p < 0.001 (***).

      (2) Sex chromosome gene expression was analyzed and is presented in Fig. 5C.

      (3) The effect of ATAD2 loss on gene expression is shown in Fig. 4A, B, and C as histograms, with statistical significance indicated in the middle panels.

      (4) Although mapping H3.3 incorporation across the genome in wild-type and Atad2 KO cells would have been informative, the available anti-H3.3 antibody did not work for ChIP-seq, at least in our hands. The authors of Fontaine et al., 2022, who studied H3.3 during spermatogenesis in mice, must have encountered the same problem, since they tagged the endogenous H3.3 gene to perform their ChIP experiments.

      (2) Figure 3: The altered distribution of H3.3 is compelling. This raises the possibility that histone marks associated with H3.3 may also be affected, although this has not been investigated. It would therefore be important to examine the distribution of histone modifications typically associated with H3.3. If any alterations are observed, ChIP-seq analyses should be performed to explore them further.  

      Based on our understanding of ATAD2’s function—specifically its role in releasing chromatin-bound HIRA—in the absence of ATAD2 the residence time of both HIRA and H3.3 on chromatin increases. This results in the detection of H3.3 not only on sex chromosomes but across the genome. Our data provide clear evidence of this phenomenon. The reviewer is correct in suggesting that the accumulated H3.3 would carry H3.3-associated histone PTMs; however, we are unsure what additional insights could be gained by further demonstrating this point.

      (3) Figure 7: While the authors suggest that pre-PRM2 processing is impaired in Atad2 KO, no direct evidence is provided. It is essential to conduct acid-urea polyacrylamide gel electrophoresis (AU-PAGE) followed by western blotting, or a comparable experiment, to substantiate this claim. 

      Figure 7 does not suggest that pre-PRM2 processing is affected in Atad2 KO; rather, this figure—particularly Fig. 7B—specifically demonstrates that pre-PRM2 processing is impaired, as shown using an antibody that recognizes the processed portion of pre-PRM2. ELISA was used to provide a more quantitative assessment; however, in the revised manuscript we will also include a western blot image.

      (4) HIRA and ATAD2: Does the upregulation of HIRA fully account for the phenotypes observed in Atad2 KO? If so, would overexpression of HIRA alone be sufficient to phenocopy the Atad2 KO phenotype? Alternatively, would partial reduction of HIRA (e.g., through heterozygous deletion) in the Atad2 KO background be sufficient to rescue the phenotype? 

      These are interesting experiments that require the creation of appropriate mouse models, which are not currently available.

      (5) The mechanism by which ATAD2 regulates HIRA turnover on chromatin and the deposition of H3.3 remains unclear from the manuscript and warrants further investigation.

      The Reviewer is absolutely correct. In addition to the points addressed in response to Reviewer #1’s general comments (see above), it would indeed have been very interesting to test the segregase activity of ATAD2 (likely driven by its AAA ATPase activity) through in vitro experiments using the Xenopus egg extract system described by Tagami et al., 2004. This system can be applied both in the presence and absence (via immunodepletion) of ATAD2 and would also allow the use of ATAD2 mutants, particularly those with inactive AAA ATPase or bromodomains. However, such experiments go well beyond the scope of this study, which focuses on the role of ATAD2 in chromatin dynamics during spermatogenesis.

      References

      Wang T, Perazza D, Boussouar F, Cattaneo M, Bougdour A, Chuffart F, Barral S, Vargas A, Liakopoulou A, Puthier D, Bargier L, Morozumi Y, Jamshidikia M, Garcia-Saez I, Petosa C, Rousseaux S, Verdel A, Khochbin S. ATAD2 controls chromatin-bound HIRA turnover. Life Sci Alliance. 2021 Sep 27;4(12):e202101151. doi: 10.26508/lsa.202101151. PMID: 34580178; PMCID: PMC8500222.

      Morozumi Y, Boussouar F, Tan M, Chaikuad A, Jamshidikia M, Colak G, He H, Nie L, Petosa C, de Dieuleveult M, Curtet S, Vitte AL, Rabatel C, Debernardi A, Cosset FL, Verhoeyen E, Emadali A, Schweifer N, Gianni D, Gut M, Guardiola P, Rousseaux S, Gérard M, Knapp S, Zhao Y, Khochbin S. Atad2 is a generalist facilitator of chromatin dynamics in embryonic stem cells. J Mol Cell Biol. 2016 Aug;8(4):349-62. doi: 10.1093/jmcb/mjv060. Epub 2015 Oct 12. PMID: 26459632; PMCID: PMC4991664.

      Fontaine E, Papin C, Martinez G, Le Gras S, Nahed RA, Héry P, Buchou T, Ouararhni K, Favier B, Gautier T, Sabir JSM, Gerard M, Bednar J, Arnoult C, Dimitrov S, Hamiche A. Dual role of histone variant H3.3B in spermatogenesis: positive regulation of piRNA transcription and implication in X-chromosome inactivation. Nucleic Acids Res. 2022 Jul 22;50(13):7350-7366. doi: 10.1093/nar/gkac541. PMID: 35766398; PMCID: PMC9303386.

      Tagami H, Ray-Gallet D, Almouzni G, Nakatani Y. Histone H3.1 and H3.3 complexes mediate nucleosome assembly pathways dependent or independent of DNA synthesis. Cell. 2004 Jan 9;116(1):51-61. doi: 10.1016/s0092-8674(03)01064-x. PMID: 14718166.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Desveaux et al. describe human mAbs targeting proteins from the Pseudomonas aeruginosa T3SS, discovered by single-cell B cell sorting from cystic fibrosis patients. The mAbs were directed at the proteins PscF and PcrV. They particularly focused on two mAbs binding the T3SS with the potential of blocking activity. The biochemical analysis was supplemented by crystal structures of the P3D6 Fab complex. They also compared the blocking activity with mAbs that were described in previous studies, using an assay that evaluated the toxin injection. They conducted mechanistic structure analysis and found that these mAbs might act through different mechanisms by preventing PcrV oligomerization and disrupting PcrV's scaffolding function.

      Strengths:

      The antibiotic resistance crisis requires the development of new solutions to treat infections caused by MDR bacteria. The development of antibacterial mAbs holds great potential. In that context, this report is important as it paves the way for the development of additional mAbs targeting various pathogens that harbor the T3SS. In this report, the authors present a comparative study of their discovered mAbs vs. a commercial mAb currently in clinical testing resulting in valuable data with applicative implications. The authors investigated the mechanism of action of the mAbs using advanced methods and assays for the characterization of antibody and antigen interaction, underlining the effort to determine the discovered mAbs suitability for downstream application.

      Weaknesses:

      Although the information presented in this manuscript is important, previous reports regarding other T3SS structures complexed with antibodies, reduce the novelty of this report. Nevertheless, we provide several comments that may help to improve the report. The structural analysis of the presented mAbs is incomplete and unfortunately, the authors did not address any developability assessment. With such vital information missing, it is unclear if the proposed antibodies are suited for diagnostic or therapeutic usage. This vastly reduces the importance of the possibly great potential of the authors' findings. Moreover, the structural information does not include the interacting regions on the mAb which may impede the optimization of the mAb if it is required to improve its affinity.

      As described in the manuscript (Fig. 6), our mAbs are markedly less effective in every in vitro T3SS inhibition assay than the mAbs recently described by Simonis et al. They are therefore very unlikely to outperform these mAbs in in vivo animal models of P. aeruginosa infection. Considering the high cost of animal experiments and ethical concerns, and in accordance with the Reduction principle of the 3Rs guidelines, we chose not to pursue in vivo experiments. Instead, we focused on leveraging the newly isolated mAbs to investigate the mechanisms of action and structural features of anti-PcrV mAbs.

      Following the reviewer's suggestion, we have now added mAb interaction features into the structural data presented in the manuscript. However, based on the efficiency data, the structural analysis and the mechanistic insights presented, we do not consider further therapeutic use and optimization of our mAbs to be warranted.

      Reviewer #2 (Public review):

      Summary:

      Desveaux et al. performed ELISA and translocation assays to identify among 34 cystic fibrosis patients which ones produced antibodies against P. aeruginosa type three secretion system (T3SS). The authors were especially interested in antibodies against PcrV and PscF, two key components of the T3SS. The authors leveraged their binding assays and flow cytometry to isolate individual B cells from the two most promising sera, and then obtained monoclonal antibodies for the proteins of interest. Among the tested monoclonal antibodies, P3D6 and P5B3 emerged as the best candidates due to their inhibitory effect on the ExoS-Bla translocation marker (with 24% and 94% inhibition, respectively). The authors then showed that P5B3 binds to the five most common variants of PcrV, while P3D6 seems to recognize only one variant. Furthermore, the authors showed that P3D6 inhibits translocon formation, measured as cell death of J774 macrophages. To get insights into the P3D6-PcrV interaction, the authors defined the crystal structure of the P3D6-PcrV complex. Finally, the authors compared their new antibodies with two previous ones (i.e., MEDI3902 and 30-B8).

      Strengths:

      (1) The article is well written.

      (2) The authors used complementary assays to evaluate the protective effect of candidate monoclonal antibodies.

      (3) The authors offered crystal structure with insights into the P3D6 antibody-T3SS interaction (e.g., interactions with monomer vs pentamers).

      (4) The authors put their results in context by comparing their antibodies with respect to previous ones.

      Weaknesses:

      The authors used a similar workflow to the one previously reported in Simonis et al. 2023 (antibodies from cystic fibrosis patients that included B cell isolation, antibody-PcrV interaction modeling, etc.) but the authors do not clearly explain how their work and findings differentiate from previous work.   

      We employed a similar mAb isolation pipeline to that used by Simonis et al., beginning with the screening of a cohort of cystic fibrosis patients chronically infected with P. aeruginosa. As in Simonis et al., we isolated specific B cells using a recombinant PcrV bait, followed by single-cell PCR amplification of immunoglobulin genes. The main differences in methodology between the two studies are as follows: i) the use of individuals from different cohorts, and therefore having different Ab repertoires; ii) the nature of the screening assays, although in both cases the screening was focused on the inhibition of T3SS function; and iii) the PcrV labeling strategy, with Simonis et al. employing direct labeling, whereas we used a biotinylated tag combined with streptavidin.

      The number of specific mAbs obtained and produced was higher in Simonis et al. (47 versus 9 in our study). They sorted B cells from three individuals compared to two in our work and possibly started with a larger amount of PBMCs per donor, which may account for the higher number of specific B cells and mAbs isolated. Considering that the strategies were overall very similar, the greater number of mAbs isolated in Simonis et al. likely explains, to a large extent, why they identified mAbs targeting different epitopes compared to ours, including highly potent mAbs that we did not recover. 

      Our modeling study, unlike that of Simonis et al., which relied on an AlphaFold prediction of the multimeric structure of P. aeruginosa PcrV, was based on the experimentally determined structure of the homologous Salmonella SipD pentamer, as described in the manuscript. Furthermore, we compared our mAb P3D6 not only with 30-B8 from Simonis et al., but also with MEDI3902. Finally, in contrast to the approach of Simonis et al., we used functional assays to investigate the differences in mechanisms of action among these mAbs, which target three distinct epitopes.

      (2) Although new antibodies against P. aeruginosa T3SS expand the potential space of antibody-based therapies, it is unclear if P3D6 or P5B3 are better than previous antibodies. In fact, in the discussion section authors suggested that the 30-B8 antibody seems to be the most effective of the tested antibodies.

      As explained above and shown in the Results section (Figure 6), the 30-B8 mAb is markedly more effective at inhibiting T3SS activity in both in vitro assays used.

      (3) The authors should explain better which of the two antibodies they have discovered would be better suited for follow-up studies. It is confusing that the authors focused the last sections of the manuscript on P3D6 despite P3D6 having a much lower ExoS-Bla inhibition effect than P5B3 and the limitation in the PcrV variant that P3D6 seems to recognize. A better description of this comparison and the criteria to select among candidate antibodies would help readers identify the main messages of the paper. 

      The P3D6 mAb shows stronger inhibitory activity than P5B3 in the two assays used, as shown in Supplementary Figure 1. An error in the table in Figure 2B was corrected and this table now reflects the results presented in Supplementary Figure 1. 

      The final sections of the manuscript focus on P3D6, which is more potent than P5B3, and for which we successfully determined a co-crystal structure with PcrV*. All parallel attempts to obtain a structure of P5B3 in complex with PcrV* failed. The P3D6-PcrV* structure was used to analyze epitope recognition and mechanisms of action in comparison to previously described mAbs. As previously mentioned, we do not consider further studies aimed at therapeutic development and optimization of our mAbs to be justified given the current data. Therefore, we believe that the main message of the paper is adequately captured in the title.

      (4) This work could strongly benefit from two additional experiments:

      (a) In vivo experiments: experiments in animal models could offer a more comprehensive picture of the potential of the identified monoclonal antibodies. Additionally, this could help to answer a naïve question: why do the patients that have the antibodies still have chronic P. aeruginosa infections? 

      As explained above, the mAbs we isolated are significantly less potent than those described by Simonis et al., and are therefore unlikely to outperform the best anti-PcrV candidates in vivo. In light of the data, and considering ethical concerns related to animal use in research and budgetary constraints, we decided not to proceed with in vivo experiments.

      There are a number of reasons that may explain why patients with anti-PcrV Abs blocking the T3SS can still be chronically infected with Pa. First, these Abs may be at limiting concentration, particularly in sites where Pa replicates, and thus unable to clear infection. In addition, it has been described that the T3SS is downregulated in chronic infection in cystic fibrosis patients. This suggests that a therapeutic intervention with T3SS-inhibiting Abs may be more efficient if done early in cystic fibrosis patients, to prevent colonization when Pa possesses an active T3SS. Finally, T3SS is not the only virulence mechanism employed by P. aeruginosa during infection. Indeed, multiple protein adhesins and polysaccharides are important factors facilitating the formation of bacterial biofilms that are crucial for establishing chronic persistent infection. In this regard, a combination of Abs targeting different factors on the P. aeruginosa surface may be needed to treat chronic infections.

      (b) Multi-antibody T3SS assays (i.e., a combination of two or more monoclonal antibodies evaluated with the same assays used for characterization of single ones). This could explore the synergistic effects of combinatorial therapies that could address some of the limitations of individual antibodies. 

      Given the high potency of the Simonis mAbs and the mechanisms of action highlighted by our analysis, it is unlikely that our mAbs would synergize with those described by Simonis. Additionally, since our two mAbs cross-compete for binding, synergy between them is also improbable.

      Reviewer #1 (Recommendations for the authors):

      Line 166: How was the serum-IgG purified? (e.g., protein A, protein G). 

      Protein A purification was used, as now mentioned in the manuscript. Purified Igs were thus predominantly IgG1, IgG2 and IgG4, as indicated.

      (2) Line 196: When mentioning affinities, it is preferable to present in molar units. 

      To facilitate comparisons, Ab concentrations were presented in µg/mL as in Simonis et al.

      (3) Line 206: The author states that P3D6 displays significantly reduced ExoS-Bla injection (Figure 2B), but according to the presented table, ExoS-Bla inhibition was higher for P5B3. Additionally, when using "significantly", what was the statistical test that was used to evaluate the significance? Please clarify.

      We thank the reviewer for pointing out this inconsistency. Indeed, the names of P3D6 and P5B3 were exchanged when building the table related to Figure 2B. The corrected version of this figure is now presented in the new version of the manuscript. An ANOVA was performed to evaluate the significance of the observed difference (adjusted p-values < 0.001) and it is now mentioned in the figure caption.  

      (4) Line 215: "P3B3" typo.

      This was corrected.

      (5) Figure 3B: Could the author explain the higher level of ExoS-Bla injection when using VRC01 antibody compared to no antibody?

      A slightly higher level of the median is observed in the case of three variants out of five. However, this difference is not statistically significant (p-value > 0.05).

      (6) Supplement Figure 1: the presented grey area is not clear (is it the 95%CI?) and how was the IC50 calculated? With what model was it projected? Are the values for IC50 beyond the 100µg/mL mark a projection? It seems that projecting such greater values (such as the IC50 of over 400µg/mL for variant 5) is prone to high error probability.

      The grey area represents the 95% confidence interval (95% CI), and this is now mentioned in the figure caption. The IC50 and 95% CI were both inferred with the dose-response R package drc, based on a three-parameter log-logistic model, and this is now explained in the Materials & Methods section. The p-values for IC50 beyond 100 µg/mL were below 0.05, but we agree that such extrapolation should be treated with caution (see below our response to comment number 7).
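
      As a minimal sketch of what a three-parameter log-logistic curve looks like, the snippet below assumes the common parameterization with the lower limit fixed at 0, a slope b, an upper limit d, and a half-maximal concentration e (the IC50). The function name and all parameter values are hypothetical and chosen only for illustration; they are not the fitted values from the manuscript, and the snippet does not reproduce the drc fitting procedure. It shows why an IC50 lying above the highest tested concentration (100 µg/mL) is an extrapolation: at the top of the tested range the curve has dropped only a little below its maximum.

```python
import math


def ll3(conc, b, d, e):
    """Three-parameter log-logistic curve with the lower limit fixed at 0.

    conc: antibody concentration (> 0), b: slope, d: upper limit,
    e: concentration giving a half-maximal response (the IC50).
    """
    if conc <= 0:
        raise ValueError("concentration must be positive")
    return d / (1.0 + math.exp(b * (math.log(conc) - math.log(e))))


# Hypothetical parameters, chosen only for illustration.
slope, upper, ic50 = 1.2, 100.0, 400.0  # ic50 in ug/mL, above the tested range

response_at_ic50 = ll3(ic50, slope, upper, ic50)         # half of the upper limit
response_at_max_tested = ll3(100.0, slope, upper, ic50)  # highest tested dose

print(f"response at IC50: {response_at_ic50:.1f}% of maximum")
print(f"response at 100 ug/mL: {response_at_max_tested:.1f}% of maximum")
```

      With these assumed values, most of the descent of the curve lies beyond the tested range, so the estimate of e depends heavily on the assumed curve shape.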

      (7) Line 227: The author describes that P5B3 has similar IC50 values towards variants 1-4, but the  IC50 towards variant 5 is substantially higher with 400µg/mL, albeit the only difference between variant 4 and 5 is the switch position 225 Arg -> Lys which are very similar in their properties. Please provide an explanation. 

      As explained in our response to comment number 6, we agree that comparing IC50 values that are estimated to be close to or higher than the highest experimental concentration is somewhat speculative. Indeed, we performed further statistical analysis, which showed no significant difference between the IC50 values of mAb P5B3 toward the five PcrV variants. In contrast, the difference between the IC50 values of mAbs P5B3 and P3D6 toward variant 1 is statistically significant. This is now explained in the manuscript.

      (8) Line 233: Pore assembly: It is not clear how the data was normalized. The authors mention the methods normalization against the wildtype strain in the absence of antibodies, but did not elaborate clearly if the mutant strain has the same base cytotoxicity as the wild type. It would be helpful to show the level of cytotoxicity of the wild type compared to the mutant in the absence of antibodies to understand the baseline of cytotoxicity of both strains.  

      In these experiments we did not use the wild-type strain. As explained, the only strain that allows the measurement of pore formation by translocators PopB/PopD is the one lacking all effectors. All the experiments were done with this strain, and all the measurements were normalized accordingly. 

      (9) Figure 4: The explanation is redundant as it is clearly stated in the results. It would be better for the caption to describe the figure and leave interpretation to the results section. Overall, this comment is relevant to all figure captions, as it will reduce redundancy. My suggestion is to keep the figure caption as a road map to understand what is shown in the figure. For example, the Figure 4 caption should include that the concentration is presented in logarithmic scale, what is the dashed line, what is the grey area (what interval does it represent?), what each circle represents, and what is the regression model used? 

      Figure captions have been improved as suggested. 

      (10) Line 432: The authors apparently misquoted the original article describing the chimeric form PcrV* by describing the fusion of amino acids 1-17 and 136-249. I quote the original article by Tabor et al. "[...] we generated a truncated PcrV fragment (PcrVfrag) comprising PcrV amino acids 1-17 fused to amino acids 149-236 [...]". Additionally, how does the absence of amino acid 21 in the variant affect the conclusion? 

      Our construct was inspired by the one described in Tabor et al. but was not identical. We have therefore replaced "was constructed based on a construct by Tabor et al." with "whose design was inspired by the construct described in Tabor et al."

      Amino acid 21 is only absent in the construct used for crystallization experiments; all other experiments looking at Ab activity were performed with bacteria bearing full-length PcrV. According to the structural data, the difference in P3D6 activity between variants V1 and V2 appears to be explained by the nature of the residue at position 225, as both variants have the same residue at position 21; this is now explained in more detail in the manuscript. However, while the nature of the residue at position 225 appears to explain the lack of efficacy of the Ab against the variants studied, an impact of residue 21 cannot be totally ruled out in putative variants with a Ser at 225 but different amino acids at 21.

      (11) Line 569: Missing word - ESRF stands for European Synchrotron Radiation Facility. 

      This has been corrected.

      (12) Line 268-269 (Figure 5A): The description of the alpha helices in relation to the figure is incomplete. Helices 2,3 and 5 are not indicated. 

      Indeed, since the structure is well-known and in the interest of visibility and simplicity, we only included the most relevant secondary structure features.

      (13) Line 271-272: It would be good to elaborate on the exact binding platform between LC and HC of the Fab and the residues on the PcrV side. For example, the author could apply the structure to PDBePISA (EMBL-EBI) which will provide details about the interface between the PcrV and the antibody. It is very interesting to learn what regions of the antibody are in charge of the binding, such as: is the H-CDR3 the major contributor of the binding or are other CDRs more involved? Additionally, in line 275 they state that the substitution of Ser 225 with Arg or Lys is consistent with the P3D6 insufficient binding. What contributed to this result on the antibodies side? 

      In order to address this question, we are now providing a LigPlot figure (supplementary Figure 3) in which specific interactions between PcrV* and the Fab are shown.

      (14) Line 291: It is unclear from what data the authors concluded that anti-PscF targets 3 distinct regions of PscF. 

      The data are shown in Supplementary Table 2, as mentioned in the manuscript. We have now modified the order of the anti-PcrV mAbs in the table to better illustrate the three identified epitope clusters (Supplementary Table 2). Similarly, the anti-PscF mAbs appear to group into three clusters, as P3G9 and P5E10 only compete with themselves, while mAbs P3D6 and P5B3 compete with themselves and each other.

      (15) Line 315: It is preferable to introduce results in the results section instead of the discussion. 

      While preparing the manuscript, we initially included these results as a separate paragraph in the Results section, but ultimately chose the current format to improve flow and avoid redundancy.

      (16) Supplement Figure 2: What was the regression model used to evaluate IC50, and what is presented in the graph? What is the dashed line (see comment for Figure 4 above)? 

      The regression is based on a three-parameter log-logistic model, and the light-colored areas correspond to the 95% CI. The dashed line visually represents 100% of ExoS-Bla injection. This information is now mentioned in the figure caption.

      (17) Figure 6B: It would be better to show an additional rotation of the PcrV bound by Fab 30-B8 that corresponds to the same as the one represented with Fab MEDI3902. This would clear up the differences in binding regions. Same for Fab P3D6. 

      Figure 6 already depicts two orientations. Although we agree that additional orientations could be of interest, we believe that they would add unnecessary complexity to the figure, and we would prefer to keep the figure as is, if possible.

      (18) Line 356-358: The author proposes an experiment to support the suggested mechanism of P3D6, it would follow up with a bio-chemical analysis showing the prevention of PcrV oligomerization in its presence. 

      We understand the reviewer’s comment regarding the potential use of biochemical approaches to test our hypothesis. However, this is not currently feasible, as we have been unable to achieve in vitro oligomerization of PcrV alone, possibly due to the absence of other T3SS components, such as the polymerized PscF needle.

      (19) Line 456: Missing details about how the ELISA was conducted including temperature, how the antigen was absorbed, plate type, etc. 

      Experimental details have been added.

      (20) Line 460: Missing substrate used for alkaline phosphatase. 

      The nature of the substrate was added to the methods.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform understanding of control across domains, which is a topic of great importance.

      We thank the Reviewer for their favorable appraisal and valuable suggestions, which have helped clarify and strengthen the study’s conclusion. 

      In its revised form, the manuscript addresses most of my previous concerns. The main remaining weakness pertains to the analyses aimed at addressing my suggestion of Bayesian updating as an alternative to the model proposed by the authors. My suggestion was to assume that people perform a form of function approximation to relate resource expenditure to success probability. The authors performed a version of this where people were weighing evidence for a few canonical functions (flat, step, linear), and found that this model underperformed theirs. However, this Bayesian model is quite constrained in its ability to estimate the function relating resources. A more robust test would be to assume a more flexible form of updating that is able to capture a wide range of distributions (e.g., using basis functions, Gaussian processes, or nonparametric estimators; see, e.g., work by Griffiths on human function learning). The benefit of testing this type of model is that it would make contact with a known form of inference that individuals engage in across various settings and therefore could offer a more parsimonious and generalizable account of function learning, whereby learning of resource elasticity is a special case. I defer to the authors as to whether they'd like to pursue this direction, but if not I think it's still important that they acknowledge that they are unable to rule out a more general process like this as an alternative to their model. This pertains also to inferences about individual differences, which currently hinge on their preferred model being the most parsimonious.

      We thank the Reviewer for this thoughtful suggestion. We acknowledge that more flexible function learning approaches could provide a stronger test in favor of a more general account. Our Bayesian model implemented a basis function approach in which the weights of three archetypal functions (flat, step, linear) are learned from experience. Testing models with more flexible basis functions would likely require a task with more than three levels of resource investment (1, 2, or 3 tickets). This would make an interesting direction for future work expanding on our current findings. We now incorporate this suggestion in more detail in our updated manuscript (lines 335-341):

      “Second, future models could enable generalization to levels of resource investment not previously experienced. For example, controllability and its elasticity could be jointly estimated via function approximation that considers control as a function of invested resources. Although our implementation of this model did not fit participants’ choices well (see Methods), other modeling assumptions drawn from human function learning [30] or experimental designs with continuous action spaces may offer a better test of this idea.”

      Reviewer #2 (Public review):

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output and yielded interesting results. Notably, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals important findings about how people consider components of controllability. The authors have gone to great lengths to revise the manuscript to clarify their definitions of "elastic" and "inelastic" and bolster evidence for their computational model, resulting in an overall strong manuscript that is valuable for elucidating controllability dynamics and preferences. 

      We thank the Reviewer for their constructive feedback throughout the review process, which has substantially strengthened our manuscript and clarified our theoretical framework.

      One minor weakness is that the justification for the analysis technique for the relationships between the model parameters and the psychopathology measures remains lacking given the fact that simple correlational analyses did not reveal any significant associations.

      We note that the existence of bivariate relationships is not a prerequisite for the existence of multivariate relationships. Conditioning the latter on the former, therefore, would risk missing out on important relationships existing in the data. Ultimately, correlations between pairs of variables do not offer a sensitive test for the general hypothesis that there is a relationship between two sets of variables. As an illustration, consider that elasticity bias correlated in our data (r = .17, p<.001) with the difference between SOA (sense of agency) and SDS (self-rating depression). Notably, SOA and SDS were positively correlated (r = .47, p<.001), and neither of them was correlated with elasticity bias (SOA: r=.04 p=.43, SDS: r=-.06, p=.16). It was a dimension that ran between them that mapped onto elasticity bias. This specific finding is incidental and uncorrected for multiple comparisons, hence we do not report it in the manuscript, but it illustrates the kinds of relationships that cannot be accounted for by looking at bivariate relationships alone.  
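
      The point that weak bivariate correlations can coexist with a clearer relationship along a combined dimension can be sketched with simulated data. The snippet below is a toy construction, not the study's data or analysis: two hypothetical questionnaire scales share a strong common factor and load weakly, in opposite directions, on a second "contrast" factor, while a hypothetical task parameter tracks only the contrast factor. All variable names, loadings, and the sample size are assumptions made for illustration. Small, opposite-signed correlations with each scale then add up to a much larger correlation with their difference.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

shared = [rng.gauss(0, 1) for _ in range(n)]    # factor common to both scales
contrast = [rng.gauss(0, 1) for _ in range(n)]  # dimension running between them
noise = [rng.gauss(0, 1) for _ in range(n)]

# Two hypothetical scales: strongly shared, weakly opposed on the contrast factor.
scale_a = [s + 0.1 * c for s, c in zip(shared, contrast)]
scale_b = [s - 0.1 * c for s, c in zip(shared, contrast)]
# Hypothetical task parameter that tracks only the contrast factor.
param = [c + e for c, e in zip(contrast, noise)]
difference = [a - b for a, b in zip(scale_a, scale_b)]

r_scales = pearson(scale_a, scale_b)
r_param_a = pearson(param, scale_a)
r_param_b = pearson(param, scale_b)
r_param_diff = pearson(param, difference)

print(f"scale A vs scale B:      r = {r_scales:.2f}")
print(f"parameter vs scale A:    r = {r_param_a:.2f}")
print(f"parameter vs scale B:    r = {r_param_b:.2f}")
print(f"parameter vs difference: r = {r_param_diff:.2f}")
```

      In this construction the difference score removes the shared factor, which is what otherwise dilutes each scale's correlation with the parameter.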

      Reviewer #3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome.

      In particular, the authors identify one key dimension: the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally argue that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea has the potential to change how we think about several major mental disorders in a substantial way and can additionally help us better understand how healthy people navigate challenging decision-making problems. More concisely, it is a very good idea.

      We thank the Reviewer for their thoughtful engagement with our manuscript. We appreciate their recognition of elasticity as a key dimension of control that has the potential to advance our understanding of psychopathology and healthy decision-making.

      Starting with theory, the authors do not provide a strong formal characterization of the proposed notion of elasticity. There are existing, highly general models of controllability (e.g., Huys & Dayan, 2009; Ligneul, 2021) and the elasticity idea could naturally be embedded within one of these frameworks. The authors gesture at this in the introduction; however, this formalization is not reflected in the implemented model, which is highly task-specific.

      Our formal definition of elasticity, detailed in Supplementary Note 1, naturally extends the reward-based and information-theoretic definitions of controllability by Huys & Dayan (2009) and Ligneul (2021). We now further clarify how the model implements this formalized definition (lines 156-159).

      “Conversely, in the ‘elastic controllability model’, the beta distributions represent a belief about the maximum achievable level of control (𝑎<sub>Control</sub>, 𝑏<sub>Control</sub>) coupled with two elasticity estimates that specify the degree to which successful boarding requires purchasing at least one (𝑎<sub>elastic≥1</sub>, 𝑏<sub>elastic≥1</sub>) or specifically two (𝑎<sub>elastic2</sub>, 𝑏<sub>elastic2</sub>) extra tickets. As such, these elasticity estimates quantify how resource investment affects control. The higher they are, the more controllability estimates can be made more precise by knowing how much resources the agent is willing and able to invest (Supplementary Note 1).”

      Moreover, the authors present elasticity as if it is somehow "outside of" the more general notion of controllability. However, effort and investment are just specific dimensions of action; and resources like money, strength, and skill (the "highly trained birke") are just specific dimensions of state. Accordingly, the notion of elasticity is necessarily implicitly captured by the standard model. Personally, I am compelled by the idea that effort and resource (and therefore elasticity) are particularly important dimensions, ones that people are uniquely tuned to. However, by framing elasticity as a property that is different in kind from controllability (rather than just a dimension of controllability), the authors only make it more difficult to integrate this exciting idea into generalizable models.

      We respectfully disagree that we present elasticity as outside of, or different in kind from, controllability. Throughout the manuscript, we explicitly describe elasticity as a dimension of controllability (e.g., lines 70-72, among many other examples). This is also expressed in our formal definition of elasticity (Supplementary Note 1). 

      The argument that vehicle/destination choice is not trivial because people occasionally didn't choose the instructed location is not compelling to me; if anything, the exclusion rate is unusually low for online studies. The finding that people learn more from non-random outcomes is helpful, but this could easily be cast as standard model-based learning very much like what one measures with the Daw two-step task (nothing specific to control here). Their final argument is the strongest, that to explain behavior the model must assume "a priori that increased effort could enhance control." However, more literally, the necessary assumption is that each attempt increases the probability of success (e.g., you're more likely to get a heads in two flips than in one). I suppose you can call that "elasticity inference", but I would call it basic probabilistic reasoning.

      We appreciate the Reviewer’s concerns but feel that some of the more subjective comments might not benefit from further discussion. We only note that controllability and its elasticity are features of environmental structure, so in principle any controllability-related inference is a form of model-based learning. The interesting question is whether people account in their world model for that particular feature of the environment.   

      The authors try to retreat, saying "our research question was whether people can distinguish between elastic and inelastic controllability." I struggle to reconcile this with the claim in the abstract "These findings establish the elasticity of control as a distinct cognitive construct guiding adaptive behavior". That claim is the interesting one, and the one I am evaluating the evidence in light of.

      In real-world contexts, it is often trivial that sometimes further investment enhances control and sometimes it does not. For example, students know that if they prepare more extensively for their exams they will likely be able to achieve better grades, but they also know that there is uncertainty in this regard – their grades could improve significantly, modestly, or in some cases, they might not improve at all, depending on the type of exams their study program administers and the knowledge or skills being tested. Our research question was whether in such contexts people learn from experience the degree to which controllability is elastic to invested resources and adapt their resource investment accordingly. Our findings show that they do. 

      The authors argue for CCA by appeal to the need to "account for the substantial variance that is typically shared among different forms of psychopathology". I agree. A simple correlation would indeed be fairly weak evidence. Strong evidence would show a significant correlation after *controlling for* other factors (e.g. a regression predicting elasticity bias from all subscales simultaneously). CCA effectively does the opposite, asking whether, with the help of all the parameters and all the surveys, one can find any correlation between the two sets of variables. The results are certainly suggestive, but they provide very little statistical evidence that the elasticity parameter is meaningfully related to any particular dimension of psychopathology.

      We agree with the Reviewer on the relationship between elasticity and any particular dimension of psychopathology. The CCA asks a different question, namely, whether there is a relationship between psychopathology traits and task parameters, and whether elasticity bias specifically contributes to this relationship. 

      I am very concerned to see that the authors removed the discussion of this limitation in response to my first review. I quote the original explanation here:

      - In interpreting the present findings, it needs to be noted that we designed our task to be especially sensitive to overestimation of elasticity. We did so by giving participants free 3 tickets at their initial visits to each planet, which meant that upon success with 3 tickets, people who overestimate elasticity were more likely to continue purchasing extra tickets unnecessarily. Following the same logic, had we first had participants experience 1 ticket trips, this could have increased the sensitivity of our task to underestimation of elasticity in elastic environments. Such underestimation could potentially relate to a distinct psychopathological profile that more heavily loads on depressive symptoms. Thus, by altering the initial exposure, future studies could disambiguate the dissociable contributions of overestimating versus underestimating elasticity to different forms of psychopathology.

      The logic of this paragraph makes perfect sense to me. If you assume low elasticity, you will infer that you could catch the train with just one ticket. However, when elasticity is in fact high, you would find that you don't catch the train, leading you to quickly infer high elasticity, eliminating the bias. In contrast, if you assume high elasticity, you will continue purchasing three tickets and will never have the opportunity to learn that you could be purchasing only one; the bias remains.

      The authors attempt to argue that this isn't happening using parameter recovery. However, they only report the *correlation* in the parameter, whereas the critical measure is the *bias*. Furthermore, in parameter recovery, the data-generating and data-fitting models are identical; this will yield the best possible recovery results. Although finding no bias in this setting would support the claims, it cannot outweigh the logical argument for the bias that they originally laid out. Finally, parameter recovery should be performed across the full range of plausible parameter values; using fitted parameters (a detail I could only determine by reading the code) yields biased results because the fitted parameters are themselves subject to the bias (if present). That is, if true low elasticity is inferred as high elasticity, then you will not have any examples of low elasticity in the fitted parameters and will not detect the inability to recover them.

      The logic the Reviewer describes breaks down when one considers the dynamics of participants’ resource investment choices. A low elasticity bias in a participant’s prior belief would make them persist for longer in purchasing a single ticket despite failure, as compared to a person without such a bias. Indeed, the ability of the experimental design to demonstrate low elasticity biases is evidenced by the fact that the majority of participants were fitted with a low elasticity bias (μ = .16 ± .14, where .5 is unbiased). 

      Originally, the Reviewer was concerned that elasticity bias was being confounded with a general deficit in learning. The weak inter-parameter correlations in the parameter recovery test resolved this concern, especially given that, as we now note, the simulated parameter space encompassed both low and high elasticity biases (range=[.02,.76]). Furthermore, regarding the Reviewer's concern about bias in the parameter recovery, we found no significant bias with respect to the elasticity bias parameter (Δ(Simulated, Recovered)= -.03, p=.25), showing that our experiment could accurately identify low and high elasticity biases.
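
      The distinction between recovery correlation and recovery bias raised in this exchange can be sketched with made-up numbers. In the snippet below, the "recovered" values are simply the "simulated" values plus a constant offset; both lists and the offset are hypothetical and are not taken from the study's parameter recovery. The correlation is perfect even though every estimate is shifted, which is why a signed difference such as Δ(Simulated, Recovered) is the relevant check for systematic bias.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data-generating values of a bias parameter on a 0-1 scale.
simulated = [0.05, 0.15, 0.30, 0.45, 0.60, 0.75]
# Hypothetical recovered values: every estimate is shifted upward by 0.10.
recovered = [value + 0.10 for value in simulated]

recovery_correlation = pearson(simulated, recovered)
# Signed mean difference (recovered minus simulated) measures systematic bias.
recovery_bias = sum(r - s for s, r in zip(simulated, recovered)) / len(simulated)

print(f"recovery correlation: {recovery_correlation:.2f}")
print(f"mean signed bias:     {recovery_bias:+.2f}")
```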

      The statistical structure of the task is inconsistent with the framing. In the framing, participants can make either one or two second boarding attempts (jumps) by purchasing extra tickets. The additional attempt(s) will thus succeed with probability p for one ticket and 2p – p<sup>2</sup> for two tickets; the p<sup>2</sup> term captures the fact that you only take the second attempt if you fail on the first. A consequence of this is that buying more tickets has diminishing returns. In contrast, in the task, participants always jumped twice after purchasing two tickets, and the probability of success with two tickets was exactly double that with one ticket. Thus, if participants are applying an intuitive causal model to the task, they will appear to "underestimate" the elasticity of control. I don't think this seriously jeopardizes the key results, but any follow-up work should ensure that the task's structure is consistent with the intuitive causal model.

      We thank the Reviewer for this comment, and agree the participants may have employed the intuitive understanding the Reviewer describes. This is consistent with our model comparison results, which showed that participants did not assume that control increases linearly with resource investment (lines 677-692). Consequently, this is also not assumed by our model, except perhaps by how the prior is implemented (a property that was supported by model comparison). In the text, we acknowledge that this aspect of the model and participants’ behavior deviates from the true task's structure, and it would be worthwhile to address this deviation in future studies. 

      That said, there is no reason that this would make participants appear to generally underestimate elasticity. Following exposure to outcomes for one and three tickets, any nonlinear understanding of probabilities would only affect the controllability estimate for two tickets. This would have contrasting effects on the elasticity attributed to the second and third tickets, but on average, it would not change the overall estimated elasticity. On the other hand, if such a participant were only exposed to outcomes for two and three tickets, they would come to judge the difference between the first and second tickets too highly, thereby overestimating elasticity.
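
      The two success rules contrasted in this exchange can be written out in a short Python sketch. The sequential rule is the intuitive causal model (a second attempt is only taken after a failed first one); the linear rule is the one described for the task, where two tickets exactly double the one-ticket probability. The value p = 0.3 and the function names are hypothetical, chosen only for illustration.

```python
def p_success_sequential(p, tickets):
    """Intuitive model: independent attempts, stopping at the first success.

    For two tickets this equals p + (1 - p) * p = 2p - p**2.
    """
    return 1 - (1 - p) ** tickets

def p_success_linear(p, tickets):
    """Task rule as described: success probability scales linearly with tickets."""
    return min(1.0, tickets * p)

p = 0.3  # hypothetical per-attempt success probability
for tickets in (1, 2):
    seq = p_success_sequential(p, tickets)
    lin = p_success_linear(p, tickets)
    print(f"{tickets} ticket(s): sequential = {seq:.2f}, linear = {lin:.2f}")
```

      Under the sequential rule the gain from the second ticket, p(1 – p), is smaller than the gain from the first, p, which is the diminishing return the Reviewer refers to; under the linear rule both gains are equal.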

      The model is heuristically defined and does not reflect Bayesian updating. For example, it overestimates maximum control by not using losses with fewer than 3 tickets (intuitively, the inference here depends on your beliefs about elasticity). Including forced three-ticket trials at the beginning of each round makes this less of an issue; but if you want to remove those trials, you might need to adjust the model. The need to introduce the modified model with kappa is likely another symptom of the heuristic nature of the model updating equations.

      Note that we have tested a fully Bayesian model (lines 676-691), but found that this model fitted participants’ choices worse. 

      You're right; saying these analyses provide "no information" was unfair. I agree that this is a useful way to link model parameters with behavior, and they should remain in the paper. However, my key objection still holds: these analyses do not tell us anything about how *people's* prior assumptions influence behavior. Instead, they tell us about how *fitted model parameters* depend on observed behavior. You can easily avoid this misreading by adding a small parenthetical, e.g.

      Thus, a prior assumption that control is likely available **(operationalized by \gamma_controllability)** was reflected in a futile investment of resources in uncontrollable environments.

      We thank the Reviewer for the suggestion and have added this parenthetical (lines 219, 225).

    1. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention.  

      We appreciate the reviewer’s statement highlighting the importance of our study. 

      Strengths: 

      The experiments seem well performed, with interesting results. Thus, this study can/will advance our understanding of atypical tactile perception and its relation to cognitive factors in autism. 

      We thank the reviewer for recognizing the quality of our experiments and the relevance of our findings for understanding tactile perception and cognition in autism.

      Weaknesses: 

      Certain aspects of the analyses (and therefore the results) are unclear, which makes the manuscript difficult to understand. Clearer presentation, with the addition of more standard psychometric analyses, and/or other useful models (like logistic regression) would improve this aspect. The use of d' needs better explanation, both in terms of how and why these analyses are appropriate (and perhaps it should be applied for more specific needs rather than as a ubiquitous measure). 

      We thank the reviewer for the helpful comments. We understand that the analyses were difficult to follow, and we will work on the clarity of the Results section. However, we would like to emphasize that every d′ measure is accompanied by analyses of response rates (i.e., correct and incorrect choice rates). In addition, we applied standard psychometric analyses whenever possible. Specifically, psychometric functions were fitted to the data using logistic regression. We will rework the text to clarify these points.

      During training, only two stimulus amplitudes were presented, which precluded the construction of psychometric curves. For the categorization task, however, psychometric analyses were feasible and conducted (Figure 2). These analyses revealed no evidence of categorization bias (as measured by threshold) or accuracy (as measured by the slope) across stimulus strengths.

      The calculation of d’ is included in the Methods, but we will also report and explain its use in each part of the Results section where it has been included.
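
      For readers unfamiliar with d′, the following Python sketch shows the standard signal-detection definition, d′ = z(hit rate) − z(false-alarm rate). It is an illustration under stated assumptions, not the calculation from the Methods: the mapping of correct and incorrect choices in a two-choice task onto "hits" and "false alarms", the log-linear correction, and the example counts are all assumptions made here.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Standard sensitivity index: z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count) keeps both rates
    strictly between 0 and 1 so the inverse normal CDF stays finite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical trial counts for one animal in one session.
print(round(d_prime(hits=80, misses=20, false_alarms=20, correct_rejections=80), 2))
```

      Because d′ combines two response rates into one number, reporting the underlying correct and incorrect choice rates alongside it, as described above, shows which rate drives any change in sensitivity.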

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript presents a tactile categorization task in head-fixed mice to test whether Fmr1 knockout mice display differences in vibrotactile discrimination using the forepaw. Tactile discrimination differences have been previously observed in humans with Fragile X Syndrome, autistic individuals, as well as mice with loss of Fmr1 across multiple studies. The authors show that during training, Fmr1 mutant mice display subtle deficits in perceptual learning of "low salience" stimuli, but not "high salience" stimuli, during the task. Following training, Fmr1 mutant mice displayed an enhanced tactile sensitivity under low-salience conditions but not high-salience stimulus conditions. The authors suggest that, under 'high cognitive load' conditions, Fmr1 mutant mouse performance during the lowest indentation stimuli presentations was affected, proposing an interplay of sensory and cognitive system disruptions that dynamically affect behavioral performance during the task. 

      Strengths: 

      The study employs a well-controlled vibrotactile discrimination task for head-fixed mice, which could serve as a platform for future mechanistic investigations. By examining performance across both training stages and stimulus "salience/difficulty" levels, the study provides a more nuanced view of how tactile processing deficits may emerge under different cognitive and sensory demands. 

      We thank the reviewer for emphasizing the strengths of our task design and analysis approach, and we appreciate that the potential of this platform for future mechanistic investigations is recognized.

      Weaknesses: 

      The study is primarily descriptive. The authors collect behavioral data and fit simple psychometric functions, but provide no neural recordings, causal manipulations, or computational modeling. Without mechanistic evidence, the conclusions remain speculative. 

      We thank the reviewer for the careful reading of our manuscript and for the constructive feedback. The reviewer raises a valid point. We agree that our study is primarily descriptive and focused on behavioral data, and we appreciate the opportunity to clarify the scope and interpretation of our findings. Our primary goal was to characterize behavioral patterns during tactile discrimination and categorization, and the psychometric analyses were intended to provide a detailed description of these patterns. We do not claim to provide direct neural, causal, or computational evidence. 

      Second, the authors repeatedly make strong claims about "categorical priors," "attention deficits," and "choice biases," but these constructs are inferred indirectly from secondary behavioral measures. Many of the effects are based on non-significant trends, and alternative explanations (such as differences in motivation, fatigue, satiety, stereotyped licking, and/or reward valuation) are not considered. 

      Alternative explanations of our findings, such as differences in motivation, fatigue, satiety, stereotyped licking, and reward valuation have indeed been considered. We will revise the manuscript to present these points more clearly. 

      Third, the mapping of the behavioral results onto high-level cognitive constructs is tenuous and overstated. The authors' interpretations suggest that they directly tested cognitive theories such as Load Theory, Adaptive Resonance Theory, or Weak Central Coherence. However, the experiments do not manipulate or measure variables that would allow such theories to be tested. More specific comments are included below.

      This was not done intentionally. We do not claim to have tested the Load Theory; rather, inspired by it, we assessed behavioral patterns in our tactile categorization task. We agree that referring to the Adaptive Resonance Theory, which is based on artificial neural network models, might be misleading since we focus on behavioral results, and we will revise the text accordingly. However, our task allowed us to examine the impact of categorization on discrimination, confirming that categorization can amplify perceptual differences between stimuli belonging to different categories and reduce perceived differences within a category in WT mice but not in the Fmr1<sup>-/y</sup> mice when low-salience stimuli were experienced. Finally, we do not claim to have tested the Weak Central Coherence theory, although our results suggest reduced use of categories in low-salience tactile discrimination. 

      (1) The authors employ a two-choice behavioral task to assess forepaw tactile sensitivity in Fmr1 knockout mice. The data provide an interesting behavioral observation, but it is a descriptive study. Without mechanistic experiments, it is difficult to draw any conclusions, especially regarding top-down or bottom-up pathway dysfunctions. While the task design is elegant, the data remain correlational and do not advance our mechanistic understanding of Fmr1-related sensory and/or cognitive alterations. 

      We agree with the reviewer that our current experiments are behavioral in nature and do not provide direct mechanistic evidence for top-down pathway dysfunction. Our goal was to carefully characterize tactile responses and behavioral patterns in Fmr1<sup>-/y</sup> mice. The notion of “top-down” is used at the behavioral level, referring to the influence of higher-level cognitive processes (e.g., categorization, attention) on perception, rather than to underlying neural circuits. We will revise the manuscript to more clearly emphasize that our conclusions are based on behavioral observations, and we will frame mechanistic inferences as hypotheses rather than established findings. We will also explicitly note that future work using neural recordings or causal manipulations will be required to directly test these hypotheses.

      We also note that identifying the precise top-down circuits involved will require extensive additional experimentation. For example, one would first need to pinpoint the specific top-down pathway that modulates the influence of categorization on discrimination without directly altering categorization itself. After such a circuit is identified, further work would then be needed to rescue or manipulate this pathway in the Fmr1<sup>-/y</sup> model. These steps represent a substantial program of mechanistic research that, while important, goes well beyond the scope of the present study.

      (2) The conclusions hinge on speculative inferences about "reduced top-down categorization influence" or "choice consistency bias," but no neural, circuit-level, or causal manipulations (e.g., optogenetics, pharmacology, targeted lesions, modeling) are used to support these claims. Without mechanistic data, the translational impact is limited. 

      We recognize that “reduced top-down categorization influence” and “choice consistency bias” are based on behavioral observations. However, we respectfully disagree that this makes these constructs inherently speculative. Similar behavioral inferences have been applied in previous clinical studies to characterize cognitive tendencies (Soulières et al., 2007; Feigin et al., 2021). The translational impact of our work lies in the highly translational platform we have developed – and in highlighting the complexity of tactile measures and additional analyses that can be conducted in clinical studies.

      We agree with the reviewer that the neural-based experiments would indeed provide valuable mechanistic insight into our observed behavioral alterations, and we believe future studies should therefore focus on their underlying neurobiological substrate.

      We will revise the language throughout the manuscript to clarify that all conclusions are based on behavioral measures.  

      (3) Statistical analysis: 

      (a) Several central claims are based on "trends" rather than statistically significant effects (e.g., reduced task sensitivity, reduced across-category facilitation). Building major interpretive arguments on nonsignificant findings undermines confidence in the conclusions.  

      Several trends are evident in complex measures, such as d’ analyses on task sensitivity or responses pooled across different amplitudes. Additional analyses revealed which component of these measures showed a statistically significant difference across genotypes, namely the low-salience incorrect choices accounting for low task sensitivity. We chose to present all analyses to be transparent and to highlight that commonly used complex measures (like d’ analyses) may mask important findings. In the text, we described p-values between 0.05 and 0.1 as observed trends without over-interpreting their significance. 

      (b) The n number for both genotypes should be increased. In several experiments (e.g., Figure 1D, 2E), one animal appears to be an outlier. Considering the subtle differences between genotypes, such an outlier could affect the statistical results and subsequent interpretations. 

      The number of mice used in each genotype group is consistent with standard practices in behavioral studies using mice and sensory tasks. We have performed effect size measures (e.g., Cohen’s d) alongside some of the statistical comparisons, showing a medium effect size (>0.5). 

      As the reviewer correctly noted, no mice were excluded based on outlier analyses, since the observed variability reflects true biological differences rather than experimental or technical errors. We will reexamine our dataset for potential outliers. If any are identified, we will perform analyses both with and without the outlier and report any effects that are sensitive to single animals. These procedures and results will be explicitly described in the Methods and Results sections.

      (c) The large number of comparisons across salience levels, categories, and trial histories raises concern for false positives. The manuscript does not clearly state how multiple comparisons were controlled.  

      We thank the reviewer for raising this important point and we will include a clear statement on multiple comparisons in the Methods section. 

      (d) The data in Figure 5, shown as separate panels per indentation value, are analyzed separately as t-tests or Mann-Whitney tests. However, individual comparisons are inappropriate for this type of data, as these are repeated stimulus applications across a given session. The data should be analyzed together and post-hoc comparisons reported. Given the very subtle difference in miss rates across control and mutant mice for 'low-salience' stimulus trials, this is unlikely to be a statistically meaningful difference when analyzed using a more appropriate test. 

      We thank the reviewer for raising this point. This was not done intentionally. A repeated-measures ANOVA on miss rates for low-salience stimuli during categorization confirmed that there are statistically significant differences both across stimulus amplitudes and between genotypes. Additional correction for multiple comparisons will be performed and explained in the Methods section.  

      (4) Emphasis on theoretical models: The paper leans heavily on theories such as Adaptive Resonance Theory, Load Theory of Attention, and Weak Central Coherence, but the data do not actually test these frameworks in a rigorous way. The discussion should be reframed to highlight the potential relevance of these frameworks while acknowledging that the current data do not allow them to be assessed. 

      As mentioned above, our goal was not to directly test these theories but rather to apply them within our translational framework. The Discussion section will be reframed to highlight that our findings are consistent with predictions from certain cognitive theories rather than implying that these frameworks were directly tested.

      Reviewer #3 (Public review): 

      Summary: 

      Developing consistent and reliable biomarkers is critically important for developing new pharmacological therapies in autism spectrum disorders (ASDs). Altered sensory perception is one of the hallmarks of autism and has been recently added to DSM-5 as one of the core symptoms of autism. Touch is one of the fundamental sensory modalities, yet it is currently understudied. Furthermore, there seems to be a discrepancy between different studies from different groups focusing on tactile discrimination. It is not clear if this discrepancy can be explained by different experimental setups, inconsistent terminology, or the heterogeneity of sensory processing alterations in ASDs. The authors aim to investigate the interplay between tactile discrimination and cognitive processes during perceptual decisions. They have developed a forepaw-based 2-alternative choice task for mice and investigated tactile perception and learning in Fmr1-/y mice.

      Strengths: 

      There are several strengths of this task: translational relevance to human psychophysical protocols, including controlled vibrotactile stimulation. In addition to the experimental setup, there are also several interesting findings: Fmr1-/y mice demonstrated choice consistency bias, which may result in impaired perceptual learning, and enhanced tactile discrimination in low-salience conditions, as well as attentional deficits with increased cognitive load. The increase in the error rates for low salience stimuli is interesting. These observations, together with the behavioral design, may have a promising translational potential and, if confirmed in humans, may be potentially used as biomarkers in ASD. 

      We appreciate the reviewer’s positive assessment of our study’s translational value and the importance of our behavioral findings.

      Weaknesses: 

      Some weaknesses are related to the lack of the original raster plots and density plots of licks under different conditions, learning rate vs time, and evaluation of the learning rate at different stages of learning. Overall, these data would help to answer the question of whether there are differences in learning strategies or neural circuit compensation in Fmr1-/y mice. It is also not clear if reversal learning is impaired in Fmr1-/y mice.  

      We thank the reviewer for these helpful suggestions. We agree that visualizing behavioral patterns, such as raster and density plots of licks, as well as learning rate over time, could provide additional insights into learning dynamics. This analysis will be conducted and added into the revised manuscript.

      There was no assessment of reversal learning in Fmr1<sup>-/y</sup> mice in this study. While it is an interesting and important question based on previous findings in preclinical and clinical studies, it falls outside the scope of the current manuscript.    

      Feigin H, Shalom-Sperber S, Zachor DA, Zaidel A (2021) Increased influence of prior choices on perceptual decisions in autism. Elife 10.

      Soulières I, Mottron L, Saumier D, Larochelle S (2007) Atypical categorical perception in autism: Autonomy of discrimination? J Autism Dev Disord 37:481–490.

    1. Author response:

      Reviewer #1 (Public review):

      Cognitive Load and Task-Switching Components:

      We agree that cognitive load is multi-faceted and encompasses dimensions not fully captured in our present models, including domain and rule switching. For the revision, we will explicitly model these components in the statistical analyses by incorporating predictors reflecting attended domain switching and rule complexity, as suggested. We will also explain our inclusion of n-back reaction predictors and justify their relationship with theoretical constructs of executive function. Full details of coding schemes will be provided.

      Modeling Entropy and Surprisal:

      We appreciate the reviewer’s suggestion to further explain the distinction between entropy (predictive uncertainty) and surprisal (integration difficulty), and acknowledge that our treatment of entropy warrants extension. In the revision, we will expand the results and discussion on entropy, providing clearer theoretical motivation for its inclusion and conducting supplementary analyses to examine its role alongside surprisal.
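
      The distinction between the two measures can be illustrated with a small Python sketch. Surprisal is a property of the word that actually occurs, −log2 p(word | context), whereas entropy is a property of the whole predictive distribution before the word is seen, −Σ p log2 p. The next-word distribution below is hypothetical and is not taken from any language model or from the study's materials.

```python
import math

def surprisal(prob):
    """Surprisal in bits of an observed word with conditional probability `prob`."""
    return -math.log2(prob)

def entropy(distribution):
    """Shannon entropy in bits of a next-word probability distribution."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)

# Hypothetical next-word distribution for a single context.
next_word = {"coffee": 0.7, "tea": 0.2, "water": 0.1}

print(f"entropy before the word: {entropy(next_word):.2f} bits")
for word, prob in next_word.items():
    print(f"surprisal of '{word}': {surprisal(prob):.2f} bits")
```

      The same context therefore has a single entropy value but a different surprisal for each possible continuation, which is why the two predictors can dissociate.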

      Replicability of Findings:

      We note the concern regarding two-way vs. three-way interactions in model replication. In the revised manuscript, we will report robustness analyses on subsets of our data (e.g., matched age and education groups), clarify degrees of freedom and group sizes, and transparently report any discrepancies.

      Predictors and Statistical Modeling:

      We will add clarifications on predictor selection, data structure, and rationale for model hierarchy. The functions of d-prime, comprehension accuracy, and performance modeling will be described in more detail, including discussion of block-level vs. participant-level effects.

      Reviewer #2 (Public review):

      Distinction Between Prediction and Predictability:

      We recognize the importance of clearly communicating the difference between prediction and predictability, as well as integration-based vs. prediction-based effects. We will clarify these distinctions throughout the introduction, methods, and discussion sections, citing the relevant theoretical literature (e.g., Pickering & Gambi 2018; Federmeier 2007; Staub 2015; Frisson 2017).

      Aging, Corpus Predictability, and Individual Differences:

      We appreciate the critical point regarding age, corpus-based predictability, and potential cohort effects in language model estimates. In the revision, we will provide conceptual clarifications on how surprisal and entropy might differ for different age groups and discuss limitations in extrapolating these metrics to participant-specific predictions. The limitations inherent in relying on LLM-derived estimates and text materials will be more directly addressed.

      Coverage of Literature and Paradigms:

      We will broaden the literature review as requested, particularly on the N400 effects and behavioral traditions in prediction research. These additions should help contextualize the present work within both neuroscience and psycholinguistics.

      Experimental Context and Predictability Metrics:

      We will address concerns regarding the context window for prediction estimation, describing more precisely how context was defined and whether broader textual cues may improve predictability metrics.

      References

      Pickering, M.J. & Gambi, C. (2018). Predicting while comprehending language: A theory and review. Psychol. Bull., 144(10), 1002–1044.

      Federmeier, K.D. (2007). Thinking ahead: The role and roots of prediction in language comprehension. Psychophysiology, 44(4), 491–505.

      Frisson, S. (2017). Can prediction explain the lexical processing advantage for short words? J. Mem. Lang., 95, 121–138.

      Staub, A. (2015). The effect of lexical predictability on eye movements in reading: Critical review and theoretical interpretation. Lang. Linguist. Compass, 9(8), 311–327.

      Huettig, F. & Mani, N. (2016). Is prediction necessary to understand language? Probably not. Trends Cogn. Sci., 20(10), 484–492.

      We appreciate the reviewers’ constructive comments and believe their suggestions will meaningfully strengthen the paper. Our planned revisions will address each of the above points with additional analyses, clarifications, and expanded discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The manuscript is quite dense, with some concepts that may prove difficult for the non-specialist. I recommend spending a few more words (and maybe some pictures) describing the difference between task-relevant and task-irrelevant planes. Nice technique, but not instantly obvious. Then we are hit with "stimulus-related", which definitely needs some words (also because it is orthogonal to neither of the above). 

      We agree that the original description of the planes was too terse and have expanded on this in the revised manuscript.

      Line 85 - To test the influence of attention, trials were sorted according to two spatial reference planes, based on the location of the stimulus: task-related and task-unrelated (Fig. 1b). The task-related plane corresponded to participants’ binary judgement (Fig 1b, light cyan vertical dashed line) and the task-unrelated plane was orthogonal to this (Fig 1b, dark cyan horizontal dashed line). For example, if a participant was tasked with performing a left-or-right of fixation judgement, then their task-related plane was the vertical boundary between the left and right side of fixation, while their task-unrelated plane was the horizontal boundary. The former (left-right) axis is relevant to their task while the latter (top-bottom) axis is orthogonal and task irrelevant. This orthogonality can be leveraged to analyze the same data twice (once according to the task-related plane and again according to the task-unrelated plane) in order to compare performance when the relative location of an event is either task relevant or irrelevant.

      Line 183 - whereas task planes were constant, the stimulus-related plane was defined by the location of the stimulus on the previous trial, and thus varied from trial to trial. That is, on each trial, the target is considered a repeat if it changes location by <|90°| relative to its location on the previous trial, and an alternate if it moves by >|90°|.
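
      The sorting rules described in the two passages above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the analysis code: the angle convention (0° to the right of fixation, increasing counter-clockwise), the function names, and the handling of locations that fall exactly on a boundary are assumptions made here.

```python
import math

def circular_difference(a_deg, b_deg):
    """Signed smallest angular difference a - b, in the range [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0

def stimulus_plane_label(current_deg, previous_deg):
    """Label a trial relative to the previous target location.

    'repeat' if the target moved by less than 90 degrees, 'alternate' if it
    moved by more than 90 degrees. Exactly 90 degrees is left unlabelled
    because the text does not specify that case.
    """
    shift = abs(circular_difference(current_deg, previous_deg))
    if shift < 90:
        return "repeat"
    if shift > 90:
        return "alternate"
    return None

def task_plane_labels(angle_deg):
    """Side of the vertical (left/right) and horizontal (top/bottom) boundaries.

    Assumes 0 degrees is right of fixation; points on a boundary fall into
    the 'left' or 'bottom' class here purely by convention.
    """
    rad = math.radians(angle_deg)
    horizontal_side = "right" if math.cos(rad) > 0 else "left"
    vertical_side = "top" if math.sin(rad) > 0 else "bottom"
    return horizontal_side, vertical_side

print(stimulus_plane_label(10, 350))   # small shift across the 0/360 wrap
print(task_plane_labels(45))
```

      For a participant judging left versus right, the first element of `task_plane_labels` gives the task-related class and the second gives the task-unrelated class; for a participant judging top versus bottom the roles swap, so the same trials can be sorted both ways.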

      (2) While I understand that the authors want the three classical separations, I actually found it misleading. Firstly, for a perceptual scientist to call intervals in the order of seconds (rather than milliseconds), "micro" is technically coming from the raw prawn. Secondly, the divisions are not actually time, but events: micro means one-back paradigm, one event previously, rather than defined by duration. Thirdly, meso isn't really a category, just a few micros stacked up (and there's not much data on this). And macro is basically patterns, or statistical regularities, rather than being a fixed time. I think it would be better either to talk about short-term and long-term, which do not have the connotations I mentioned. Or simply talk about "serial dependence" and "statistical regularities". Or both. 

      We agree that the temporal scales defined in the current study are not the only way one could categorize perceptual time. We also agree that by using events to define scales, we ignore the influence of duration. In terms of the categories, we selected these for two reasons: 1) they conveniently group previous phenomena, and 2) they loosely correspond to iconic-, short- and long-term memory. We agree that one could also potentially split it up into two categories (e.g., short- and long-term), but in general, we think any form of discretization will have limitations. For example, Reviewer 1 suggests that the meso category is simply a few micros stacked together. However, there is a rich literature on phenomena associated with sequences of an intermediate length that do not appear to be entirely explained by stacking micro effects (e.g., sequence learning and sequential dependency). We also find that when controlling for micro level effects, there are clear meso level effects. Also, by the logic that meso level effects are just stacked micro effects, one could also argue the same for macro effects. We don’t think this argument is incorrect, rather we think it exemplifies the challenge of discretising temporal scales. Ultimately, the current study was aimed to test whether seemingly disparate phenomena identified in previous work could be captured by unifying principles. To this end we found that these categories were the most useful. However, we have included a “Limitations and future directions” section in the Discussion of the revised manuscript that acknowledges both the alternative scheme proposed by Reviewer 1, and the value of extending this work to consider the influence of duration (as well as events).

      Line 488 - Limitations and future directions. One potential limitation of the current study is the categorization of temporal scales according to events, independent of the influence of event duration. While this simplification of time supports comparison between different phenomena associated with each scale (e.g., serial dependence, sequential dependencies, statistical learning), future work could investigate the role of duration to provide a more comprehensive understanding of the mechanisms identified in the current study.

      Related to this, while the temporal scales applied here conveniently categorized known sensory phenomena, and partially correspond to iconic-, short-, and long-term memory, they are but one of multiple ways to delineate time. For example, temporal scales could alternatively be defined simply as short- and long-term (e.g., by combining micro and meso scale phenomena). However, this could obscure meaningful differences between phenomena associated with sensory persistence and short-term memory, or qualitative differences in the way that short sequences of events are processed.

      (3) More serious is the issue of precision. Again, this is partially a language problem. When people use the engineering terms "precision" and "accuracy" together, they usually use the same units, such as degrees. Accuracy refers to the distance from the real position (so average accuracy gives bias), and precision is the clustering around the average bias, usually measured as standard deviation. Yet here accuracy is percent correct: also a convention in psychology, but not when contrasting accuracy with precision, in the engineering sense. I suggest you change "accuracy" to "percent correct". On the other hand, I have no idea how precision was defined. All I could find was: "mixture modelling was used to estimate the precision and guess rate of reproduction responses, based on the concentration (k) and height of von Mises and uniform distributions, respectively". I do not know what that means.

      In the case of a binary decision, it seems reasonable to use the term “accuracy” to refer to the correspondence between the target state and the response on a task. However, we agree that while our (main) task is binary, the target is not, and nor is the secondary task. We thank the reviewer for bringing this to our attention, as we agree that this will be a likely cause of confusion. To avoid confusion we have specifically referred to “task accuracy” throughout the revised manuscript.

      With regards to precision, our measure of precision is consistent with what Reviewer 1 describes as such, i.e., the clustering of responses. In particular, the von Mises distribution is essentially a Gaussian distribution in circular space, and the kappa parameter defines the width of the distribution, regardless of the mean, with larger values of kappa indicating narrower (more precise) distributions. We could have used standard deviation to assess precision; however, this would incorrectly combine responses on which participants failed to encode the target (e.g., because of a blink) and were simply guessing. To account for these trials, we applied mixture modelling of guess and genuine responses to isolate the precision of genuine responses, as is standard in the visual working memory literature. However, we agree that this was not sufficiently described in the original manuscript and have elaborated on this method in the revised version.

      Line 598 - From the reproduction task, we sought to estimate participants’ recall precision. It is likely that on some trials participants failed to encode the target and were forced to make a guess response. To isolate the recall precision from guess responses, we used mixture modelling to estimate the precision and guess rate of reproduction responses, based on the concentration (k) and height of von Mises and uniform distributions, respectively (Bays et al., 2009). The k parameter of the von Mises distribution reflects its width, which indicates the clustering of responses around a common location.
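
      The mixture-modelling step described above can be sketched as follows. This is an illustrative example under stated assumptions, not the analysis code used in the study: response errors are assumed to be in radians, the von Mises mean is fixed at the circular mean of the errors, and the concentration and guess rate are found by a coarse grid search rather than a numerical optimiser. The function names (`bessel_i0`, `fit_mixture`) and grid ranges are hypothetical.

```python
import math
import random


def bessel_i0(x):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term, k = 1.0, 1.0, 0
    while term > 1e-12 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def fit_mixture(errors, kappas=None, guess_rates=None):
    """Fit a von Mises + uniform mixture to circular errors (radians).

    Returns the circular mean (bias), the concentration kappa (precision),
    and the guess rate (weight of the uniform component).
    """
    if kappas is None:
        kappas = [0.5 * i for i in range(1, 61)]         # 0.5 ... 30.0
    if guess_rates is None:
        guess_rates = [0.05 * i for i in range(0, 20)]   # 0.0 ... 0.95

    # Circular mean of the errors; uniform guesses do not shift its direction.
    mu = math.atan2(sum(math.sin(e) for e in errors),
                    sum(math.cos(e) for e in errors))
    cos_dev = [math.cos(e - mu) for e in errors]
    uniform_density = 1.0 / (2.0 * math.pi)

    best = None
    for kappa in kappas:
        norm = 2.0 * math.pi * bessel_i0(kappa)
        vm_density = [math.exp(kappa * c) / norm for c in cos_dev]
        for g in guess_rates:
            # Log-likelihood of the two-component mixture.
            ll = sum(math.log((1.0 - g) * p + g * uniform_density)
                     for p in vm_density)
            if best is None or ll > best[0]:
                best = (ll, kappa, g)
    return {"mu": mu, "kappa": best[1], "guess_rate": best[2]}
```

      In this sketch a larger fitted `kappa` corresponds to a narrower distribution of genuine responses, while trials on which the target was not encoded are absorbed by the uniform component instead of inflating the estimated spread.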

      (4) Previous studies show serial dependence can increase bias but decrease scatter (inverse precision) around the biased estimate. The current study claims to be at odds with that. But are the two measures of precision relatable? Was the real (random) position of the target subtracted from each response, leaving residuals from which the inverse precision was calculated? (If so, the authors should say so.) But if serial dependence biases responses in essentially random directions (depending on the previous position), it will increase the average scatter, decreasing the apparent precision.

      Previous studies have shown that when serial dependence is attractive there is a corresponding increase in precision around small offsets from the previous item (citations). Indeed, attractive biases will lead to reduced scattering (increased precision) around a central attractor. Consistent with previous studies, and this rationale, we also found an attractive bias coupled with increased precision. To clarify, for the serial dependency analysis, we calculated bias and precision by binning reproduction responses according to the offset between the current and previous target and then performing the same mixture modelling described above to estimate the mean (bias) and kappa (precision) parameters of the von Mises distribution fit to the angular errors. This was not explained in the original manuscript, so we thank Reviewer 1 for bringing this to our attention and have clarified the analysis in the revised version.

      Line 604 - For the serial dependency analysis, we calculated bias and precision by binning reproduction responses according to the angular offset between the current and previous target and then performing mixture modelling to estimate the mean (bias) and k (precision) parameters of the von Mises distribution.
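
      The binning step can be sketched as below. This is a minimal illustration, not the study's code: angles are assumed to be in degrees, trials without a reproduction response are assumed to be marked `None`, the 45° bin width is arbitrary, and the function names are hypothetical. The errors collected in each bin would then be passed to the mixture model to obtain the mean (bias) and k (precision) for that offset.

```python
import math


def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def bin_errors_by_offset(targets, responses, bin_width=45.0):
    """Group reproduction errors by the previous-minus-current target offset.

    targets   : target angle on every trial (degrees)
    responses : reproduced angle, or None on trials without a reproduction
    Returns a dict mapping each bin centre to a list of signed errors.
    """
    bins = {}
    for i in range(1, len(targets)):
        if responses[i] is None:
            continue  # reproduction was only probed on a subset of trials
        # Offset of the previous target relative to the current one.
        offset = wrap_deg(targets[i - 1] - targets[i])
        # Signed response error relative to the current target.
        error = wrap_deg(responses[i] - targets[i])
        index = math.floor((offset + 180.0) / bin_width)
        centre = -180.0 + (index + 0.5) * bin_width
        bins.setdefault(centre, []).append(error)
    return bins
```

      With this sign convention, an error that shares the sign of the offset points towards the previous target (attraction), and an error of opposite sign points away from it (repulsion).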

      (5) I suspect they are not actually measuring precision, but location accuracy. So the authors could use "percent correct" and "localization accuracy". Or be very clear what they are actually doing. 

      As explained in our response to Reviewer 1’s previous comment, we are indeed measuring precision.

      Reviewer #2 (Public review):

      (1) The abstract should more explicitly mention that conclusions about feedforward mechanisms were derived from a reanalysis of an existing EEG dataset. As it is, it seems to present behavioral data only.

      It is not clear what relevance the fact that the data has been analyzed previously has to the results of the current study. However, we do think that it is important to be clear that the EEG recordings were collected separately from the behavioural and eyetracking data, so we have clarified this in the revised abstract.

      Line 7 - By integrating behavioural and pupillometry recordings with electroencephalographical recordings from a previous study, we identify two distinct mechanisms that operate across all scales.

      (2) The EEG task seems quite different from the others, with location and color changes, if I understand correctly, on streaks of consecutive stimuli shown every 100 ms, with the task involving counting the number of target events. There might be different mechanisms and functions involved, compared to the behavioral experiments reported. 

      As stated above, we agree that it is important that readers are aware that the EEG recordings were collected separately to the behavioural and eyetracking data. We were forthright about this in the original manuscript and have now clarified this in the revised abstract. We agree that collecting both sets of data in the same experiment would be a useful validation of the current results and have acknowledged this in a new Limitations and future directions section of the Discussion of the revised manuscript.

      Line 501 - Another limitation of the current study is that the EEG recordings were collected in a separate experiment from the behavioural and pupillometry data. The stimuli and task were similar between experiments, but not identical. For example, the EEG experiment employed coloured arc stimuli presented at a constant rate of ~3.3 Hz and participants were tasked with counting the number of stimuli presented at a target location. By contrast, in the behavioural experiment, participants viewed white blobs presented at an average rate of ~2.8 Hz and performed a binary spatial task coupled with an infrequent reproduction task. An advantage of this was that the sensory responses to stimuli in the EEG recordings were not conflated with motor responses; however, future work combining these measures in the same experiment would serve as a validation for the current results.

      (3) How is the arbitrary choice of restricting EEG decoding to a small subset of parieto-occipital electrodes justified? Blinks and other artifacts could have been corrected with proper algorithms (e.g., ICA) (Zhang & Luck, 2025) or even left in, as decoders are not necessarily affected by noise. Moreover, trials with blinks occurring at the stimulus time should be better removed, and the arbitrary selection of a subset of electrodes, while reducing the information in input to the decoder, does not account for trials in which a stimulus was missed (e.g., due to blinks).

      Electrode selection was based on several factors: 1) reduction of eye movement/blink artifacts (as noted in the original manuscript), 2) consistency with the previous EEG study (Rideaux, 2024) and other similar decoding studies (Buhmann et al., 2024; Harrison et al., 2023; Rideaux et al., 2023), 3) improved signal-to-noise by including only sensors that carry the most position information (as shown in Supplementary Figure 1a and the previous EEG study). We agree that this was insufficiently explained in the original manuscript and have clarified our sensor selection in the revised version.

      Line 631 - We only included the parietal, parietal-occipital, and occipital sensors in the analyses to i) reduce the influence of signals produced by eye movements, blinks, and non-sensory cortices, ii) for consistency with similar previous decoding studies (Buhmann et al., 2024; Rideaux, 2024; Rideaux et al., 2025), and iii) to improve decoding accuracy by restricting sensors to those that carried spatial position information (Supplementary Fig. 1a).

      (4) The artifact that appears in many of the decoding results is puzzling, and I'm not fully convinced by the speculative explanation involving slow fluctuations. I wonder if a different high-pass filter (e.g., 1 Hz) might have helped. In general, the nature of this artifact requires better clarification and disambiguation.

      We agree that the nature of this artifact requires more clarification and disambiguation. Due to relatively slow changes in the neural signal, which are not stimulus-related, there is a degree of temporal autocorrelation in the recordings. This can be filtered out, for example, by using a stricter high-pass filter; however, we tried a range of filters and found that a cut-off of at least 0.7 Hz is required to remove the artifact, and even a filter of 0.2 Hz introduces other (stimulus-related) artifacts, such as above-chance decoding prior to stimulus onset. These stimulus-related artifacts are due to the temporal smearing of data, introduced by the filtering, and have a more pronounced and complex influence on the results and are more difficult to remove through other means, such as the baseline correction applied in the original manuscript.
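
      The temporal smearing mentioned above can be illustrated with a toy example. The sketch below is not the filter used in the study: it assumes a first-order (RC) high-pass filter applied forwards and then backwards (a zero-phase, non-causal filter), a 250 Hz sampling rate, and a 0.3 s rectangular "response" beginning at stimulus onset. All names and parameter values are hypothetical. It shows that the causal filter leaves the pre-onset period at exactly zero, whereas the zero-phase version leaks post-onset signal into the pre-onset period, and more so at a stricter cut-off.

```python
import math


def high_pass(signal, cutoff_hz, sample_rate_hz):
    """Causal first-order (RC) high-pass filter."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = [0.0] * len(signal)
    out[0] = signal[0]
    for n in range(1, len(signal)):
        out[n] = alpha * (out[n - 1] + signal[n] - signal[n - 1])
    return out


def zero_phase_high_pass(signal, cutoff_hz, sample_rate_hz):
    """Apply the filter forwards, then backwards (non-causal, zero phase)."""
    forward = high_pass(signal, cutoff_hz, sample_rate_hz)
    backward = high_pass(forward[::-1], cutoff_hz, sample_rate_hz)
    return backward[::-1]


def pre_onset_leakage(cutoff_hz, sample_rate_hz=250.0,
                      onset_s=2.0, duration_s=4.0):
    """Largest absolute filtered value before stimulus onset."""
    n_samples = int(duration_s * sample_rate_hz)
    onset = int(onset_s * sample_rate_hz)
    offset = onset + int(0.3 * sample_rate_hz)
    # Toy evoked response: zero before onset, one for 0.3 s, zero afterwards.
    signal = [1.0 if onset <= i < offset else 0.0 for i in range(n_samples)]
    filtered = zero_phase_high_pass(signal, cutoff_hz, sample_rate_hz)
    return max(abs(v) for v in filtered[:onset])
```

      Comparing `pre_onset_leakage(0.1)` with `pre_onset_leakage(0.7)` gives a sense of why a stricter cut-off can produce apparent stimulus information before the stimulus appears.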

      The temporal autocorrelation is detected by the decoder during training and biases it to classify/decode targets that are presented nearby in time as similar. That is, it learns the neural pattern for a particular stimulus location based on the activity produced by the stimulus and the temporal autocorrelation (determined by slow, stimulus-unrelated fluctuations). The latter only accounts for a relatively small proportion of the variance in the neural recordings under normal circumstances and would typically go undetected when simply plotting decoding accuracy as a function of position. However, it becomes weakly visible when decoding accuracy is plotted as a function of distance from the previous target, as now the bias (towards temporally adjacent targets) aligns with the abscissa. Further, it becomes highly visible when the stimulus labels are shuffled, as now the decoder can only learn from the variance associated with the temporal autocorrelation (and not from the activity produced by the stimulus).

      In the linear discriminant analysis, this led to temporally proximal items being more likely to be classified as on the same side. This is why there is above-chance performance for repeat trials (Supplementary Figure 2b), and below-chance performance for alternate trials, even when the labels are shuffled – the temporal autocorrelation produces a general bias towards classifying temporally proximate stimuli as on the same side, which selectively improves the classification accuracy of repeat trials. Fortunately, the bias is relatively constant as a function of time within the epoch and is straightforward to estimate by shuffling the labels, which means that it can be removed through a baseline correction. However, to further demonstrate that the autocorrelation confound cannot account for the differences observed between repeat and alternate trials in the micro classification analysis, we now additionally show the results from a more strictly filtered version of the data (0.7 Hz). These results show a similar pattern as the original, with the additional stimulus-related artifacts introduced by the strict filter, e.g., above chance decoding prior to stimulus onset.
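
      The logic of this confound can be illustrated with a toy simulation. The sketch below makes several simplifying assumptions and is not the analysis used in the study: the "recordings" contain only a slow random-walk drift plus noise (no stimulus signal), the labels are random (equivalent to shuffled labels), and a leave-one-out nearest-neighbour classifier stands in for linear discriminant analysis because it makes the effect easy to see in a few lines. All names and parameter values are hypothetical.

```python
import random


def simulate_drift(n_trials, n_channels, step_sd, noise_sd, rng):
    """Random-walk drift plus white noise; contains no stimulus signal."""
    drift = [0.0] * n_channels
    data = []
    for _ in range(n_trials):
        drift = [d + rng.gauss(0.0, step_sd) for d in drift]
        data.append([d + rng.gauss(0.0, noise_sd) for d in drift])
    return data


def nearest_neighbour_label(data, labels, test_index):
    """Label of the training trial closest to the held-out trial."""
    best_dist, best_label = None, None
    for j, row in enumerate(data):
        if j == test_index:
            continue  # leave the test trial out of the training set
        dist = sum((a - b) ** 2 for a, b in zip(row, data[test_index]))
        if best_dist is None or dist < best_dist:
            best_dist, best_label = dist, labels[j]
    return best_label


def accuracy_by_context(data, labels):
    """Classification accuracy split into repeat and alternate trials."""
    hits = {"repeat": [], "alternate": []}
    for t in range(1, len(labels)):
        context = "repeat" if labels[t] == labels[t - 1] else "alternate"
        predicted = nearest_neighbour_label(data, labels, t)
        hits[context].append(predicted == labels[t])
    return {name: sum(h) / len(h) for name, h in hits.items()}
```

      Because the closest training trial tends to be a temporal neighbour, the classifier in this sketch tends to assign each trial the label of an adjacent trial, which favours repeat trials and penalises alternate trials even though the labels carry no information about the signal.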

      In the inverted encoding analysis, the same temporal autocorrelation manifests as temporally proximal trials being decoded as more similar locations. This is why there is increased decoding accuracy for targets with small angular offsets from the previous target, even when the labels are shuffled (Supplementary Figure 3c), because it is on these trials that the bias happens to align with the correct position. This leads to an attractive bias towards the previous item, which is most prominent when the labels are shuffled.

      To demonstrate the phenomenon, we simulated neural recordings from a population of tuning curves and performed the inverted encoding analysis on a clean version of the data and a version in which we introduced temporal autocorrelation. We then repeated this after shuffling the labels. The simulation produced very similar results to those we observed in the empirical data, with a single exception: while precision in the simulated shuffled data was unaffected by autocorrelation, precision in the unshuffled data was clearly affected by this manipulation. This may explain why we did not find a correlation between the shuffled and unshuffled precision in the original manuscript. 

      These results echo those from the classification analysis, albeit in a more continuous space. However, whereas in the classification analysis it was straightforward to perform a baseline correction to remove the influence of general temporal dependency, the more complex nature of the accuracy, precision, and bias parameters over the range of time and delta location makes this approach less appropriate. For example, the bias in the shuffled condition ranged from -180 to 180 degrees, which when subtracted from the bias in the unshuffled condition would produce an equally spurious outcome, i.e., the equal opposite of this extreme bias. Instead, for the inverted encoding analysis, we used the data high-pass filtered at 0.7 Hz. As with the classification analysis, this removed the influence of general temporal dependencies, as indicated by the results of the shuffled data analysis (Supplementary Figure 3f), but it also temporally smeared the stimulus-related signal, resulting in above chance decoding accuracy prior to stimulus onset (Supplementary Figure 3d). However, we were primarily interested in the pattern of accuracy, precision, and bias as a function of delta location, and less concerned with the precise temporal dynamics of these changes, which appeared relatively stable in the filtered data. Thus, this was the more suitable approach to removing the general temporal dependencies in the inverted encoding analysis and the one that is presented in Figure 3.

      We have updated the revised manuscript in light of these changes, including a fuller description of the artifact and the results from the abovementioned control analyses.

      Figure 3 updated.

      Figure 3 caption - e) Decoding accuracy for stimulus location, from reanalysis of previously published EEG data (17). Inset shows the EEG sensors included in the analysis (blue dots), and black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). f) Decoding accuracy for location, as a function of time and Δ location. Bright colours indicate higher decoding accuracy; absolute accuracy values can be inferred from (e). g-i) Average location decoding (g) accuracy, (h) precision, and (i) bias from 50 – 500 ms following stimulus onset. Horizontal bar in (e) indicates cluster corrected periods of significance; note, all time points were significantly above chance due to temporal smear introduced by strict high-pass filtering (see Supplementary Figure 3 for full details). Note, the temporal abscissa is aligned across (e & f). Shaded regions indicate ±SEM.

      Line 218 - To further investigate the influence of serial dependence, we applied inverted encoding modelling to the EEG recordings to decode the angular location of stimuli. We found that decoding accuracy of stimulus location sharply increased from ~60 ms following stimulus onset (Fig. 3e). Note, to reduce the influence of general temporal dependencies, we applied a 0.7 Hz high-pass filter to the data, which temporally smeared the stimulus-related information, resulting in above chance decoding accuracy prior to stimulus presentation (for full details, see Supplementary Figure 3). To understand how serial dependence influences the representation of these features, we inspected decoding accuracy for location as a function of both time and Δ location (Fig. 3f). We found that decoding accuracy varied not only as a function of time, but also as a function of Δ location. To characterise this relationship, we calculated the average decoding accuracy from 50 ms until the end of the epoch (500 ms), as a function of Δ location (Fig. 3g). This revealed higher accuracy for targets with larger Δ location. We found a similar pattern of results for decoding precision (Fig. 3h). These results are consistent with the micro temporal context (behavioural) results, showing that targets that alternated were recalled more precisely. Lastly, we calculated the decoding bias as a function of Δ location and found a clear repulsive bias away from the previous item (Fig. 3i). While this result is inconsistent with the attractive behavioural bias, it is consistent with recent studies of serial dependence suggesting an initial pattern of repulsion followed by an attractive bias during the response period (20–22).

      Line 726 - As shown in Supplementary Figure 3, we found the same general temporal dependencies in the decoding accuracy computed using inverted encoding that were found using linear discriminant classification. However, as a baseline correction would not have been appropriate or effective for the parameters decoded with this approach, we instead used a high-pass filter of 0.7 Hz to remove the confound, while being cautious about interpreting the timing of effects produced by this analysis due to the temporal smear introduced by the filter.

      Supplementary Figure 2 updated.

      Supplementary Figure 2 caption - Removal of general micro temporal dependencies in EEG responses. We found that there were differences in classification accuracy for repeat and alternate stimuli in the EEG data, even when stimulus labels were shuffled. This is likely due to temporal autocorrelation within the EEG data due to low frequency signal changes that are unrelated to the decoded stimulus dimension. This signal trains the decoder to classify temporally proximal stimuli as the same class, leading to a bias towards repeat classification. For example, in general, the EEG signal during trial one is likely to be more similar to that during trial two than during trial ten, because of low frequency trends in the recordings. If the decoder has been trained to classify the signal associated with trial one as a leftward stimulus, then it will be more likely to classify trial two as a leftward stimulus too. These autocorrelations are unrelated to stimulus features; thus, to isolate the influence of stimulus-specific temporal context, we subtracted the classification accuracy produced by shuffling the stimulus labels from the unshuffled accuracy (as presented in Figure 2e, f). We confirmed that using a stricter high-pass filter (0.7 Hz) removes this artifact, as indicated by the equal decoding accuracy between the two shuffled conditions. However, the stricter high-pass filter temporally smears the stimulus-related signal, which introduces other (stimulus-related) artifacts, e.g., above-chance decoding accuracy prior to stimulus presentation, that are larger and more complex, i.e., changing over time. Thus, we opted to use the original high-pass filter (0.1 Hz) and apply a baseline correction. a) The uncorrected classification accuracy along task related and unrelated planes. Note that these results are the same as the corrected version shown in Figure 2e, because the confound is only apparent when accuracy is grouped according to temporal context.

      b) Same as (a), but split into repeat and alternate stimuli, along (left) task-related and (right) unrelated planes. Classification accuracy when labels are shuffled is also shown. Inset in (a) shows the EEG sensors included in the analysis (blue dots). (c, d) Same as (a, b), but on data filtered using a 0.7 Hz high-pass filter. Black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). Shaded regions indicate ±SEM.

      Supplementary Figure 3 updated.

      Supplementary Figure 3 caption - Removal of general temporal dependencies in EEG responses for inverted encoding analyses. As described in Methods - Neural Decoding, we used inverted encoding modelling of EEG recordings to estimate the decoding accuracy, precision, and bias of stimulus location. Just as in the linear discriminant classification analysis, we also found the influence of general temporal dependencies in the results produced by the inverted encoding analysis. In particular, there was increased decoding accuracy for targets with low Δ location. This was weakly evident in the period prior to stimulus presentation, but clearly visible when the labels were shuffled. These results mirror those from the classification analysis, albeit in a more continuous space. However, whereas in the classification analysis it was straightforward to perform a baseline correction to remove the influence of general temporal dependency, the more complex nature of the accuracy, precision, and bias parameters over the range of time and Δ location makes this approach less appropriate. For example, the bias in the shuffled condition ranged from -180° to 180°, which when subtracted from the bias in the unshuffled condition would produce an equally spurious outcome, i.e., the equal opposite of this extreme bias. Instead, for the inverted encoding analysis, we used the data high-pass filtered at 0.7 Hz. As with the classification analysis, this significantly reduced the influence of general temporal dependencies, as indicated by the results of the shuffled data analysis, but it also temporally smeared the stimulus-related signal, resulting in above chance decoding accuracy prior to stimulus onset. However, we were primarily interested in the pattern of accuracy, precision, and bias as a function of Δ location, and less concerned with the precise temporal dynamics of these changes. Thus, this was the more suitable approach to removing the general temporal dependencies in the inverted encoding analysis and the one that is presented in Figure 3. (a) Decoding accuracy as a function of time for the EEG data filtered using a 0.1 Hz high-pass filter. Inset shows the EEG sensors included in the analysis (blue dots), and black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). (b, c) The same as (a), but as a function of time and Δ location for (b) the original data and (c) data with shuffled labels. (d-f) Same as (a-c), but for data filtered using a 0.7 Hz high-pass filter. Shaded regions in (a, d) indicate ±SEM. Horizontal bars in (a, d) indicate cluster corrected periods of significance; note, all time points in (d) were significantly above chance. Note, the temporal abscissa is vertically aligned across plots (a-c & d-f).

      In the process of performing these additional analyses and simulations, we became aware that the sign of the decoding bias in the inverted encoding analyses had been interpreted in the wrong direction. That is, where we previously reported an initial attractive bias followed by a repulsive bias relative to the previous target, we have in fact found the opposite, an initial repulsive bias followed by an attractive bias relative to the previous target. Based on the new control analyses and simulations, we think that the latter attractive bias was due to general temporal dependencies. That is, in the filtered data, we only observe a repulsive bias. While the bias associated with serial dependence was not a primary feature of the study, this (somewhat embarrassing) discovery has led to reinterpretation of some results relating to serial dependence. However, it is encouraging to see that our results now align with those of recent studies (Fischer et al., 2024; Luo et al., 2025; Sheehan et al. 2024).

      Line 385 - Our corresponding EEG analyses revealed better decoding accuracy and precision for stimuli preceded by those that were different and a bias away from the previous stimulus. These results are consistent with the finding that alternating stimuli are recalled more precisely. Further, while the repulsive pattern of biases is inconsistent with the observed behavioural attractive biases, it is consistent with recent work on serial dependence indicating an initial period of repulsion, followed by an attractive bias during the response period (20–22). These findings indicate that serial dependence and first-order sequential dependencies can be explained by the same underlying principle.

      (5) Given the relatively early decoding results and surprisingly early differences in decoding peaks, it would be useful to visualize ERPs across conditions to better understand the latencies and ERP components involved in the task.

      A rapid presentation design was used in the EEG experiment, and while this is well suited to decoding analyses, unfortunately we cannot resolve ERPs because the univariate signal is dominated by an oscillation at the stimulus presentation frequency (~3 Hz). We agree that this could be useful to examine in future work.

      (6) It is unclear why the precision derived from IEM results is considered reliable while the accuracy is dismissed due to the artifact, given that both seem to be computed from the same set of decoding error angles (equations 8-9).

      This point has been addressed in our response to point (4).

      (7) What is the rationale for selecting five past events as the meso-scale? Prior history effects have been shown to extend much further back in time (Fritsche et al., 2020). 

      We used five previous items in the meso analyses to be consistent with previous research on sequential dependencies (Bertelson, 1961; Gao et al., 2009; Jentzsch & Sommer, 2002; Kirby, 1976; Remington, 1969). However, we agree that these effects likely extend further and have acknowledged this in the revised version of the manuscript.

      Line 240 - Higher-order sequential dependences are an example of how stimuli (at least) as far back as five events in the past can shape the speed and task accuracy of responses to the current stimulus (9, 10); however, note that these effects have been observed for more than five events (20).

      (8) The decoding bias results, particularly the sequence of attraction and repulsion, appear to run counter to the temporal dynamics reported in recent studies (Fischer et al., 2024; Luo et al., 2025; Sheehan & Serences, 2022). 

      This point has been addressed in our response to point (4).

      (9) The repulsive component in the decoding results (e.g., Figure 3h) seems implausibly large, with orientation differences exceeding what is typically observed in behavior. 

      As noted in our response to point (4), this bias was likely due to the general temporal dependency confound and has been removed in the revised version of the manuscript.

      (10) The pattern of accuracy, response times, and precision reported in Figure 3 (also line 188) resembles results reported in earlier work (Stewart, 2007) and in recent studies suggesting that integration may lead to interference at intermediate stimulus differences rather than improvement for similar stimuli (Ozkirli et al., 2025).

      Thank you for bringing this to our attention, we have acknowledged this in the revised manuscript.

      Line 197 - Consistent with our previous binary analysis, and with previous work (19), we also found that responses were faster and more accurate when Δ location was small (Fig. 3b, c).

      (11) Some figures show larger group-level variability in specific conditions but not others (e.g., Figures 2b-c and 5b-c). I suggest reporting effect sizes for all statistical tests to provide a clearer sense of the strength of the observed effects. 

      Yes, as noted in the original manuscript, we find significant differences between the variance of the task-related and -unrelated conditions. We think this is due to opposing forces in the task-related condition:

      “The increased variability of response time differences across the task-related plane likely reflects individual differences in attention and prioritization of responding either quickly or accurately. On each trial, the correct response (e.g., left or right) was equally probable. So, to perform the task accurately, participants were motivated to respond without bias, i.e., without being influenced by the previous stimulus. We would expect this to reduce the difference in response time for repeat and alternate stimuli across the task-related plane, but not the task-unrelated plane. However, attention may amplify the bias towards making faster responses for repeat stimuli, by increasing awareness of the identity of stimuli as either repeats or alternations (17). These two opposing forces vary with task engagement and strategy and thus would be expected to produce increased variability across the task-related plane.” We agree that providing effect sizes may provide a clearer sense of the observed effects and have done so in the revised version of the manuscript.

      Line 739 - For Wilcoxon signed rank tests, the rank-biserial correlation (r) was calculated as an estimate of effect size, where 0.1, 0.3, and 0.5 indicate small, medium, and large effects, respectively (54). For Friedman’s ANOVA tests, Kendall’s W was calculated as an estimate of effect size, where 0.1, 0.3, and 0.5 indicate small, medium, and large effects, respectively (55).
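
      For readers unfamiliar with these effect sizes, the sketch below shows one common way to compute them. The exact formulas used in the manuscript are not stated here, so these definitions are assumptions: the matched-pairs rank-biserial correlation is taken as the difference between the positive and negative signed-rank sums divided by their total, and Kendall's W is computed from within-participant ranks without a correction for ties. Function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_biserial(x, y):
    """Matched-pairs rank-biserial correlation for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return (w_pos - w_neg) / (w_pos + w_neg)


def kendalls_w(table):
    """Kendall's W for a table with one row per participant and one
    column per condition (no tie correction)."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    mean_sum = n * (k + 1) / 2.0
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12.0 * s / (n ** 2 * (k ** 3 - k))
```

      Both measures are bounded: the rank-biserial correlation runs from -1 to 1, and Kendall's W from 0 (no agreement in condition rankings across participants) to 1 (identical rankings).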

      (12) The statement that "serial dependence is associated with sensory stimuli being perceived as more similar" appears inconsistent with much of the literature suggesting that these effects occur at post-perceptual stages (Barbosa et al., 2020; Bliss et al., 2017; Ceylan et al., 2021; Fischer et al., 2024; Fritsche et al., 2017; Sheehan & Serences, 2022). 

      In light of the revised analyses, this statement has been removed from the manuscript.

      (13) If I understand correctly, the reproduction bias (i.e., serial dependence) is estimated on a small subset of the data (10%). Were the data analyzed by pooling across subjects?

      The dual reproduction task only occurred on 10% of trials. There were approximately 2000 trials, so ~200 reproduction responses. For the micro and macro analyses, this was sufficient to estimate precision within each of the experimental conditions (repeat/alternate, expected/unexpected). However, it is likely that we were not able to reproduce the effect of precision at the meso level across both experiments because we lacked sufficient responses to reliably estimate precision when split across the eight sequence conditions. Despite this, the data was always analysed within subjects.

      (14) I'm also not convinced that biases observed in forced-choice and reproduction tasks should be interpreted as arising from the same process or mechanism. Some of the effects described here could instead be consistent with classic priming. 

      We agree that the results associated with the forced-choice task (response time and task accuracy) were likely due to motor priming, but that a separate (predictive) mechanism may explain the (precision) results associated with the reproduction task. These are two mechanisms we think are operating across the three temporal scales investigated in the current study.

      Reviewing Editor Comments:

      (1) Clarify task design and measurement: The dense presentation makes it difficult to understand key design elements and their implications. Please provide clearer descriptions of all task elements, and how they relate to each other (EEG vs. behaviour, stimulus plane vs. TR and TU plane, reproduction vs. discrimination and role of priming), and clearly explain how key measures were computed for each of these (e.g., precision, accuracy, reproduction bias).

      In the revised manuscript, we have expanded on descriptions of the source and nature of the data (behavioural and EEG), the different planes analyzed in the behavioural task, and how key metrics (e.g., precision) were computed.

      (2) Offer more insight into underlying data, including original ERP waveforms to aid interpretation of decoding results and the timing of effects. In particular, unpack the decoding temporal confound further.

      In the revised manuscript, we have offered considerably more insight into the decoding results, in particular the nature of the temporal confound. We were unable to assess ERPs due to the rapid presentation design employed in the EEG experiment.

      (3) Justify arbitrary choices such as electrode selection for EEG decoding (e.g., limiting to parieto-occipital sensors), number of trials in meso scale, and the time terminology itself.

      In the revised manuscript, we have clarified the reasons for electrode selection.

      (3) Discuss deviations from literature: Several findings appear to contradict or diverge from previous literature (e.g., effects of serial dependence). These discrepancies could be discussed in more depth. 

      Upon re-analysis of the serial dependence bias and removal of the temporal confound, the results of the revised manuscript now align with those from previous literature, which has been acknowledged.

      Reviewer #1 (Recommendations for the authors):

      (1) I would like to use my reviewer's prerogative to mention a couple of relevant publications.

      Galluzzi et al (Journal of Vision, 2022) "Visual priming and serial dependence are mediated by separate mechanisms" suggests exactly that, which is relevant to this study.

      Xie et al. (Communications Psychology, 2025) "Recent, but not long-term, priors induce behavioral oscillations in peri-saccadic vision" also seems relevant to the issue of different mechanisms. 

      Thank you for bringing these studies to our attention. We agree that they are both relevant and have referenced both appropriately in the revised version of the manuscript.

      Reviewer #2 (Recommendations for the authors): 

      (1) I find the discussion on attention and awareness (from line 127 onward) somewhat vague and requiring clarification.

      We agree that this statement was vague and referred to “awareness” without operationalization. We have revised this statement to improve clarity.

      Line 135 - However, task-relatedness may amplify the bias towards making faster responses for repeat stimuli, by increasing attention to the identity of stimuli as either repeats or alternations (17).

      (2) Line 140: It's hard to argue that there are expectations that the image of an object on the retina is likely to stay the same, since retinal input is always changing. 

      We agree that retinal input is often changing, e.g., due to saccades, self-motion, and world motion. However, for a prediction to be useful, e.g., to reduce metabolic expenditure or speed up responses, it must be somewhat precise, so a prediction that retinal input will change is not necessarily useful, unless it can specify what it will change to. Given retinal input of x at time t, the range of possible values of x at time t+1 (predicting change) is infinite. By contrast, if we predict that x=x at time t+1 (no change), then we can make a precise prediction. There is, of course, other information that could be used to reduce the parameter space of predicted change from x at time t, e.g., the value of x at time t-1, and we think this drives predictions too. However, across the infinite distribution of changes from x, zero change will occur more frequently than any other value, so we think it’s reasonable to assert that the brain may be sensitive to this pattern.

      (3) Line 564: The gambler's fallacy usually involves sequences longer than just one event.

      Yes, we agree that this phenomenon is associated with longer sequences. This section of the manuscript was in regards to previous findings that were not directly relevant to the current study and has been removed in the revised version.

      (4) In the shared PDF, the light and dark cyan colors used do not appear clearly distinguishable. 

      I expect this is due to poor document processing or low-quality image embeddings. I will check that they are distinguishable in the final version.

      References: 

      Barbosa, J., Stein, H., Martinez, R. L., Galan-Gadea, A., Li, S., Dalmau, J., Adam, K. C. S., Valls-Solé, J., Constantinidis, C., & Compte, A. (2020). Interplay between persistent activity and activity-silent dynamics in the prefrontal cortex underlies serial biases in working memory. Nature Neuroscience, 23(8). https://doi.org/10.1038/s41593-020-0644-4

      Bliss, D. P., Sun, J. J., & D'Esposito, M. (2017). Serial dependence is absent at the time of perception but increases in visual working memory. Scientific reports, 7(1), 14739. 

      Ceylan, G., Herzog, M. H., & Pascucci, D. (2021). Serial dependence does not originate from low-level visual processing. Cognition, 212, 104709. https://doi.org/10.1016/j.cognition.2021.104709

      Fischer, C., Kaiser, J., & Bledowski, C. (2024). A direct neural signature of serial dependence in working memory. eLife, 13. https://doi.org/10.7554/eLife.99478.1

      Fritsche, M., Mostert, P., & de Lange, F. P. (2017). Opposite effects of recent history on perception and decision. Current Biology, 27(4), 590-595. 

      Fritsche, M., Spaak, E., & de Lange, F. P. (2020). A Bayesian and efficient observer model explains concurrent attractive and repulsive history biases in visual perception. eLife, 9, e55389. https://doi.org/10.7554/eLife.55389

      Gekas, N., McDermott, K. C., & Mamassian, P. (2019). Disambiguating serial effects of multiple timescales. Journal of vision, 19(6), 24-24. 

      Luo, M., Zhang, H., Fang, F., & Luo, H. (2025). Reactivation of previous decisions repulsively biases sensory encoding but attractively biases decision-making. PLOS Biology, 23(4), e3003150. https://doi.org/10.1371/journal.pbio.3003150

      Ozkirli, A., Pascucci, D., & Herzog, M. H. (2025). Failure to replicate a superiority effect in crowding. Nature Communications, 16(1), 1637. https://doi.org/10.1038/s41467-025-56762-5

      Sheehan, T. C., & Serences, J. T. (2022). Attractive serial dependence overcomes repulsive neuronal adaptation. PLoS biology, 20(9), e3001711. 

      Stewart, N. (2007). Absolute identification is relative: A reply to Brown, Marley, and Lacouture (2007). Psychological Review, 114, 533-538. https://doi.org/10.1037/0033-295X.114.2.533

      Treisman, M., & Williams, T. C. (1984). A theory of criterion setting with an application to sequential dependencies. Psychological review, 91(1), 68. 

      Zhang, G., & Luck, S. J. (2025). Assessing the impact of artifact correction and artifact rejection on the performance of SVM- and LDA-based decoding of EEG signals. NeuroImage, 316, 121304. https://doi.org/10.1016/j.neuroimage.2025.121304

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations for the authors):

      Although this study is rigorous and the paper is well-written, I have a few concerns that the authors should address before publication.

      (1) Cellular levels of protein ADP-ribosylation should be analyzed using anti-ADPR antibodies following infection, both with and without Mac1 and AVI-4206 treatment. While the authors have provided impressive in vivo data, these experiments could ideally be conducted in mice. However, I would be amenable to these analyses being performed in human airway organoids, as they demonstrate clear phenotypes following AVI-4206 treatment post-infection. For a more in-depth exploration, the authors could consider affinity purifying ADP-ribosylated proteins and identifying them via mass spectrometry. I would find it particularly compelling if this approach revealed components of the NF-kB signaling pathway, given the intriguing results presented in Fig. 5. I am also curious if there are differences in ADP-ribosylated proteins when comparing Mac1 KO SARS-CoV-2 to AVI-4206 treatment.

      We note that despite the recent flurry of activity around Mac1, there is a surprising lack of public data on overall ADPr levels or targets. While we address the literature precedent for PARP14 signals by immunofluorescence specifically below (Reviewer 2, point (h)), we note that overall levels have not previously been characterized biochemically. Recent PARP14 papers and the ASAP AViDD preprint show changes by immunofluorescence only, and the evidence in that preprint is quite modest (see Figure 7B: https://pmc.ncbi.nlm.nih.gov/articles/PMC11370477/).

      We suspect the difficulty in tracking changes biochemically is due to multiple factors that influence the overall detectability and reproducibility. First, with regard to detectability, it is quite possible that only a small change in the ADPr status of a small number of targets is responsible for the phenotypes in vivo. Virus levels are very low in the organoid system and the variability in ADPr levels from tissue samples from in vivo experiments is high. Given the difficulty in translating back to cellular models, this problem is therefore magnified further. Second, with regard to reproducibility, we observe a great deal of reagent dependence on ADPr signals by Western blot +/- Mac1 expression in both cellular and tissue lysates (including when stimulated with H2O2, interferon, or during viral infection). Similarly, we do not observe reproducible proteins that pull down with Mac1 when assayed by mass spectrometry. It is quite likely that these issues are a result of tissue/sample preparation that results in a loss of the ADPr modification during preparation (especially for acidic residue modifications). This also explains the reliance on IF assays in the PARP14 literature. A very good discussion of these issues is also contained in this paper: https://doi.org/10.1042/BSR20240986.

      Nonetheless, we have attempted one final experiment. Here, we measured ADPr modification of cellular lysates under uninfected conditions as well as upon infection with either WT or N40D mutant virus. For all conditions, this was done with or without treatment of cells with 100 μM of AVI-4206. Measurement of ADPr modifications by western blot using a pan-ADPr antibody revealed a single prominent band with a molecular weight of ~130 kDa that showed a uniform increase in signal upon treatment of cells with AVI-4206 regardless of infection status. While this general trend was also observed with the mono-ADPr antibody, the change upon AVI-4206 treatment was not statistically significant. We suspect that the major band observed in these western blots is PARP1, as upon enrichment of ADPr proteins from these lysates by Af1521 immunoprecipitation, we find PARP1 to be among the most abundant proteins detected within this molecular weight range. We note that there is a baseline increase in polyADPr detection upon infection of virus with WT Mac1 (relative to uninfected and virus with N40D) and further increase when treated with AVI-4206. This compound-dependent increase is paralleled in the uninfected and N40D conditions. The counterintuitive increase upon WT Mac1 virus infection, which should erase ADPr marks, and the compound-dependent increase in the uninfected condition suggest that there are many indirect effects on ADPr signalling dynamics in this experiment. These results are difficult to reconcile with the specificity profiling of AVI-4206 (Supplementary Figure 5: Thermal proteome profiling in A549 cellular lysates). As mentioned above, the lack of consistent signal across reagents for ADPr detection and the timing of monitoring ADPr levels are additional complicating factors.

      We added to the results:

      “However, we observed no strong consistent signals of global pan-ADP-ribose (panADPr) or mono-ADP-ribose (monoADPr) accumulation in infected cells treated with AVI-4206 in immunoblot analyses (Supplementary Figure 8).”

      Methods for experiment:

      Calu3 cells were obtained from ATCC and cultured in Advanced DMEM (Gibco) supplemented with 2.5% FBS, 1x GlutaMax, and 1x Penicillin-Streptomycin at 37°C and 5% CO<sub>2</sub>. 5x10<sup>6</sup> cells were plated in 15-cm dishes and media was changed every 2-3 days until the cells were 80% confluent. The cells were treated with 50 ng/mL IFN-γ (R&D Systems) with or without 100 μM AVI-4206. After 6 hours, the cells were infected with WA1 or WA1 NSP3 Mac1 N40D at a multiplicity of infection (MOI) of 1 for 36 hours. The cells were washed three times with PBS and scraped in Pierce IP Lysis Buffer (ThermoFisher) containing 1x HALT protease and phosphatase inhibitor mix (ThermoFisher) on ice. The lysate was stored at -80°C until further processing.

      The cell lysate was incubated for 5 minutes at room temperature with recombinant benzonase. Following incubation, the lysate was centrifuged at 13,000 rpm at 4°C for 20 minutes, and the supernatant was collected. The samples were then boiled for 5 minutes at 95°C in 1x NuPAGE LDS sample buffer (Invitrogen) with a final concentration of 1x NuPAGE sample reducing agent (Invitrogen). For the detection of ADPr levels in whole-cell lysates, the samples were subjected to SDS-PAGE and immunoblotting. All primary and secondary antibodies (pan-ADP-ribose antibody (MABE1016, Millipore), mono-ADP-ribose antibody (AbD33204, Bio-Rad), and HRP-conjugated secondary antibody (Cell Signaling)) were used at a 1:1000 dilution in 5% non-fat dry milk in TBST. Signals were detected by chemiluminescence (Thermo) and visualized using the ChemiDoc XRS+ System (Bio-Rad). Densitometric analysis was performed using Image Lab (Bio-Rad). Quantification was normalized to Actin. The data are expressed as mean ± SD. Statistical differences were determined using an unpaired t-test in GraphPad Prism 10.3.1.
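
      The quantification steps described above (normalization to actin, mean ± SD, unpaired t-test) can be sketched as follows. This is a minimal illustration, not the authors' Image Lab or Prism workflow. The band intensities are made-up placeholder numbers, the function names are hypothetical, and the test shown is the pooled-variance Student's t statistic, which is assumed here to correspond to the unpaired test used.

```python
from math import sqrt
from statistics import mean, stdev, variance

def normalize_to_loading_control(band, actin):
    """Divide each ADPr band intensity by the actin intensity of the same lane."""
    return [b / a for b, a in zip(band, actin)]

def unpaired_t_statistic(x, y):
    """Student's t statistic and degrees of freedom (equal variances assumed)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, df

# Placeholder densitometry values for three replicate lanes per group.
vehicle = normalize_to_loading_control([1200, 1100, 1300], [1000, 950, 1050])
treated = normalize_to_loading_control([1900, 2100, 2000], [1000, 1020, 980])

t, df = unpaired_t_statistic(treated, vehicle)
print(f"vehicle: {mean(vehicle):.2f} ± {stdev(vehicle):.2f}")
print(f"treated: {mean(treated):.2f} ± {stdev(treated):.2f}")
print(f"t = {t:.2f}, df = {df}")
```

      Converting the t statistic to a p-value requires the t distribution, which a statistics package would supply; it is omitted here to keep the sketch dependency-free.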

      (2) SARS-CoV-2 escape mutants for AVI-4206 should be generated, sequenced, and evaluated for both ADP-ribosyl hydrolase activity and their susceptibility to inhibition by AVI-4206.

      We thank the reviewer for this suggestion. These are indeed key experiments which are currently hampered by the lack of a cell line that is fully responsive to drug treatment. Although infected organoids and macrophages show an effect in response to AVI-4206, viral levels are ~3 logs lower than in cell lines and difficult to sequence. In the absence of a system that would allow meaningful screening for outgrowth of resistant viruses, we have conducted mass spectrometry studies that showed that Mac1 is the only significant hit for AVI-4206 (Supplementary Figure 5). The suggested outgrowth experiments will be conducted once a responsive cell line model has been established.

      (3) Given that Mac1 is found in several coronaviruses, it would be insightful for the authors to test a selection of Mac1 homologs from divergent coronaviruses to assess whether AVI-4206 can inhibit their activity in vitro.

      As mentioned above, inconsistencies in ADPr staining limit our ability to directly measure cellular activity. As an alternative approach to measure AVI-4206 selectivity in cells, we have adapted our CETSA assay for SARS-1 and MERS macrodomain proteins and find evidence that AVI-4206 can shift the melting temperature of both proteins, albeit to a lesser degree than that seen for Mac1. In line with MERS being more structurally divergent than SARS-1 from SARS-CoV-2, the ΔTagg for SARS-1 and MERS are 4℃ and 1℃, respectively, compared to 9℃ for Mac1. These data have been added as Supplementary Fig S3C. Development of broader spectrum pan-inhibitors is on our radar for future work which will more thoroughly assess homologs from divergent coronaviruses.
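
      For readers unfamiliar with the readout, ΔTagg is the difference in aggregation temperature between compound-treated and vehicle-treated samples. The sketch below estimates Tagg as the temperature at which the soluble fraction falls to 50%, using linear interpolation between measured points. The curves are hypothetical placeholders, not the reported data, and a sigmoidal fit would normally be preferred over this simple interpolation.

```python
def tagg(temps, soluble_fraction, threshold=0.5):
    """Temperature at which the soluble fraction first falls below the threshold,
    estimated by linear interpolation between neighbouring measurements."""
    points = list(zip(temps, soluble_fraction))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 >= threshold > f1:
            return t0 + (f0 - threshold) * (t1 - t0) / (f0 - f1)
    raise ValueError("melt curve never crosses the threshold")

def delta_tagg(temps, vehicle_curve, treated_curve):
    """Thermal shift induced by the compound (treated minus vehicle)."""
    return tagg(temps, treated_curve) - tagg(temps, vehicle_curve)

# Hypothetical melt curves (fraction of protein remaining soluble).
temps = [40, 45, 50, 55, 60]
vehicle = [1.0, 0.9, 0.5, 0.1, 0.0]
treated = [1.0, 1.0, 0.9, 0.5, 0.1]

print(f"ΔTagg = {delta_tagg(temps, vehicle, treated):.1f} °C")
```

      A larger positive shift indicates stronger stabilization by the ligand, which is how the 9℃, 4℃, and 1℃ values above are compared.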

      We added the following sentence to the main results:

      “Encouragingly, we were also able to adapt our CETSA assay for SARS-1 and MERS macrodomain proteins and find that AVI-4206 can shift the melting temperature of both proteins, albeit to a lesser degree than that seen for Mac1 (Supplementary Figure 3C).”

      We also added this supplementary figure 3:

      Minor

      (1) Line 88, "respectively.heir potency"

      Fixed, thank you!

      (2) Line 149 add a period after proteome

      Fixed, thank you!

      Reviewer #2 (Recommendations for the authors):

      (a) The authors assess inhibition of MacroD2 and Targ1 as off-targets for AVI-4206. However, Mac1 belongs to the MacroD-type class of macrodomains of which MacroD1, MacroD2 and MOD1s of PARP9 and PARP14 are the human members. In contrast, Targ1 belongs to the ALC1-like class, which is only very distantly related to Mac1. Furthermore, recent studies have shown that the first macrodomains of PARP9 and PARP14 (MOD1 of PARP9/14) are much more closely related to Mac1, and PARP9/14 were implicated in antiviral immunity. As such, the authors should include assays showing the activity of their compounds against MacroD1 and MOD1s of PARP9/14.

      We emphasize that we detect no significant shift for any protein other than Mac1 in A549 cells by CETSA-MS (Supplementary Figure 6). For Mac1 CETSA, we see an average of 6 PARP14 spectral counts across conditions and did not detect PARP9. In addition, for separate work in MPro, we ran similar CETSA experiments where we observed an average of 2 PARP9 and 15 PARP14 spectral counts across conditions. Although PARP9 and PARP14 massively increase expression upon IFN treatment in A549 cells, both proteins have been detected by Western Blot in A549 cells previously at baseline.

      Nonetheless, we have included modeling of more diverse macrodomains as a supplemental figure and added to the text:

      Modeling of other diverse macrodomains, including those within human PARP9 and PARP14, further suggests that AVI-4206 is selective for Mac1 (Supplementary Figure 4).

      (b) In the context of SARS-CoV-2, superinfections are a known major complication. These superinfections are associated with lung damage, and therefore it would be good if the authors could assess lung damage, e.g. by histology, to see if their treatment has a positive impact on lung damage and thus may help to suppress complications.

      We performed histology and the results are inconclusive, but suggest that AVI-4206 treatment could lower apoptosis. There is no difference in pathology between the N40D cohort and vehicle with these markers. This could suggest that AVI-4206 provides an additional mechanism that results in protection. We added to the results:

      Caspase 3 staining shows that AVI-4206 treatment reduces apoptosis in the lungs compared to vehicle controls. Additionally, Masson's Trichrome staining reveals a significant reduction in collagen deposition, a surrogate for lung pathology, in the lungs of AVI-4206 treated animals (Supplementary Figure 9).

      Histology:

      Mouse lung tissues were fixed in 4% PFA (Sigma Aldrich, Cat #47608) for 24 hours, washed three times with PBS and stored in 70% ethanol. All the stainings were performed at Histo-Tec Laboratory (Hayward, CA). Samples were processed, embedded in paraffin, and sectioned at 4 μm. The slides were dewaxed using xylene and alcohol-based dewaxing solutions. Epitope retrieval was performed by heat-induced epitope retrieval (HIER) of the formalin-fixed, paraffin-embedded tissue using citrate-based pH 6 solution (Leica Microsystems, AR9961) for 20 mins at 95°C. The tissues were stained for H&E, caspase-3 (Biocare #CP229c 1:100), and trichrome, dried, coverslipped (TissueTek-Prisma Coverslipper), and visualized using Axioscan 7 slide scanner (ZEISS) at 40X. Image quantification was performed with ImageJ software and GraphPad Prism.

      (c) Fig. 1D labelling is wrong

      Thank you - fortunately the data were plotted correctly and it was just the inset table of values that was incorrect. This is now fixed!

      (d) Line 88: "T" missing at start of sentence

      Fixed, thank you!

      (e) Line 118: NudT5/AMP-Glo assay was developed in https://doi.org/10.1021/acs.orglett.8b01742

      We have added this foundational reference, thank you!

      (f) Line 147ff: It would be good if the authors could highlight that the TPP methodology has known limitations (e.g. detection of low abundance proteins and low thermal shift of some binders) and thus is not an absolute proof that AVI-4206 "engage with high specificity for Mac1"

      We added this important context to the concluding sentence of this paragraph:

      “While this assay may not be sensitive to proteins with low abundance or low thermal shift upon ligand binding, collectively, these results indicate that AVI-4206 can cross cellular membranes and engage with high specificity for Mac1.”

      (g) The authors use their well established in vitro Mac1 model as well as the SARS-CoV-2 WA strain. Given the ongoing diversification of SARS-CoV-2 and the current prevalence of the Omicron VOC it would be good if the authors could investigate whether alteration in Mac1 occurred or are detected which could influence the efficacy of their inhibitor. Similarly, it would be interesting to know how effective their drug is on other clinically relevant beta-CoV Mac1, e.g. from MERS or SARS1.

      We thank the reviewer for the suggestion. Mac1 is one of the more conserved areas of the SARS-CoV-2 genome, as there has only been one nonsynonymous mutation, V34L (Orf1a:V1056L), that recently emerged in the BA.2.86 lineage and is now in all of the JN.1 derivatives. Currently, the mutation is only ~80% penetrant in circulating SARS-CoV-2 sequences, suggesting that it might revert to wild-type and is not associated with a fitness benefit. Based on our structural analysis (shown in Supplementary Figure 4D above), we do not believe this mutation affects AVI-4206 binding, but we are including this variant in our future in vitro and in vivo studies as well as other beta-CoV. For SARS and MERS, see response to Reviewer 1 using CETSA to show that these targets are engaged by AVI-4206.
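
      This response uses two residue numbering schemes: positions within the Mac1 domain (V34L, N40D) and positions within the Orf1a polyprotein (V1056L, N1062D). The sketch below converts between them, assuming a constant offset of 1022 inferred from the V34L / Orf1a:V1056L pair given above. The function name is hypothetical and the offset should be checked against the reference sequence before reuse.

```python
import re

# Offset inferred from the pair quoted in the response: Mac1 V34L == Orf1a V1056L.
MAC1_TO_ORF1A_OFFSET = 1056 - 34

def mac1_to_orf1a(mutation):
    """Rewrite a Mac1-numbered substitution (e.g. 'V34L') in Orf1a numbering."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"not a simple substitution: {mutation!r}")
    ref, position, alt = match.groups()
    return f"{ref}{int(position) + MAC1_TO_ORF1A_OFFSET}{alt}"

for mutation in ("V34L", "N40D"):
    print(mutation, "->", "Orf1a:" + mac1_to_orf1a(mutation))
```

      The same offset maps the catalytic mutant N40D onto the N1062D label used in the cell-line construction methods, so the two names appear to refer to the same substitution.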

      (h) As methods to detect PARP14-derived ADP-ribosylation are available and it was shown that Mac1 can reverse this modification in cells, it would be good if the authors could investigate the impact of AVI-4206 on ADP-ribosylation in vivo.

      To test this idea we adapted the IF assay used by others in the field and show an effect of AVI-4206. We have added to the text:

      Although the IFN response was not sufficient to control viral replication, it is possible that the changes in ADP-ribosylation, in particular marks catalyzed by PARP14, downstream of IFN treatment could serve as a marker for Mac1 efficacy  (Ribeiro et al. 2025). To investigate whether downstream signals from PARP14 were specifically erased by Mac1, we used an immunofluorescence assay that showed that Mac1 could remove IFN-γ-induced ADP-ribosylation that is mediated by PARP14 (Kar et al. 2024).  We stably expressed wild-type Mac1 and the N40D mutant Mac1 in A549 cells. The data showed that Mac1 expression decreased IFN-γ-induced ADP-ribosylation, whereas the Mac1-N40D mutant did not (Figure 3E, F), indicating that Mac1 mediates the hydrolysis of IFN-γ-induced ADP-ribosylation. The PARP14 inhibitor RBN012759 completely blocked IFN-γ-induced ADP-ribosylation (Figure 3E, F), further confirming that IFN-γ-induced ADP-ribosylation is mediated by PARP14. AVI-4206 reversed the Mac1-induced hydrolysis of ADP-ribosylation and enhanced the ADP-ribosylation signal in Mac1-overexpressing cells (Figure 3E, F), further demonstrating its ability to inhibit the hydrolase activity of Mac1. We further validated this result using different ADP-ribosylation antibodies for immunofluorescence (Supplementary Figure 7). However, we observed no strong consistent signals of global pan-ADP-ribose (panADPr) or mono-ADP-ribose (monoADPr) accumulation in infected cells treated with AVI-4206 in immunoblot analyses (Supplementary Figure 8). Collectively, these results provide further evidence that simple cellular models are insufficient to explore the effects of Mac1 inhibition and that monitoring specific PARP14-mediated ADP-ribosylation patterns can provide an accessible biomarker for the efficacy of Mac1 inhibition.

      A549 Mac1 expression cell construction

      Mac1 wild-type (Mac1) and N1062D mutant (Mac1 N1062D) gene fragments were loaded into pLVX-EF1α-IRES-Puro (empty vector, EV) using Gibson cloning kit (NEB E5510). Lentivirus was prepared as previously described (PMID: 30449619; DOI: 10.1016/j.cell.2018.10.024). Briefly, 15 million HEK293T cells were grown overnight on 15 cm poly-L-Lysine coated dishes and then transfected with 6 ug pMD2.G (Addgene plasmid # 12259 ; http://n2t.net/addgene:12259 ; RRID:Addgene_12259), 18 ug dR8.91 (since replaced by second generation compatible pCMV-dR8.2, Addgene plasmid #8455) and 24 ug pLVX-EF1α-IRES-Puro (EV, Mac1, Mac1-N1062D) plasmids using the lipofectamine 3000 transfection reagent per the manufacturer’s protocol (Thermo Fisher Scientific, Cat #L3000001). pMD2.G and dR8.91 were a gift from Didier Trono. The following day, media was refreshed with the addition of viral boost reagent at 500x as per the manufacturer’s protocol (Alstem, Cat #VB100). Viral supernatant was collected 48 hours post transfection and spun down at 300 g for 10 minutes, to remove cell debris. To concentrate the lentiviral particles, Alstem precipitation solution (Alstem, Cat #VC100) was added, mixed, and refrigerated at 4°C overnight. The virus was then concentrated by centrifugation at 1500 g for 30 minutes, at 4°C. Finally, each lentiviral pellet was resuspended at 100x of original volume in cold DMEM+10%FBS+1% penicillin-streptomycin and stored until use at -80°C. To generate Mac1 overexpressing cells, 2 million A549 cells were seeded in 10 cm dishes and transduced with lentivirus in the presence of 8 μg/mL polybrene (Sigma, TR-1003-G). The media was changed after 24h and, after 48 hours, media containing 2μg/ml puromycin was added. Cells were selected for 72 hours and then expanded without selection. The expression of Mac1 was confirmed by Western Blot.

      Immunofluorescence assay:

      To assess the effect of Mac1 on IFN-induced ADP-ribosylation, A549-pLVX-EV, A549-pLVX-Mac1 and A549-pLVX-Mac1-N1062D cells were seeded in 96-well plates (10,000 cells/well). Cells were pre-treated with medium or 100 unit/mL IFN-γ (Sigma, SRP3058) for 24 hours to induce ADP-ribosylation. These 3 cell lines were then treated the next day with the indicated concentrations of AVI-4206 or RBN012759 (Medchemexpress, HY-136979). After 24 hours of exposure to drugs, treated cells were fixed in pre-cooled methanol at -20°C for 20 min, blocked in 3% bovine serum albumin for 15 min, incubated with Poly/Mono-ADP Ribose (E6F6A) Rabbit mAb (CST, 83732S) or Poly/Mono-ADP Ribose (D9P7Z) Rabbit mAb (CST, 89190S) antibodies for 1 h, and then incubated with Goat anti-Rabbit IgG Secondary Antibody, Alexa Fluor 488 (ThermoFisher, A-11008) for 30 min and stained with DAPI for 10 minutes. Fluorescent cells were imaged with an IN Cell Analyzer 6500 System (Cytiva) and analyzed using IN Carta software (Cytiva).

      Reviewer #3 (Recommendations for the authors):

      Just a couple of observations/details that might help strengthen the article:

      (1) The Caco-2 data for AVI-4206 would suggest that there is some sort of efflux going on, yet there is no mention of it in the paper. This might be useful in the optimization paradigm moving forward.

      We thank the reviewer for this observation and suggestion. Indeed, we believe that efflux is behind the low oral bioavailability of AVI-4206. We are working specifically to remove this liability in next-generation analogs, using the Caco-2 assay to guide this ongoing effort. Keep an eye out for a preprint on this soon! We have added to the discussion:

      “In addition to dissecting such molecular mechanisms of macrodomain function and inhibition, future efforts will focus on improving pharmacokinetic properties, including a cellular efflux liability that results in low oral bioavailability of AVI-4206.”

      (2) There are some spectroscopic anomalies/mistakes in the NMR data. The carbon NMR for 1-((8-amino-9H-pyrimido[4,5-b]indol-4-yl)amino)pyrrolidin-2-one should only have 14 unique carbons, but the authors report 15. The HNMR for AVI1500 should only have 19 H's, but the authors list 20. The HNMR data for AVI3762/3763 should have 16 H's, but the authors only report 13. The CNMR for AVI4206 should only have 19 unique carbons, but the authors report 20.

      Thank you for noting these inconsistencies regarding the reported NMR spectra. We have rectified them by more closely examining the spectra and in some cases acquiring new data. We identified one peak (47.9) in the 13C NMR of 1-((8-amino-9H-pyrimido[4,5-b]indol-4-yl)amino)pyrrolidin-2-one that is apparently an artifact of the automated peak picking in the data analysis software.  In the 1H NMR of AVI-1500, the triplet peak at 7.20 integrates to 1H, but was erroneously reported as 2H in the original manuscript.  This error has been corrected.  Spectra were re-acquired for AVI-3762, AVI-3763, and AVI-4206 with longer acquisition times, and/or on a 600 MHz spectrometer to afford the complete line lists now reported in the revised manuscript. Please note AVI-4206 has 18 distinct 13C resonances due to the equivalence of the gem-dimethyl methyl groups.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      (1) The use of single-cell RNA and TCR sequencing is appropriate for addressing potential relationships between gene expression and dual TCR.

      Thank you for your detailed review and suggestions. The main advantages of scRNA+TCR-seq are as follows: (1) It enables comparative analysis of features such as the ratio of single TCR paired T cells to dual TCR paired T cells at the level of a large number of individual T cells, through mRNA expression of the α and β chains. In the past, this analysis was limited to a small number of T cells, requiring isolation of single T cells, PCR amplification of the α and β chains, and Sanger sequencing; (2) While analyzing TCR paired T cell characteristics, it also allows examination of mRNA expression levels of transcription factors in corresponding T cells through scRNA-seq.

      (2) The data confirm the presence of dual TCR Tregs in various tissues, with proportions ranging from 10.1% to 21.4%, aligning with earlier observations in αβ T cells.

      Thank you very much for your detailed review and suggestions. Early studies on dual TCR αβ T cells have been very limited in number, with reported proportions of dual TCR T cells ranging widely from 0.1% to over 30%. In contrast, scRNA+TCR-seq can monitor over 5,000 single and paired TCRs, including dual paired TCRs, in each sample, enabling more precise examination of the overall proportion of dual TCR αβ T cells. It is important to note that our analysis focuses on T cells paired with functional α and β chains, while T cells with non-functional chain pairings and those with a single functional chain without pairing were excluded from the total cell proportion analysis. Previous studies generally lacked the ability to determine expression levels of specific chains in T cells without dual TCR pairings.
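
      The inclusion rule described above can be illustrated with a small classification sketch. It assumes each cell is represented by a list of chain records with a chain label and a flag for whether the chain is functional. That layout is a hypothetical one, loosely modelled on per-contig annotation tables, and is not the authors' pipeline. Treating any cell with a functional α/β pair plus at least one extra functional chain as "dual" is likewise an assumption.

```python
from collections import Counter

def classify_cell(chains):
    """Label a cell as 'single', 'dual', or 'unpaired' from its functional chains."""
    functional = Counter(c["chain"] for c in chains if c["productive"])
    n_alpha, n_beta = functional["TRA"], functional["TRB"]
    if n_alpha == 0 or n_beta == 0:
        return "unpaired"   # no functional alpha/beta pair: excluded from the total
    if n_alpha == 1 and n_beta == 1:
        return "single"
    return "dual"           # a functional pair plus at least one extra chain

def dual_tcr_fraction(cells):
    """Fraction of dual TCR cells among cells with a functional alpha/beta pair."""
    labels = Counter(classify_cell(chains) for chains in cells.values())
    paired = labels["single"] + labels["dual"]
    if paired == 0:
        raise ValueError("no cells with a functional alpha/beta pair")
    return labels["dual"] / paired

# Hypothetical barcodes and chain records for illustration only.
cells = {
    "cell_1": [{"chain": "TRA", "productive": True}, {"chain": "TRB", "productive": True}],
    "cell_2": [{"chain": "TRA", "productive": True}, {"chain": "TRA", "productive": True},
               {"chain": "TRB", "productive": True}],
    "cell_3": [{"chain": "TRB", "productive": True}],
    "cell_4": [{"chain": "TRA", "productive": False}, {"chain": "TRB", "productive": True}],
}
print(f"dual TCR fraction: {dual_tcr_fraction(cells):.2f}")
```

      Because unpaired cells are dropped from the denominator, the reported proportion depends on how strictly "functional" is defined, which is one reason estimates from different methods can diverge.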

      (3) Tissue-specific patterns of TCR gene usage are reported, which could be of interest to researchers studying T cell adaptation, although these were more rigorously analyzed in the original works.

      Thank you very much for your detailed review and suggestions. T cell subpopulations exhibit tissue specificity; thus, we conducted a thorough investigation into Treg cells from different tissue sites. This study builds upon the original by innovatively analyzing the differences in VDJ rearrangement and CDR3 characteristics of dual TCR Treg cells across various tissues. This provides new insights and directions for the potential existence of “new Treg cell subpopulations” in different tissue locations. The results of this analysis suggest the necessity of conducting functional experiments on dual TCR Treg cells at both the TCR protein level and the level of effector functional molecules.

      (4) Lack of Novelty: The primary findings do not substantially advance our understanding of dual TCR expression, as similar results have been reported previously in other contexts.

      Thank you for your detailed review and suggestions. Early research on dual TCR T cells primarily relied on transgenic mouse models and in vitro experiments, using limited TCR alpha chain or TCR beta chain antibody pairings. Flow cytometry was used to analyze a small number of T cells to estimate dual TCR T cell proportion. No studies have yet analyzed dual TCR Treg cell proportion, V(D)J recombination, and CDR3 characteristics at high throughput in physiological conditions. The scRNA+TCR-seq approach offers an opportunity to conduct extensive studies from an mRNA perspective. With high-throughput advantages of single-cell sequencing technology, researchers can analyze transcriptomic and TCR sequence characteristics of all dual TCR Treg cells within a study sample, providing new ideas and technical means for investigating dual TCR T cell proportions, characteristics, and origins under different physiological and pathological states.

      (5) Incomplete Evidence: The claims about tissue-specific differences lack sufficient controls (e.g., comparison with conventional T cells) and functional validation (e.g., cell surface expression of dual TCRs).

      Thank you for your detailed review and suggestions. This study indeed analyzed only dual TCR Treg cells from different tissue locations, based on the original manuscript, without a comparative analysis of other dual TCR T cell subsets from the same tissue locations. The main reason is that, in current scRNA+TCR-seq studies of different tissue locations, unless specific T cell subsets are sorted and enriched, the number of T cells obtained from each subset is very low, making a detailed comparative analysis impossible. In the results of the original manuscript, we observed a relatively high proportion of dual TCR Treg cells in various tissues, with differences in TCR composition and transcription factor expression. Following the suggestions, we have included additional descriptions in R1, citing the study by Tuovinen et al., which indicates that the proportion of dual TCR Tregs in lymphoid tissues is higher than that of other T cell types. This will help readers understand the distribution of dual TCR Treg cells in different tissues and provides an mRNA-level basis for functional experiments on dual TCR Treg cells in different tissue locations.

      (6) Methodological Weaknesses: The diversity analysis does not account for sample size differences, and the clonal analysis conflates counts and clonotypes, leading to potential misinterpretation.

      We thank you for your review and suggestions. In response to your question about whether the diversity analysis accounted for sample size, we conducted a detailed review and analysis. This study used the inverse Simpson index to evaluate the TCR diversity of Treg cells. A preliminary analysis compared the richness and evenness of single TCR Treg and dual TCR Treg repertoires. The two datasets analyzed were from four mouse samples with consistent processing and sequencing conditions. However, when analyzing single TCR Tregs and dual TCR Tregs from various tissues, we cannot exclude an effect of differences in the number of T cells detected by sequencing on the diversity analysis. Following the recommendations, we provided additional explanations in R1: CDR3 diversity analysis indicates that the TCR composition of dual TCR Treg cells is diverse, similar to that of single TCR Treg cells; however, the diversity indices of single TCR Tregs and dual TCR Tregs are not suitable for statistical comparison. Regarding the "clonal analysis" you mentioned, we define clonality based on unique TCR sequences: cells with identical TCR sequences are part of the same clone, and a clone with ≥2 cells is defined as expanded. For example, in Blood, there are 958 clonotypes and 1,228 cells, of which 449 cells belong to expanded clones. In R1, we systematically verified and revised the counts of clonally expanded cells across all tissue samples according to a unified standard.
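      As an illustration only, and not the pipeline used in the study, the clonality definition and the inverse Simpson index described above can be sketched in a few lines. The function name `clonotype_summary` and the toy clonotype labels are hypothetical; each label stands for one cell's paired TCR sequence.

```python
from collections import Counter


def clonotype_summary(cell_clonotypes):
    """Summarise a repertoire given one clonotype label per cell (hypothetical helper)."""
    counts = Counter(cell_clonotypes)
    n_cells = sum(counts.values())
    if n_cells == 0:
        raise ValueError("no cells supplied")
    # Cells whose clonotype is observed in at least two cells count as expanded.
    expanded_cells = sum(c for c in counts.values() if c >= 2)
    # Inverse Simpson index: 1 / sum of squared clonotype frequencies.
    simpson = sum((c / n_cells) ** 2 for c in counts.values())
    return {
        "cells": n_cells,
        "clonotypes": len(counts),
        "expanded_cells": expanded_cells,
        "inverse_simpson": 1.0 / simpson,
    }


# Toy repertoire (assumed data): 6 cells, 3 clonotypes, 5 cells in expanded clones.
toy = ["A", "A", "B", "C", "C", "C"]
summary = clonotype_summary(toy)
```

      Because the index is computed from observed frequencies, repertoires with very different cell numbers are not directly comparable without some form of normalization, which is the caveat raised in the response.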

      (7) Insufficient Transparency: The sequence analysis pipeline is inadequately described, and the study lacks reproducibility features such as shared code and data.

      Thank you for your review and suggestions. Based on the original manuscript, we have made corresponding detailed additions in R1, providing further elaboration on the analysis process of shared data, screening methods, research codes, and tools. This aims to offer readers a comprehensive understanding of the analytical procedures and results.

      (8) Weak Gene Expression Analysis: No statistical validation is provided for differential gene expression, and the UMAP plots fail to reveal meaningful clustering patterns.

      Thank you very much for your review and suggestions. Based on your recommendations, we conducted an initial differential expression analysis of the top 10 mRNA molecules in single TCR Treg and dual TCR Treg cells using the DESeq2 R package in R1, with statistical significance determined by Padj < 0.05. Regarding the clustering patterns in the UMAP plots, since the analyzed samples consisted of isolated Treg cell subpopulations that highly express immune suppression-related genes, we did not perform a more detailed analysis of subtypes and expression gene differences. This study primarily aims to explore the proportions of single TCR and dual TCR Treg cells from different tissue sources, as well as the characteristics of CDR3 composition, with a focus on showcasing the clustering patterns of samples from different tissue origins and various TCR pairing types.

      (9) A quick online search reveals that the same authors have repeated their approach of reanalysing other scientists' publicly available scRNA-VDJ-seq data in six other publications. In other words, the approach used here seems to be focused on quick re-analyses of publicly available data without further validation and/or exploration.

      Thank you for your review and suggestions. Most current studies utilizing scRNA+TCR-seq overlook analysis of TCR pairing types and related research on single TCR and dual TCR T cell characteristics. Through in-depth analysis of shared scRNA+TCR-seq data from multiple laboratories, we discovered a significant presence of dual TCR T cells in high-throughput T cell research results that cannot be ignored. In this study, we highlight the higher proportion of dual TCR Tregs in different tissue locations, which exhibits a certain degree of tissue specificity, suggesting these cells may participate in complex functional regulation of Tregs. This finding provides new ideas and a foundation for further research into dual TCR Treg functions. However, as reviewers pointed out, findings from scRNA+TCR-seq at the mRNA level require additional functional experiments on dual TCR T cells at the protein level. We have supplemented our discussion in R1 based on these suggestions.

      Reviewer #2 (Public review):

      (1) The existence of dual TCR expression by Tregs has previously been demonstrated in mice and humans (Reference #18 and Tuovinen. 2006. Blood. 108:4063; Schuldt. 2017. J Immunol. 199:33, both omitted from references). The presented results should be considered in the context of these prior important findings.

      Thank you very much for your review and suggestions. Based on the original manuscript, we have supplemented our reading, understanding, and citation of closely related literature (Tuovinen, 2006, Blood, 108:4063 (lines 44 and 175 in R1); Schuldt, 2017, J Immunol, 199:33 (lines 44 and 178 in R1)). We once again appreciate the valuable comments from the reviewers, and we will refer to these in our subsequent dual TCR T cell research.

      (2) This demonstration of dual TCR Tregs is notable, though the authors do not compare the frequency of dual TCR co-expression by Tregs with non-Tregs. This limits interpreting the findings in the context of what is known about dual TCR co-expression in T cells.

      Thank you very much for your review and suggestions. This analysis is primarily based on the scRNA+TCR-seq study of sorted Treg cells, in which we found the proportions and distinguishing features of dual TCR Treg cells in different tissue sites. Given the diversity and complexity of Treg function, a comparative analysis of the origins of dual TCR Treg cells and dual TCR non-Treg cells will be a meaningful direction. Currently, peripherally induced Treg cells are known to originate from the conversion of non-Treg cells; however, little is known about the sources and functions of dual TCR Treg cell subsets in both central and peripheral sites. In R1, we have supplemented the discussion regarding the possible origins and potential applications of the "novel dual TCR Treg" subsets.

      (3) Comparison of gene expression by single- and dual TCR Tregs is of interest, but as presented is difficult to interpret. Statistical analyses need to be performed to provide statistical confidence that the observed differences are true.

      Thank you very much for your review and suggestions. Based on your recommendations, we performed an initial differential expression analysis of the top 10 mRNA molecules in single TCR Treg and dual TCR Treg cells using the DESeq2 R package in R1, with a statistical significance threshold of Padj < 0.05 for comparisons.

      (4) The interpretations of the gene expression analyses are somewhat simplistic, focusing on the single-gene expression of some genes known to have a function in Tregs. However, the investigators miss an opportunity to examine larger patterns of coordinated gene expression associated with developmental pathways and differential function in Tregs (Yang. 2015. Science. 348:589; Li. 2016. Nat Rev Immunol. 16:220; Wyss. 2016. Nat Immunol. 17:1093; Zemmour. 2018. Nat Immunol. 19:291).

      Thank you for your review and suggestions. This study is based on publicly available scRNA+TCR-seq data from different organ sites generated by the original authors, focusing on sorted and enriched Treg cells within each tissue sample. However, there was no corresponding data on other cell types in each tissue sample, which prevented analysis of other cells and factors involved in the development and differentiation of single TCR and dual TCR Tregs. The literature suggested by the reviewer shows that the development, differentiation, and function of Treg cells have been extensively studied, resulting in significant advances. It also highlights the complexity and diversity of Treg origins and functions. This research aims to investigate "novel dual TCR Treg cell subpopulations" that may exhibit tissue-specific differences, found in the original authors' studies of Treg cells across different organ sites. This points to further experimental research into their development, differentiation, origin, and functional gene expression as an important direction, which we have supplemented in the discussion section of R1.

      Reviewer #3 (Public review):

      (1) Definition of Dual TCR and Validity of Doublet Removal: This study analyzes Treg cells with Dual TCR, but it is not clearly stated how the possibility of doublet cells was eliminated. The authors mention using DoubletFinder for detecting doublets in scRNA-seq data, but is this method alone sufficient? We strongly recommend reporting the details of doublet removal and data quality assessment in the Supplementary Data.

      Thank you very much for your review and suggestions. In the analysis of the shared scRNA+TCR-seq data from multiple laboratories, as you mentioned, this study employed the DoubletFinder R package to exclude suspected doublets. Additionally, we used the nCount value of each cell (i.e., its total sequencing reads or UMI counts) as an auxiliary parameter to further assess cell quality. Because doublets may contain gene expression information from two or more cells, their nCount values are often abnormally high. In this study, all cells included in the analysis had nCount values not exceeding 20,000. For the five tissue sample datasets, we further used hashtag oligonucleotide (HTO) labeling to eliminate doublets and negative cells: HTO labeling provides each cell with a barcode that distinguishes cells from different tissue sources, so doublets and negative cells can be identified by analyzing the HTO labels. After the removal of chimeric cells, all samples still contained T cells with two or more TCR clones. This supports the reliability of the methodological approach employed in this study and indicates that the analytical results reflect the proportion of dual TCR T cells. Based on the recommendations of the reviewers, we have supplemented and clarified the methods and discussion sections in the manuscript. It is particularly noteworthy that in our analysis, dual TCR Treg cells and single TCR Treg cells refer specifically to T cells that possess both functional α and β chains capable of forming a TCR. We have excluded from this analysis any Treg cells that possess only a single functional α or β chain and do not form TCR pairs, as well as Treg cells in which the α or β chains involved in TCR pairing are non-functional.
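      The two auxiliary filters described above (an nCount ceiling of 20,000 and retention of HTO singlets only) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper name `keep_cell`, the dictionary fields, and the class labels "Singlet", "Doublet" and "Negative" are hypothetical stand-ins, and the DoubletFinder step is not reproduced here.

```python
MAX_NCOUNT = 20_000  # upper bound on nCount stated in the response


def keep_cell(ncount, hto_class=None):
    """Return True if a cell passes the nCount and (optional) HTO filters.

    hto_class is None for datasets without HTO labeling (assumed convention).
    """
    if ncount > MAX_NCOUNT:
        return False  # abnormally high counts suggest a doublet
    return hto_class in (None, "Singlet")


# Toy cells (assumed data) covering each outcome.
cells = [
    {"id": "c1", "ncount": 8_500, "hto": "Singlet"},
    {"id": "c2", "ncount": 31_000, "hto": "Singlet"},   # fails nCount ceiling
    {"id": "c3", "ncount": 12_000, "hto": "Doublet"},   # fails HTO filter
    {"id": "c4", "ncount": 4_000, "hto": "Negative"},   # fails HTO filter
    {"id": "c5", "ncount": 20_000, "hto": None},        # no HTO labels, at the bound
]
kept = [c["id"] for c in cells if keep_cell(c["ncount"], c["hto"])]
```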

      (2) In Figure 3D, the proportion of Dual TCR T cells (A1+A2+B1+B2) in the skin is reported to be very high compared to other tissues. However, in Figure 4C, the proportion appears lower than in other tissues, which may be due to contamination by non-Tregs. The authors should clarify why it was necessary to include non-Tregs as a target for analysis in this study. Additionally, the sensitivity of scRNA-seq and TCR-seq may vary between tissues and may also be affected by RNA quality and sequencing depth in skin samples, so the impact of measurement bias should be assessed.

      We deeply appreciate your review and constructive comments. Based on the original manuscript, we have further supplemented and elaborated on the uniqueness and relative proportions of dual TCR T cell pairings in skin tissue samples in R1. Due to the scarcity of T cells in skin samples, some non-Treg cells were included during single-cell RNA sequencing and TCR sequencing to obtain a sufficient number of cells for effective analysis. The presence of non-Treg cells may indeed affect the statistical representation of dual TCR T cells as well as the related comparative analyses, as noted by the reviewer. T cells with A1+A2+B1+B2 type dual TCR pairings are primarily found within the non-Treg population in the skin. In response to this point, we have provided a detailed explanation of this result in the revised manuscript R1. Furthermore, for the two datasets included in the study, we conducted a comparative analysis in R1, exploring how factors such as sequencing depth at different tissue sites might introduce biases in our findings, which we have elaborated upon in the discussion section. We thank you once again for your valuable suggestions.

      (3) Issue of Cell Contamination: In Figure 2A, the data suggest a high overlap between blood, kidney, and liver samples, likely due to contamination. Can the authors effectively remove this effect? If the dataset allows, distinguishing between blood-derived and tissue-resident Tregs would significantly enhance the reliability of the findings. Otherwise, it would be difficult to separate biological signals from contamination noise, making interpretation challenging.

      We thank you for your review and suggestions. We have carefully verified data sources for tissues such as blood, kidneys, and liver. In the study by Oliver T et al., various techniques were employed to differentiate between leukocytes from blood and those from tissues, ensuring accurate identification of leukocytes from tissue samples. First, anti-CD45 antibody was injected intravenously to label cells in the vasculature, verifying that analyzed cells were indeed resident in the tissue. Second, prior to dissection and cell collection, authors performed perfusion on anesthetized mice to reduce contamination of tissue samples by leukocytes from the vasculature. Additionally, during single-cell sequencing, authors utilized HTO technology to avoid overlap between cells from different tissues.

      Analysis of the scRNA+TCR-seq data shared by the original authors revealed highly overlapping TCR sequences in blood, kidney, and liver, despite distinct cell labels associated with each tissue. While these techniques minimize overlap of cells from different sources, they cannot completely rule out the potential impact of this technical issue. As suggested, we have provided additional clarification in R1 of the manuscript regarding this phenomenon of high overlap in the kidney, liver, and blood, indicating that the possibility of Treg migration from blood to kidney and liver cannot be entirely excluded.

      (4) Inconsistency Between CDR3 Overlap and TCR Diversity: The manuscript states that Single TCR Tregs have a higher CDR3 overlap, but this contradicts the reported data that Dual TCR Tregs exhibit lower TCR diversity (higher 1/DS score). Typically, when TCR diversity is low (i.e., specific clones are concentrated), CDR3 overlap is expected to increase. The authors should carefully address this discrepancy and discuss possible explanations.

      Thank you for your review and suggestions. Regarding the potential relationship between CDR3 overlap and TCR diversity, in samples with consistent sequencing depth, lower diversity indeed corresponds to a higher proportion of CDR3 overlap. In our analysis of scRNA+TCR-seq data, we found that single TCR Tregs exhibit both higher diversity and CDR3 overlap, seemingly presenting contradictory analytical results (i.e., dual TCR Tregs show lower TCR diversity and CDR3 overlap). In R1, we supplemented the analysis of possible reasons: the presence of multiple TCR chains in dual TCR Treg cells may lead to a higher uniqueness of CDR3 due to multiple rearrangements and selections, resulting in lower CDR3 overlap; the lower diversity of dual TCR Tregs may be related to the number of T cells sequenced in each sample. The CDR3 diversity analysis in this study merely suggests that the TCR composition of dual TCR Treg cells is diverse, similar to that of single TCR Tregs. However, the diversity indices of single TCR Tregs and dual TCR Tregs are not suitable for statistical comparative analysis. A more in-depth and specific analysis of the diversity and overlap of the VDJ recombination mechanisms and CDR3 composition in dual TCR Tregs during development will be an important technical means to elucidate the function of dual TCR Treg cells.

      (5) Functional Evaluation of Dual TCR Tregs: This study indicates gene expression differences among tissue-resident Dual TCR T cells, but there is no experimental validation of their functional significance. Including functional assays, such as suppression assays or cytokine secretion analysis, would greatly enhance the study's impact.

      We sincerely appreciate your review and suggestions. In this analysis of scRNA+TCR-seq data, we found a higher proportion of dual TCR Treg cells in different tissue sites, with tissue-specific differences in their characteristics. Furthermore, we conducted a comparative analysis of the homogeneity and heterogeneity between single TCR Treg and dual TCR Treg cells. This result provides a foundation for further research on the origin and characteristics of dual TCR Treg cells in different tissue sites, offering new insights for understanding the complexity and functional diversity of Treg cells. Based on your suggestions, we have supplemented R1 with a discussion of the feasibility of further exploring the functions of tissue-resident dual TCR T cells and the need for research into potential applications.

      (6) Appropriateness of Statistical Analysis: When discussing increases or decreases in gene expression and cell proportions (e.g., Figure 2D), the statistical methods used (e.g., t-test, Wilcoxon, FDR correction) should be explicitly described. They should provide detailed information on the statistical tests applied to each analysis.

      Thank you for your review and suggestions. Based on the original manuscript, we have supplemented the specific statistical methods for the differences in cell proportions and gene expression in R1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      “Alternative possibilities are discussed regarding the prior and likelihood of the model. Given that the second case study inspired the introduction of the zero-inflation likelihood, it is not clear how applicable the general methodology is to various datasets. If every unique dataset requires a tailored prior or likelihood to produce the best results, the methodology will not easily replace more traditional statistical analyses that can be applied in a straightforward manner. Furthermore, the differences between the results produced by the two Bayesian models in case study 2 are not discussed. In specific regions, the models provide conflicting results (e.g., regions MH, VPMpc, RCH, SCH, etc.), which are not addressed by the authors. A third case study would have provided further evidence for the generalizability of the methodology.”

      We hope in this paper to propose a ‘standard workflow’ for these data; this standard workflow uses the horseshoe prior and we propose that this is the approach used to describe cell count data instead of the better established, but to our thinking, inefficient, t-testing approach.

      The horseshoe prior is robust and allows a partially pooled model to be used while weighing up the contribution of different data points. This is an analogue of excluding outliers and, in any analysis, it is normal to investigate further if points are being excluded as outliers. Often this reveals a particular challenge with the data. In the data here there are a lot of zeros, indicating that some samples should be excluded because the preparation failed to tag cells rather than because there were no cells to tag. The idea behind the zero-inflated Poisson (ZIP) example is to show that the Bayesian method can allow for this sort of further investigation and, indeed, as the reviewer notes, this sort of extended analysis is often bespoke, tailored to the data.
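      To make the zero-inflation idea concrete, the sketch below evaluates a textbook zero-inflated Poisson probability mass function: with probability pi a sample is a structural zero (for example, a failed preparation), and otherwise the count follows a Poisson distribution with rate lam. The parameter names and this particular parameterization are assumptions for illustration; the response does not spell out the exact likelihood used in the manuscript.

```python
import math


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson pmf (textbook form, assumed parameterization).

    k   : observed count (non-negative integer)
    lam : Poisson rate for samples where tagging worked
    pi  : probability of a structural zero
    """
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero can be structural, or an ordinary Poisson zero.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


# With lam = 2 and pi = 0.3, zeros are far more likely than under a plain Poisson.
p_zero_zip = zip_pmf(0, lam=2.0, pi=0.3)
p_zero_poisson = zip_pmf(0, lam=2.0, pi=0.0)
```

      Setting pi to zero recovers the ordinary Poisson model, so the extra parameter is what allows the model to absorb excess zeros rather than treating them as evidence of a low rate.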

      We have clearly failed to explain that the ‘standard workflow’ we propose to replace the more traditional methods is the first one we describe, with the horseshoe prior; this produces better results on both datasets than the traditional approach. However, we also feel it is useful to show how a more tailored follow-on can be useful; we need to make it clear that this is intended as an illustration of an ‘optional extra’ rather than a part of the more straightforward ‘standard workflow’.

      To make this clearer we have altered the text in several locations:

      • end of Introduction: added clarifying sentence “Here, our aim is to introduce a ‘standard’ Bayesian model for cell count data. We illustrate the application of this model to two datasets, one related to neural activation and the other to developmental lineage. For the second dataset, we also demonstrate a second example extension Bayesian model.”

      • Section Hierarchical modeling: “Our goal in both cases is to quantify group differences in the data. We present a ‘standard’ hierarchical model. This model reflects the experimental features common to cell count experiments and reflects the hierarchical structure of cell count data; the standard model is designed to deal robustly and efficiently with noise. On some occasions, to reflect a specific hypothesis, the structure of a particular experiment or an observed source of noise, this model can be further refined or changed to target the analysis. We will give an example of this for our second dataset.”

      • Section Horseshoe prior: “The alternative is via a flexible prior such as the horseshoe (Carvalho et al., 2010; Piironen and Vehtari, 2017). This more generic option may be suitable as a default ‘standard’ approach in the typical case where outliers are poorly understood.”

      • Discussion: word ‘standard’ added to sentence: “Our standard workflow uses a horseshoe prior, along with the partial pooling, this allows our model to deal effectively with outliers.”

      • Discussion: modified sentence “The horseshoe prior model workflow we have exhibited here is intended as a standard approach.”

      Indeed, because the horseshoe prior deals robustly with outliers, whereas the ZIP is intended to model the outliers, any substantial difference between the two should be examined carefully. The referee is right to point out that we have not explained this in any detail and has helpfully listed a few brain regions where there are differences. This is useful, particularly since the examples listed illustrate the opportunities and hazards this sort of data presents. To address this, we have added a new version of Figure 6 to the revised manuscript.

      Previously Figure 6 showed two example brain regions: MPN and TMd. We have now added MH and SCH to the figure, and new text commenting on the insights the plots provide, both in the Results and Discussion.

      Reviewer #2 (Public review):

      “A clearer link between the experimental data and model-structure terminology would be a benefit to the non-expert reader.”

      This is a very good point and we are acutely aware through our own work how difficult it can be moving between fields with different research goals, different scientific cultures and different technical vocabularies. Just as it can be difficult translating from one language to another without losing nuance and meaning, it can be a real challenge finding technical terms that are useful for the non-expert reader while retaining the precision the application requires! In the long run, we hope that, just as some of the very specialized vocabulary that surrounds frequentist statistics has become familiar to working experimental scientists, the precise terminology involved in Bayesian modelling will become familiar and transparent. However, in advance of that day, we have included a glossary of terms at the end of the main text, and have made numerous small tweaks to make sure that the link between data and model terminology is clearer and better explained.

      Reviewer #1 (Recommendations for the authors):

      (1) “I would strongly recommend that the authors include more case studies in the manuscript, and address the qualitative differences between the different versions of the model.”

      We agree that our method will only become established when it is applied to more datasets; we hope to contribute to further analysis, and we know other people are already using the approach on their own data. We do, however, feel that adding more datasets to this paper would make it longer and more complex. The plan, instead, is to use the method on novel datasets to test specific hypotheses, so that the results will include novel scientific findings as well as adding another illustration of the Bayesian approach applied to data that is already well studied.

      (2) “Figure 6 is not discussed in the main text.”

      We had discussed the results presented in Figure 6 in the second paragraph of the section “Case study two – Ontogeny of inhibitory interneurons of the mouse thalamus”, however the reviewer is right in that we did not directly refer to the Figure – this was an oversight. In any case, in the revised manuscript we present a new version of Figure 6 (in response to above comment), which is now explicitly cited in the text.

      Revised Figure 6: Example data and inferences highlighting model discrepancies. On the left under ‘data’: boxplots with medians and interquartile ranges for the raw data for four example brain regions. The shape of each point pairs left and right hemisphere readings in each of the five animals. On the right under ‘inference’: HDIs and confidence intervals are plotted. Purple is the Bayesian horseshoe model, pink is the Bayesian ZIP model, and orange is the sample mean. The Bayesian estimates are not strongly influenced by the zero-valued observations (MPN, SCH, TMd) or large-valued outliers (MH) and have means close to the data median. This explains the advantage of the Bayesian results over the confidence interval.

      Reviewer #2 (Recommendations for the authors):

      (1) “This is a generally well-written methodology paper that also provides the underlying code as a resource. As a reviewer outside both cell-count modelling and hierarchical-Bayesian approaches (though with a general interest in the topics) I found the method a little difficult to follow and would have liked to have been left with a better understanding of how the method is applied to the data. For example, in Figure 1 we are introduced to brain region count, animal count, and “items”. Then in the next line: pooling, model, structure, population and etc in subsequent lines. It is not clear what the subscripts (the pools?) are referring to: are they different regions R or animals N? These terms need to be better linked to the data and/or trimmed. Having said that, the later results look like a solid contribution to the field with a significant reduction in uncertainty from the Bayesian approach over the frequentist one. A future version of the manuscript, therefore, would benefit from greater precision of language as well as an economy and greater focus of terms linking the method to the biology. This is particularly the case around the exposition parts in Figure 1, Figure 2, and the “Hierarchical modelling” section.”

      This is another important point. We have now made numerous small changes to tighten up the text in the paper, in response to both this point and the next point.

      (2) “Language throughout could be sharpened. Subjectivity like “surprising outliers” could be removed and quirky grammar like “often small, ten is a typical” improved. There are also typos “an rate” etc that should be tidied up.”

      As per previous response, we have made numerous tweaks and small improvements and feel that the paper is stronger in this respect.

      (3) “Figure 1 caption. “It is a spectrum that depends” Is spectrum the right word here? Also, “thicker stroke” what does this refer to? Wasn’t immediately clear. In A, why is the whole animal within the R bracket that signifies brain regions, and then the brain regions are within the N bracket that signifies whole animals? Apart from the teal colouring, what are the other coloured regions in the image referring to? Improving this first figure would greatly help a reader unfamiliar with the context of the approach.”

      We have replaced the word “spectrum” with “continuum”. We have replaced “Observed quantities have been highlighted with a thicker stroke in the graphical model.” with “The observed data quantities, y<sub>i</sub> to y<sub>n</sub>, are highlighted with a thick line in the model diagrams”. We have added the following text to describe the red and green lines in panel A: “green and red lines indicate regions labeled as damaged”.

      (4) “On P2 there is no discussion of priors when running through the advantage of the Bayesian approach. Is this a choice or an oversight? Priors do have a role in the later analysis.”

      A short additional paragraph has been added to the introduction outlining the advantage of having a prior, but also noting that the obligation to pick a prior can be intimidating and that suggesting priors is one of the contributions of our paper: “A Bayesian model also includes a set of probability distributions, referred to as the prior, which represent those beliefs it is reasonable to hold about the statistical model parameters before actually doing the experiment. The prior can be thought of as an advantage, it allows us to include in our analysis our understanding of the data based on previous experiments. The prior also makes explicit in a Bayesian model assumptions that are often implicit in other approaches. However, having to design priors is often considered a challenge and here we hope to make this more straightforward by suggesting priors that are suitable for this class of data.”

      (5) “On P4 more explanation would help greatly. Formulas like 23*10*4 or 50*6+50*4 are presented without explanation. What are the various numbers being multiplied? Regions, animals? Again, a clearer link between biological data and model structure would be advantageous.”

      We have now modified this line to clearly state the numbers’ sources: “The index i runs over the full set of samples, which in this case comprises 23 brain regions × 10 animals × 4 groups ≈ 920 datapoints in the first study, and 50 brain regions × 6 HET animals + 50 brain regions × 4 KO animals ≈ 500 datapoints in the second.”
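
      As a quick check of the arithmetic quoted above, the counts can be reproduced in a few lines. This is only a sketch: the helper name `count_datapoints` is hypothetical, and it assumes exactly one sample per brain region per animal. The "≈" in the quoted text suggests the real datasets may deviate slightly from these products, for example if some samples are missing.

```python
def count_datapoints(regions, animals_per_group):
    """Return the number of samples, assuming one sample per region per animal.

    `regions` is the number of brain regions; `animals_per_group` lists the
    number of animals in each experimental group (hypothetical helper).
    """
    return regions * sum(animals_per_group)


# First study: 23 regions, 4 groups of 10 animals each.
study_one = count_datapoints(23, [10, 10, 10, 10])

# Second study: 50 regions, 6 HET animals and 4 KO animals.
study_two = count_datapoints(50, [6, 4])

if __name__ == "__main__":
    print(study_one, study_two)
```

      Under that one-sample-per-cell assumption, the products are upper bounds on the number of values the index i can take.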

      (6) “P6 and Results. Is it possible to show examples of the data set sampled from? Perhaps an image or two for the two experiments. Both Figures 4 and 5 as they currently are could be made slightly smaller to provide space for a small explanatory sub-panel. This would help ground the results.”

      This is a good idea. We have now added heatmap visualisations of both entire datasets to revised versions of Figures 4 and 5 (assuming that this is what the reviewer was suggesting).

    1. Author response:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) Data:

      a) The main weakness in the data is the lack of functional and anatomical data from mouse hair bundles. While the authors compensate in part for this difficulty with bullfrog crista bundles, those data are also fragmentary - one TEM and 2 exemplar videos. Much of the novelty of the EM depends on the different appearance of stretches of a single kinocilium - can we be sure of the absence of the central microtubule singlets at the ends?

      Our single-cell RNA-seq findings show that genes related to motile cilia are specifically expressed in vestibular hair cells. This has not been demonstrated before. We have also provided supporting evidence using electrophysiology and imaging from bullfrogs and mice. Although no ultrastructural images of mouse vestibular kinocilia were provided in our study, a transmission electron micrograph of mouse vestibular kinocilia has been published (O’Donnell and Zheng, 2022). The mouse vestibular kinocilia have a “9+2” microtubule configuration with nine doublet microtubules surrounding two central singlet microtubules. This finding contrasts with a previous study, which demonstrated that the vestibular kinocilia from guinea pigs lack central singlet microtubules and inner dynein arms, whereas outer dynein arms and radial spokes are present (Kikuchi et al., 1989). The central pair of microtubules is absent at the end of the bullfrog saccular kinocilium (Fig. 7A). We would like to point out that the dual identity of primary and motile cilia is not just based on the TEM images. The kinocilium has long been considered a specialized cilium, and its role as a primary cilium during development has been demonstrated before (Moon et al., 2020; Shi et al., 2022).

      In most motile cilia, the central pair complex (CPC) does not originate directly from the basal body; instead, it begins a short distance above the transition zone, a feature that already illustrates variation in CPC assembly across systems (Lechtreck et al., 2013). The CPC can also show variation in its spatial extent: for example, in mammalian sperm axonemes, it can terminate before reaching the distal end of the axoneme (Fawcett and Ito, 1965). In addition, CPC orientation differs across organisms: in metazoans and Trypanosoma, the CPC is fixed relative to the outer doublets, whereas in Chlamydomonas and ciliates it twists within the axoneme (Lechtreck et al., 2013). Such variation has been described in multiple motile cilia and flagella and is therefore not unique to vestibular kinocilia. What appears more unusual in our data is the organization at the distal tip, where a distinct distal head is present, similar to cilia tip morphologies recently described in human islet cells (Polino et al., 2023). Although this feature is intriguing, we interpret it primarily as a structural signature rather than as evidence for a specialized motile adaptation, and we will moderate our interpretation accordingly in the revision.

      b) While it was a good idea to compare ciliary motility expression in published P2 datasets for mouse cochlear and vestibular hair cells for comparison with the authors' adult hair cell data, the presentation is too superficial to assess (Figure 6C-E; text from line 336) - it is hard to see the basis for concluding that motility genes are specifically lower in P2 cochlear hair cells than vestibular hair cells. Visually, it is striking that CHCs have much darker bands for about 10 motility-related genes.

      We aimed to show that kinocilia in neonatal cochlear and vestibular hair cells are largely similar, except that neonatal cochlear hair cells lack key genes and proteins required for the motile apparatus. While these genes (e.g., Dynll1, Dynll2, Dynlrb1, Cetn2, and Mdh1) appear more highly expressed in P2 cochlear hair cells, they are not uniquely associated with the axoneme. For example, Dynll1/2 and Dynlrb1 are components of the cytoplasmic dynein-1 complex (Pfister et al., 2006), Cetn2 has multiple basic cellular functions beyond cilia (e.g., centrosome organization, DNA repair), and Mdh1 encodes a cytosolic malate dehydrogenase involved in central metabolic pathways such as the citric acid cycle and malate–aspartate shuttle. This contrasts with axonemal dyneins, which are uniquely required for cilia motility. To avoid ambiguity, we will mark such cytoplasmic or multifunctional genes with stars in both Figure 5G and Figure 6D, together with a legend, in the revised manuscript.

      Although those genes (i.e., Dynll1, Dynll2, Dynlrb1, Cetn2, and Mdh1) are highly expressed in neonatal cochlear hair cells, key genes for motile machinery are not detected. For example, Dnah6, Dnah5, and Wdr66 are not expressed in the P2 cochlear hair cells. Dnah6 and Dnah5 encode axonemal dyneins that are part of the inner and outer dynein arms, while Wdr66 encodes a component of radial spokes. Importantly, we did not detect the expression of CCDC39 and CCDC40 in kinocilia of P2 cochlear hair cells. Axonemal CCDC39 and CCDC40 are the molecular rulers that organize the axonemal structure in the 96-nm repeating interactome and are required for the assembly of IDAs and N-DRC for ciliary motility (Becker-Heck et al., 2011; Merveille et al., 2011; Oda et al., 2014). We will modify Figure 6D to highlight the key difference between P2 cochlear and vestibular hair cells in the revised manuscript. We will also revise the text so that the key differences are clearly described.

      (2) Interpretation:

      The authors take the view that kinociliary motility is likely to be normally present but is rare in their observations because the conditions are not right. But while others have described some (rare) kinociliary motility in fish organs (Rusch & Thurm 1990), they interpreted its occurrence as a sign of pathology. Indeed, in this paper, it is not clear, or even discussed, how kinociliary motility would help with mechanosensitivity in mature hair bundles. Rather, the presence of an autonomous rhythm would actively interfere with generating temporally faithful representations of the head motions that drive vestibular hair cells.

      Spontaneous flagella-like rhythmic beating of kinocilia in vestibular HCs in frogs and eels (Flock et al., 1977; Rüsch and Thurm, 1990) and in zebrafish early otic vesicle (Stooke-Vaughan et al., 2012; Wu et al., 2011) has been reported previously. Based on Rüsch and Thurm (1990), spontaneous kinocilia motility occurred under non-physiological conditions and was interpreted as a sign of cellular deterioration rather than a normal feature. We speculate that deterioration under non-physiological conditions may lead to the disruption of lateral links between the kinocilium and the stereociliary bundle, effectively unloading the kinocilium and allowing it to move more freely. Additionally, fluctuations in intracellular ATP levels may contribute, as ciliary motility is highly ATP-dependent; when ATP is depleted, beating ceases. Similar phenomena have been documented in respiratory epithelia, where ciliary activity can temporarily pause. Nevertheless, the fact that kinocilia can exhibit spontaneous motility under these conditions indicates that they possess the motile machinery necessary for such beating. Irrespective of the condition, cilia without the molecular machinery required for motility will not be able to move.

      We agree with the reviewer that, based on the present data, it is difficult to know the functional role of kinocilia and whether the presence of such autonomous rhythm would interfere with temporal fidelity. Spontaneous bundle motion, driven by the active process associated with mechanotransduction, was observed in bullfrog saccular hair cells (Benser et al., 1996; Martin et al., 2003). We will revise the discussion to address this important point raised by the reviewer. Specifically, we will emphasize that our observations of ciliary beating in the ex vivo conditions may not reflect its properties in the mature in vivo context, but may rather be a byproduct of motile machinery clearly present in the kinocilia. We speculate that this machinery in mature hair cells could operate in a more subtle mode, modulating the rigor state of dynein arms or related axonemal structures to influence kinociliary mechanics and, in turn, bundle stiffness in response to stimuli or signaling cues. Such a mechanism could either enhance sensitivity or introduce filtering properties, thereby contributing to the fine control of mechanosensory function without compromising temporal fidelity. Future studies using loss-of-function approaches will be needed to reveal the unexplored role(s) of kinocilia for vestibular hair cells in vertebrates.

      Could kinociliary beating play other roles, possibly during development - for example, by interacting with forming accessory structures (but see Whitfield 2020) or by activating mechanosensitivity cell-autonomously, before mature stimulation mechanisms are in place? Then a latent capacity to beat in mature vestibular hair cells might be activated by stressful conditions, as speculated regarding persistent Piezo channels that are normally silent in mature cochlear hair cells but may reappear when TMC channel gating is broken (Beurg and Fettiplace 2017). While these are highly speculative thoughts, there is a need in the paper for more nuanced consideration of whether the observed motility is normal and what good it would do.

      We thank the reviewer for these excellent suggestions. We agree that kinociliary motility could plausibly serve roles during development, for example by guiding hair bundle formation or by contributing to early mechanosensitivity and spontaneous activity before mature stimulation mechanisms are established. It is also possible that the motility machinery represents a latent capacity in mature vestibular hair cells that could be reactivated under stress or pathological conditions. We will revise the Discussion to address these possibilities and to provide a more nuanced consideration of whether the observed motility is normal and what potential functions it might serve.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors compared the transcriptomes of the various types of hair cells contained in the sensory epithelia of the cochlea and vestibular organs of the mouse inner ear. The analysis of their transcriptomic data led to novel insights into the potential function of the kinocilium.

      Strengths:

      The novel findings for the kinocilium gene expression, along with the demonstration that some kinocilia demonstrate rhythmic beating as would be seen for known motile cilia, are fascinating. It is possible that perhaps the kinocilium, known to play a very important role in the orientation of the stereocilia, may have a gene expression pattern that is more like a primary cilium early in development and later in mature hair cells, more like a motile cilium. Since the kinocilium is retained in vestibular hair cells, it makes sense that it is playing a different role in these mature cells than its role in the cochlea.

      Another major strength of this study, which cannot be overstated, is that for the transcriptome analysis, they are using mature mice. To date, there is a lot of data from many labs for embryonic and neonatal hair cells, but very little transcriptomic data on the mature hair cells. They do a nice job in presenting the differences in marker gene expression between the 4 hair cell types. This information is very useful to those labs studying regeneration or generation of hair cells from ES cell cultures. One of the biggest questions these labs confront is what type of hair cells develop in these systems. The more markers available, the better. These data will also allow researchers in the field to compare developing hair cells with mature hair cells to see what genes are only required during development and not in later functioning hair cells.

      We would like to thank reviewer 2 for his/her comments and hope that the datasets provided in this manuscript will be a useful resource for researchers in the auditory and vestibular neuroscience community.

      Joint Recommendations:

      We will make changes in the revision based on the joint recommendations of the two reviewers.

      References

      Becker-Heck, A., Zohn, I.E., Okabe, N., Pollock, A., Lenhart, K.B., Sullivan-Brown, J., McSheene, J., Loges, N.T., Olbrich, H., Haeffner, K., Fliegauf, M., Horvath, J., Reinhardt, R., Nielsen, K.G., Marthin, J.K., Baktai, G., Anderson, K.V., Geisler, R., Niswander, L., Omran, H., Burdine, R.D., 2011. The coiled-coil domain containing protein CCDC40 is essential for motile cilia function and left-right axis formation. Nat Genet 43, 79–84. https://doi.org/10.1038/ng.727

      Benser, M.E., Marquis, R.E., Hudspeth, A.J., 1996. Rapid, Active Hair Bundle Movements in Hair Cells from the Bullfrog’s Sacculus. J. Neurosci. 16, 5629–5643. https://doi.org/10.1523/JNEUROSCI.16-18-05629.1996

      Fawcett, D.W., Ito, S., 1965. The fine structure of bat spermatozoa. American Journal of Anatomy 116, 567–609. https://doi.org/10.1002/aja.1001160306

      Flock, Å., Flock, B., Murray, E., 1977. Studies on the Sensory Hairs of Receptor Cells in the Inner Ear. Acta Oto-Laryngologica 83, 85–91. https://doi.org/10.3109/00016487709128817

      Kikuchi, T., Takasaka, T., Tonosaki, A., Watanabe, H., 1989. Fine structure of guinea pig vestibular kinocilium. Acta Otolaryngol 108, 26–30. https://doi.org/10.3109/00016488909107388

      Lechtreck, K.-F., Gould, T.J., Witman, G.B., 2013. Flagellar central pair assembly in Chlamydomonas reinhardtii. Cilia 2, 15. https://doi.org/10.1186/2046-2530-2-15

      Martin, P., Bozovic, D., Choe, Y., Hudspeth, A.J., 2003. Spontaneous Oscillation by Hair Bundles of the Bullfrog’s Sacculus. J. Neurosci. 23, 4533–4548. https://doi.org/10.1523/JNEUROSCI.23-11-04533.2003

      Merveille, A.-C., Davis, E.E., Becker-Heck, A., Legendre, M., Amirav, I., Bataille, G., Belmont, J., Beydon, N., Billen, F., Clément, A., Clercx, C., Coste, A., Crosbie, R., de Blic, J., Deleuze, S., Duquesnoy, P., Escalier, D., Escudier, E., Fliegauf, M., Horvath, J., Hill, K., Jorissen, M., Just, J., Kispert, A., Lathrop, M., Loges, N.T., Marthin, J.K., Momozawa, Y., Montantin, G., Nielsen, K.G., Olbrich, H., Papon, J.-F., Rayet, I., Roger, G., Schmidts, M., Tenreiro, H., Towbin, J.A., Zelenika, D., Zentgraf, H., Georges, M., Lequarré, A.-S., Katsanis, N., Omran, H., Amselem, S., 2011. CCDC39 is required for assembly of inner dynein arms and the dynein regulatory complex and for normal ciliary motility in humans and dogs. Nat Genet 43, 72–78. https://doi.org/10.1038/ng.726

      Moon, K.-H., Ma, J.-H., Min, H., Koo, H., Kim, H., Ko, H.W., Bok, J., 2020. Dysregulation of sonic hedgehog signaling causes hearing loss in ciliopathy mouse models. eLife 9, e56551. https://doi.org/10.7554/eLife.56551

      Oda, T., Yanagisawa, H., Kamiya, R., Kikkawa, M., 2014. A molecular ruler determines the repeat length in eukaryotic cilia and flagella. Science 346, 857–860. https://doi.org/10.1126/science.1260214

      O’Donnell, J., Zheng, J., 2022. Vestibular Hair Cells Require CAMSAP3, a Microtubule Minus-End Regulator, for Formation of Normal Kinocilia. Front Cell Neurosci 16, 876805. https://doi.org/10.3389/fncel.2022.876805

      Pfister, K.K., Shah, P.R., Hummerich, H., Russ, A., Cotton, J., Annuar, A.A., King, S.M., Fisher, E.M.C., 2006. Genetic Analysis of the Cytoplasmic Dynein Subunit Families. PLOS Genetics 2, e1. https://doi.org/10.1371/journal.pgen.0020001

      Polino, A.J., Sviben, S., Melena, I., Piston, D.W., Hughes, J.W., 2023. Scanning electron microscopy of human islet cilia. Proceedings of the National Academy of Sciences 120, e2302624120. https://doi.org/10.1073/pnas.2302624120

      Rüsch, A., Thurm, U., 1990. Spontaneous and electrically induced movements of ampullary kinocilia and stereovilli. Hearing Research 48, 247–263. https://doi.org/10.1016/0378-5955(90)90065-W

      Shi, H., Wang, H., Zhang, C., Lu, Y., Yao, J., Chen, Z., Xing, G., Wei, Q., Cao, X., 2022. Mutations in OSBPL2 cause hearing loss associated with primary cilia defects via sonic hedgehog signaling. JCI Insight. https://doi.org/10.1172/jci.insight.149626

      Stooke-Vaughan, G.A., Huang, P., Hammond, K.L., Schier, A.F., Whitfield, T.T., 2012. The role of hair cells, cilia and ciliary motility in otolith formation in the zebrafish otic vesicle. Development 139, 1777–1787. https://doi.org/10.1242/dev.079947

      Wu, D., Freund, J.B., Fraser, S.E., Vermot, J., 2011. Mechanistic Basis of Otolith Formation during Teleost Inner Ear Development. Developmental Cell 20, 271–278. https://doi.org/10.1016/j.devcel.2010.12.00

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The authors aim to explore the effects of the electrogenic sodium-potassium pump (Na<SUP>+</SUP>/K<SUP>+</SUP>-ATPase) on the computational properties of highly active spiking neurons, using the weakly-electric fish electrocyte as a model system. Their work highlights how the pump's electrogenicity, while essential for maintaining ionic gradients, introduces challenges in neuronal firing stability and signal processing, especially in cells that fire at high rates. The study identifies compensatory mechanisms that cells might use to counteract these effects, and speculates on the role of voltage dependence in the pump's behavior, suggesting that Na<SUP>+</SUP>/K<SUP>+</SUP>-ATPase could be a factor in neuronal dysfunctions and diseases.

      Strengths:

      (1) The study explores a less-examined aspect of neural dynamics: the effects of Na<SUP>+</SUP>/K<SUP>+</SUP>-ATPase electrogenicity. It offers a new perspective by highlighting the pump's role not only in ion homeostasis but also in its potential influence on neural computation.

      (2) The mathematical modeling used is a significant strength, providing a clear and controlled framework to explore the effects of the Na<SUP>+</SUP>/K<SUP>+</SUP>-ATPase on spiking cells. This approach allows for the systematic testing of different conditions and behaviors that might be difficult to observe directly in biological experiments.

      (3) The study proposes several interesting compensatory mechanisms, such as sodium leak channels and extracellular potassium buffering, which provide useful theoretical frameworks for understanding how neurons maintain firing rate control despite the pump's effects.

      Weaknesses:

      (1) While the modeling approach provides valuable insights, the lack of experimental data to validate the model's predictions weakens the overall conclusions.

      (2) The proposed compensatory mechanisms are discussed primarily in theoretical terms without providing quantitative estimates of their impact on the neuron's metabolic cost or other physiological parameters.

      Comments on revisions:

      The revised manuscript is notably improved.

      We thank the reviewer for their concise and accurate summary and appreciate the constructive feedback on the article’s strengths and weaknesses. Experimental work is beyond the scope of our modeling-based study. However, we would like our work to serve as a framework for future experimental studies into the role of the electrogenic pump current (and its possible compensatory currents) in disease, and its role in evolution of highly specialized excitable cells (such as electrocytes).

      Quantitative estimates of metabolic costs in this study are limited to the ATP that is required to fuel the Na<SUP>+</SUP>/K<SUP>+</SUP> pump. By integrating the net pump current over time and dividing by one elementary charge, one can find the rate of ATP that is consumed by the Na<SUP>+</SUP>/K<SUP>+</SUP> pump for either compensatory mechanism. The difference in net pump current is thus proportional to ATP consumption, which allows for a direct comparison of the cost efficiency of the Na<SUP>+</SUP>/K<SUP>+</SUP> pump for each proposed compensatory mechanism. The Na<SUP>+</SUP>/K<SUP>+</SUP> pump is, however, not the only ATP-consuming element in the electrocyte, and some of the compensatory mechanisms induce other costs related to cell ‘housekeeping’ or presynaptic processes. We have now added a section in the appendix titled ‘Considerations on metabolic costs of compensatory mechanisms’ (section 11.4), where we provide rough estimates of the influence of the compensatory mechanisms on the total metabolic costs of the cell and membrane space occupation. Although we argue that, according to these rough estimates, the impact of the discussed compensatory mechanisms could be significant, the absence of more detailed experimental quantification means that a plausible quantitative cost estimate at the whole-cell level remains beyond the scope of this article.
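
      The current-to-ATP conversion described above can be sketched as follows. This is an illustrative sketch rather than the code used in the study: the function names and the example trace are hypothetical, and it assumes that each pump cycle hydrolyses one ATP while exporting one net elementary charge (three Na<SUP>+</SUP> out, two K<SUP>+</SUP> in). The integral of the net pump current gives the transported charge; dividing by the elementary charge gives the number of ATP molecules, and dividing further by the duration gives a mean rate.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per elementary charge (exact SI value)


def atp_from_pump_current(times_s, pump_current_a):
    """Estimate ATP molecules consumed from a sampled net pump current.

    Assumes one net elementary charge leaves the cell per ATP hydrolysed
    (3 Na+ out, 2 K+ in). The current is integrated with the trapezoidal rule.
    """
    if len(times_s) != len(pump_current_a) or len(times_s) < 2:
        raise ValueError("need at least two matching time/current samples")
    charge_c = 0.0
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        charge_c += 0.5 * (pump_current_a[k] + pump_current_a[k - 1]) * dt
    return charge_c / ELEMENTARY_CHARGE_C


def atp_rate(times_s, pump_current_a):
    """Mean ATP consumption rate (molecules per second) over the whole trace."""
    duration_s = times_s[-1] - times_s[0]
    return atp_from_pump_current(times_s, pump_current_a) / duration_s


if __name__ == "__main__":
    # Hypothetical trace: a constant 1 nA net pump current for one second.
    times = [0.0, 0.5, 1.0]
    current = [1e-9, 1e-9, 1e-9]
    print(f"{atp_rate(times, current):.3e} ATP molecules per second")
```

      Comparing two compensatory mechanisms then amounts to calling `atp_rate` on the pump-current trace simulated under each of them.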

      Reviewer #1 (Recommendations for the authors):

      I just have a few recommendations on the updated manuscript.

      (1) When exploring the different roles of Na<SUP>+</SUP>/K<SUP>+</SUP>-ATPase in the Results section, the authors employed many different models. For instance, the voltage equation on page 15, voltage equation (2) on page 22, voltage equation (12) on page 24, voltage equation (30) on page 32, and voltage equation (38) on page 35 are presented as the master equations for their respective biophysical models. Meanwhile, the phase models are presented on page 29 and page 33. I would recommend that the authors clearly specify which equations correspond to each subsection of the Results section and explicitly state which equations were used to generate the data in each figure. This would help readers more easily follow the connections between the models, the results, and the figures.

      We thank the reviewer for pointing out that the links of the different voltage equations to the results could be expressed more explicitly in the article. All simulations were done using the ‘master equation’ expressed in Eq. 2, and the other voltage equations that are specified in the article (in the new version of the article Eqs. 13, 31, and 39) are reformulations of Eq. 2 to analytically show different properties of the voltage equation (Eq. 2). This has now been mentioned in the article when formulating the voltage equations, and the equation for the total leak current (in the new version Eq. 3) has been added for completeness.

      (2) The authors may want to revisit their description and references concerning Eigenmannia virescens. For example, wave-type weakly electric fish (e.g., Eigenmannia) and pulse-type weakly electric fish (e.g., Gymnotus carapo) exhibit large differences, so references 52-55 may be inappropriate for subsection 4.3.1, as these studies focus on Gymnotus carapo. Additionally, even within wave-type species, chirp patterns vary. For example, Eigenmannia can exhibit short "pause"-type chirps, whereas Apteronotus leptorhynchus (another wave-type fish) does not (https://pubmed.ncbi.nlm.nih.gov/14692494/).

      We thank the reviewer for pointing this out. The citations and phrasing in sections 4.3.1 and 4.3.2 have been updated to specifically refer to the weakly electric fish E. virescens.

      (3) Table on page 21: Please explain why the parameter value (13.5 mM) of [Na<SUP>+</SUP>]<SUB>in</SUB> is 10 times larger than its value (1.35 mM) in reference [26]? How does this value (13.5 mM) compare with the range of the variable [Na<SUP>+</SUP>]<SUB>in</SUB> in equation (6)?

      The intracellular sodium concentration in reference [26] was reported to be 1.35 mM, but the authors also reported an extracellular sodium concentration of 120 mM, and a sodium reversal potential of 55 mV. Upon calculating the sodium reversal potential, we found that an intracellular sodium concentration of 1.35 mM would give a sodium reversal potential of 113 mV. An intracellular sodium concentration of 13.5 mM, on the other hand, leads to the reported and physiological reversal potential of 55 mV. This has now been clarified in the article, and the connection between this value and Eq. 6 (Eq. 7 in the new version) has also been clarified.
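
      The reversal-potential check described above can be reproduced with the Nernst equation. The sketch below is an illustration under stated assumptions: the function name is hypothetical, and the temperature of 293.15 K (20 °C) is not given in the response; it was chosen here because it approximately reproduces the quoted values.

```python
import math

GAS_CONSTANT = 8.314462618  # J / (mol K)
FARADAY = 96485.33212       # C / mol


def nernst_potential_mv(conc_out_mm, conc_in_mm, valence=1, temperature_k=293.15):
    """Nernst reversal potential in millivolts.

    E = (R T / (z F)) * ln([X]_out / [X]_in). The default temperature is an
    assumption, not a value taken from the article.
    """
    if conc_out_mm <= 0 or conc_in_mm <= 0:
        raise ValueError("concentrations must be positive")
    volts = (GAS_CONSTANT * temperature_k / (valence * FARADAY)) * math.log(
        conc_out_mm / conc_in_mm
    )
    return 1000.0 * volts


if __name__ == "__main__":
    # Extracellular sodium of 120 mM, as reported in reference [26].
    for na_in in (1.35, 13.5):
        e_na = nernst_potential_mv(120.0, na_in)
        print(f"[Na+]in = {na_in} mM -> E_Na = {e_na:.1f} mV")
```

      With these assumptions, 1.35 mM gives a reversal potential close to the 113 mV mentioned above, while 13.5 mM gives a value close to the reported 55 mV.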

      Reviewer #2 (Public review):

      Summary:

      The paper by Weerdmeester, Schleimer, and Schreiber uses computational models to present the biological constraints under which electrocytes (specialized, highly active cells that facilitate electro-sensing in weakly electric fish) may operate. The authors suggest potential solutions that these cells could employ to circumvent these constraints.

      Electrocytes are highly active or spiking (greater than 300Hz) for sustained periods (for minutes to hours), and such activity is possible due to an influx of sodium and efflux of potassium ions into these cells after each spike. The resulting ion imbalance must be restored, which in electrocytes, as with many other biological cells, is facilitated by the Na-K pumps at the expense of biological energy, i.e., ATP molecules. For each ATP molecule the pump uses, three positively charged sodium ions from the intracellular space are exchanged for two positively charged potassium ions from the extracellular space. This creates a net efflux of positive ions into the extracellular space, resulting in hyperpolarized potentials for the cell over time. For most cells, this does not pose an issue, as their firing rate is much slower, and other compensatory mechanisms and pumps can effectively restore the ion imbalances. However, in the electrocytes of weakly electric fish, which spike at exceptionally high rates, the net efflux of positive ions presents a challenge. Additionally, these cells are involved in critical communication and survival behaviors, underscoring their essential role in reliable functioning.

      In a computational model, the authors test four increasingly complex solutions to the problem of counteracting the hyperpolarized states that occur due to continuous NaK pump action to sustain baseline activity. First, they propose a solution for a well-matched Na leak channel that operates in conjunction with the NaK pump, counteracting the hyperpolarizing states naturally. Their model shows that when such an orchestrated Na leak current is not included, quick changes in the firing rates could have unexpected side effects. Secondly, they study the implications of this cell in the context of chirps, a means of communication between individual fish. Here, an upstream pacemaking neuron entrains the electrocyte to spike, which ceases to produce a so-called chirp, a brief pause in the sustained activity of the electrocytes. In their model, the authors demonstrate that including the extracellular potassium buffer is necessary to obtain a reliable chirp signal. Thirdly, they tested another means of communication in which there was a sudden increase in the firing rate of the electrocyte, followed by a decay to the baseline. For this to occur reliably, the authors emphasize that a strong synaptic connection between the pacemaker neuron and the electrocyte is necessary. Finally, since these cells are energy-intensive, they hypothesize that electrocytes may have energy-efficient action potentials, for which their NaK pumps may be sensitive to the membrane voltages and perform course correction rapidly.

      Strengths:

      The authors extend an existing electrocyte model (Joos et al., 2018) based on the classical Hodgkin and Huxley conductance-based models of sodium and potassium currents to include the dynamics of the sodium-potassium (NaK) pump. The authors estimate the pump's properties based on reasonable assumptions related to the leak potential. Their proposed solutions are valid and may be employed by weakly electric fish. The authors explore theoretical solutions to electrosensing behavior that compound and suggest that all these solutions must be simultaneously active for the survival and behavior of the fish. This work provides a good starting point for conducting in vivo experiments to determine which of these proposed solutions the fish employ and their relative importance. The authors include testable hypotheses for their computational models.

      Weaknesses:

      The model for action potential generation simplifies ion dynamics by considering only sodium and potassium currents, excluding other ions like calcium. The ion channels considered are assumed to be static, without any dynamic regulation such as post-translational modifications. For instance, a sodium-dependent potassium pump could modulate potassium leak and spike amplitude (Markham et al., 2013).

      This work considers only the sodium-potassium (NaK) pumps to restore ion gradients. However, in many cells, several other ion pumps, exchangers, and symporters are simultaneously present and actively participate in restoring ion gradients. When sodium currents dominate action potentials, and thus when NaK pumps play a critical role, such as the case in Eigenmannia virescens, the present study is valid. However, since other biological processes may find different solutions to address the pump's non-electroneutral nature, the generalizability of the results in this work to other fast-spiking cell types is limited. For example, each spike could include a small calcium ion influx that could be buffered or extracted via a sodium-calcium exchanger.

      We thank the reviewer for the detailed summary and the updated identified strengths and weaknesses. The current article indeed focuses on and isolates the interplay between sodium currents, potassium currents, and sodium-potassium pump currents. As discussed in section 5.1, in excitable cells where these currents are the main players in action-potential generation, the results presented in this article are applicable. The contribution of post-translational effects of ion channels, other ionic currents, and other active transporters and pumps, could be exciting avenues for further studies.

      Reviewer #2 (Recommendations for the authors):

      Thank you for addressing my comments.

      All the figures are now consistent. The color schema used is clear.

      The methods and discussions expansions improve the paper.

      Including the model assumptions and simplifications is appreciated.

      Including internal references is helpful.

      The equations are clear, and the references have been fixed.

      I am content with the changes. I have updated my review accordingly.

      We thank the reviewer for their initial constructive comments that led to the significant improvement of the article.

      Page : 3 Line : 113 Author : Unknown Author 07/24/2025 

      Although this is technically correct, the article is about electrocommunication signals and does not focus on sensing.

      Page : 3 Line : 153 Author : Unknown Author 07/24/2025

      electrocommunication

      Page : 4 Line : 164 Author : Unknown Author 07/24/2025 

      Judging from the cited article, I think this should be a sodium-dependent potassium current.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors developed a sequence-based method to predict drug-interacting residues in IDP, based on their recent work, to predict the transverse relaxation rates (R2) of IDP trained on 45 IDP sequences and their corresponding R2 values. The discovery is that the IDPs interact with drugs mostly using aromatic residues that are easy to understand, as most drugs contain aromatic rings. They validated the method using several case studies, and the predictions are in accordance with chemical shift perturbations and MD simulations. The location of the predicted residues serves as a starting point for ligand optimization.

      Strengths:

      This work provides the first sequence-based prediction method to identify potential drug-interacting residues in IDP. The validity of the method is supported by case studies. It is easy to use, and no time-consuming MD simulations and NMR studies are needed.

      Weaknesses:

      The method does not depend on the information of binding compounds, which may give general features of IDP-drug binding. However, due to the size and chemical structures of the compounds (for example, how many aromatic rings), the number of interacting residues varies, which is not considered in this work. Lacking specific information may restrict its application in compound optimization, aiming to derive specific and potent binding compounds.

      We fully recognize that different compounds may have different interaction propensity profiles along the IDP sequence. In future studies, we will investigate compound-specific parameter values. The limiting factor is training data, but such data are beginning to be available.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors introduce DIRseq, a fast, sequence-based method that predicts drug-interacting residues (DIRs) in IDPs without requiring structural or drug information. DIRseq builds on the authors' prior work looking at NMR relaxation rates, and presumes that those residues that show enhanced R2 values are the residues that will interact with drugs, allowing these residues to be nominated from the sequence directly. By making small modifications to their prior tool, DIRseq enables the prediction of residues seen to interact with small molecules in vivo.

      Strengths:

      The preprint is well written and easy to follow.

      Weaknesses:

      (1) The DIRseq method is based on SeqDYN, which itself is a simple (which I do not mean as a negative - simple is good!) statistical predictor for R2 relaxation rates. The challenge here is that R2 rates cover a range of timescales, so the physical intuition as to what exactly elevated R2 values mean is not necessarily consistent with "drug interacting". Presumably, the authors are not using the helix boost component of SeqDYN here (it would be good to explicitly state this). This is not necessarily a weakness, but I think it would behove the authors to compare a few alternative models before settling on the DIRseq method, given the somewhat ad hoc modifications to SeqDYN to get DIRseq.

      Actually, the factors that elevate R2 are well-established. These are local interactions and residual secondary structures (if any). The basic assumption of our method is that intra-IDP interactions that elevate R2 convert to IDP-drug interactions. This assumption was supported by our initial observation that the drug interaction propensity profiles predicted using the original SeqDYN parameters already showed good agreement with CSP profiles. We only made relatively small adjustments to the parameters to improve the agreement. Indeed, we did not apply the helix boost portion of SeqDYN to DIRseq, and now state so (p. 4, second last paragraph). We now also compare DIRseq with several alternative models, as summarized in new Table S2.

      Specifically, the authors previously showed good correlation between the stickiness parameter of Tesei et al and the inferred "q" parameter for SeqDYN; as such, I am left wondering if comparable accuracy would be obtained simply by taking the stickiness parameters directly and using these to predict "drug interacting residues", at which point I'd argue we're not really predicting "drug interacting residues" as much as we're predicting "sticky" residues, using the stickiness parameters. It would, I think, be worth the authors comparing the predictive power obtained from DIRseq with the predictive power obtained by using the lambda coefficients from Tesei et al in the model, local density of aromatic residues, local hydrophobicity (note that Tesei et al have tabulated a large set of hydrophobicity scores!) and the raw SeqDYN predictions. In the absence of lots of data to compare against, this is another way to convince readers that DIRseq offers reasonable predictive power.

      We now compare predictions of these various parameter sets, and report the results in Table S2. In short, among all the tested parameter sets, DIRseq has the best performance as measured by (1) strong correlations between prediction scores and CSPs and (2) high true positives and low false positives (p. 7-9).

      (2) Second, the DIRseq is essentially SeqDYN with some changes to it, but those changes appear somewhat ad hoc. I recognize that there is very limited data, but the tweaking of parameters based on physical intuition feels a bit stochastic in developing a method; presumably (while not explicitly spelt out) those tweaks were chosen to give better agreement with the very limited experimental data (otherwise why make the changes?), which does raise the question of if the DIRseq implementation of SeqDYN is rather over-parameterized to the (very limited) data available now? I want to be clear, the authors should not be critiqued for attempting to develop a model despite a paucity of data, and I'm not necessarily saying this is a problem, but I think it would be really important for the authors to acknowledge to the reader the fact that with such limited data it's possible the model is over-fit to specific sequences studied previously, and generalization will be seen as more data are collected.

      We have explained the rationale for the parameter tweaks, which were limited to q values for four amino-acid types, i.e., to deemphasize hydrophobic interactions and slightly enhance electrostatic interactions (p. 4-5). We now add that these tweaks were motivated by observations from MD simulations of drug interactions with α-synuclein (ref 13). As already noted in the response to the preceding comment, we now also present results for the original parameter values as well as for when the four q values are changed one at a time.

      (3) Third, perhaps my biggest concern here is that - implicit in the author's assumptions - is that all "drugs" interact with IDPs in the same way and all drugs are "small" (motivating the change in correlation length). Prescribing a specific length scale and chemistry to all drugs seems broadly inconsistent with a world in which we presume drugs offer some degree of specificity. While it is perhaps not unexpected that aromatic-rich small molecules tend to interact with aromatic residues, the logical conclusion from this work, if one assumes DIRseq has utility, is that all IDRs bind drugs with similar chemical biases. This, at the very least, deserves some discussion.

      The reviewer raises a very important point. In Discussion, we now add that it is important to further develop DIRseq to include drug-specific parameters when data for training become available (p. 12-13). To illustrate this point, we use drug size as a simple example, which can be modeled by making the b parameter dependent on drug molecule size.

      (4) Fourth, the authors make some general claims in the introduction regarding the state of the art, which appear to lack sufficient data to be made. I don't necessarily disagree with the author's points, but I'm not sure the claims (as stated) can be made absent strong data to support them. For example, the authors state: "Although an IDP can be locked into a specific conformation by a drug molecule in rare cases, the prevailing scenario is that the protein remains disordered upon drug binding." But is this true? The authors should provide evidence to support this assertion, both examples in which this happens, and evidence to support the idea that it's the "prevailing view" and specific examples where these types of interactions have been biophysically characterized.

      We now cite nine studies showing that IDPs remain disordered upon drug binding.

      Similarly, they go on to say:

      "Consequently, the IDP-drug complex typically samples a vast conformational space, and the drug molecule only exhibits preferences, rather than exclusiveness, for interacting with subsets of residues." But again, where is the data to support this assertion? I don't necessarily disagree, but we need specific empirical studies to justify declarative claims like this; otherwise, we propagate lore into the scientific literature. The use of "typically" here is a strong claim, implying most IDP complexes behave in a certain way, yet how can the authors make such a claim? 

      Here again we add citations to support the statement.

      Finally, they continue to claim:

      "Such drug interacting residues (DIRs), akin to binding pockets in structured proteins, are key to optimizing compounds and elucidating the mechanism of action." But again, is this a fact or a hypothesis? If the latter, it must be stated as such; if the former, we need data and evidence to support the claim.

      We add citations to both compound optimization and mechanism of action.

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should compare the sequences of the IDPs in the case studies with the 45 IDPs in training the SeqDYN model to make sure that they are not included in the training dataset or are highly homologous.

      Please note that the data used for training SeqDYN were R2 rates, which are independent of the property being studied here, i.e., drug interacting residues. Therefore whether the IDPs studied here were in the training set for SeqDYN is immaterial.

      (2) The authors manually tuned four parameters in SeqDYN to develop the model for predicting drug-interacting residues without giving strict testing or explanations. More explanations, testing of more values, and ablation testing should be given.

      As responded above, we now both expand the explanation and present more test results.

      (3) The authors changed the q values of L, I, and M to the value of V. What are the results if these values are not changed?

      These results are shown in Table S2 (entry named SeqDYN_orig).

      (4) Only one b value is chosen based on the assumption that a drug molecule interacts with 3-4 residues at a time. However, the number of interacting residues is related to the size of the drug molecule. Adjusting the b value with the size of the ligand may provide improvement. It is better to test the influence of adjusting b values. At least, this should be discussed.

      Good point! We now state that b potentially can be adjusted according to ligand size (p. 12-13). In addition, we also show the effect of varying b on the prediction results (Table S2; p. 8, last paragraph).

      (5) The authors add 12 Q to eliminate end effects. However, explanations on why 12 Qs are chosen should be given. How about other numbers of Q or using other residues (e.g., the commonly used residues in making links, like GS/PS or A)?

      As we already explained, “Gln was selected because its 𝑞 value is at the middle of the 20 𝑞 values.” (p. 5, second paragraph). Also, 12 Qs are sufficient to remove any end effects; a higher number of Qs does not make any difference.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors make reference to the "C-terminal IDR" in cMyc, but the region they note is found in the bHLH DNA binding domain (which falls from residue ~370-420).

      We now clarify that this region is disordered on its own but forms a helix-loop-helix structure upon heterodimerization with Max (p. 11, last paragraph).

      (2) Given the fact that X-seq names are typically associated with sequencing-based methods, it's perhaps confusing to name this method DIRseq?

      We appreciate the reviewer’s point, but by now the preprint posted in bioRxiv is in wide circulation, and the DIRseq web server has been up for several months, so changing its name would cause a great deal of confusion.

      (3) I'd encourage the authors just to spell out "drug interacting residues" and retain an IDR acronym for IDRs. Acronyms rarely make writing clearer, and asking folks to constantly flip between IDR and DIR is asking a lot of an audience (in this reviewer's opinion, anyway).

      The reviewer makes a good point; we now spell out “drug-interacting residues”.

      (4) The assumption here is that CSPs result from direct drug:IDR interactions. However, CSPs result from a change in the residue chemical environment, which could in principle be an indirect effect (e.g., in the unbound state, residues A and B interact; in the bound state, residue A is now free, such that it experiences a CSP despite not engaging directly). While I recognize such assumptions are commonly made, it behoves the authors to explicitly make this point so the reader understands the relationship between CSPs and binding.

      We did add caveats of CSP in Introduction (p. 3, second paragraph).

      (5) On the figures, please label which protein is which figure, as well as provide a legend for the annotations on the figures (red line, blue bar, cyan region, etc.)

      We now label protein names in Fig. 1. The annotations of display items were already explained in the captions of Figs. 2 and 3; we now add them to the Fig. 4 caption.

      (6) abstract: "These successes augur well for deciphering the sequence code for IDP-drug binding." - This is not grammatically correct, even if augur were changed to agree. Suggest rewriting.

      “Augur well” means to be a good sign (for something). We use this phrase here in this meaning.

      (6) page 5: "we raised the 𝑞 value of Asp to be the same as that of Glu" → suggested "increased" instead of raised.

      We have made the suggested change.

      (7) The authors should consider releasing the source code (it is available via the .js implementation on the server, but this is not very transferable/shareable, so I'd encourage the authors to provide a stand-alone implementation that's explicitly shareable).

      We have now added a link for the user to download the source code.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      The authors examine the effect of cell-free chromatin particles (cfChPs) derived from human serum or from dying human cells on mouse cells in culture and propose that these cfChPs can serve as vehicles for cell-to-cell active transfer of foreign genetic elements. The work presented in this paper is intriguing and potentially important, but it is incomplete. At this stage, the claim that horizontal gene transfer can occur via cfChPs is not well supported because it is only based on evidence from one type of methodological approach (immunofluorescence and fluorescent in situ hybridization (FISH)) and is not validated by whole genome sequencing.

      We disagree with the eLife assessment that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous “satellite genomes” that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate technology. Rather, eLife should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Horizontal gene transfer is the transmission of genetic material between organisms through ways other than reproduction. Frequent in prokaryotes, this mode of genetic exchange is scarcer in eukaryotes, especially in multicellular eukaryotes. Furthermore, the mechanisms involved in eukaryotic HGT are unknown. This article by Banerjee et al. claims that HGT occurs massively between cells of multicellular organisms. According to this study, the cell free chromatin particles (cfChPs) that are massively released by dying cells are incorporated in the nucleus of neighboring cells.

      The reviewer is mistaken. We do not claim that the internalized cfChPs are incorporated into the nucleus. We show throughout the paper that the cfChPs perform their novel functions autonomously outside the genome without being incorporated into the nucleus. This is clearly seen in all our chromatin fibre images, metaphase spreads and our video abstract. Occasionally, when the cfChP fluorescent signals overlie the chromosomes, we have been careful to state that the cfChPs are associated with the chromosomes without implying that they have integrated.

      These cfChPs are frequently rearranged and amplified to form concatemers, they are made of open chromatin, expressed, and capable of producing proteins. Furthermore, the study also suggests that cfChPs transmit transposable elements (TEs) between cells on a regular basis, and that these TEs can transpose, multiply, and invade receiving cells. These conclusions are based on a series of experiments consisting in releasing cfChPs isolated from various human sera into the culture medium of mouse cells, and using FISH and immunofluorescence to monitor the state and fate of cfChPs after several passages of the mouse cell line.

      Strengths:

      The results presented in this study are interesting because they may reveal unsuspected properties of some cell types that may be able to internalize free-circulating chromatin, leading to its chromosomal incorporation, expression, and unleashing of TEs. The authors propose that this phenomenon may have profound impacts in terms of diseases and genome evolution. They even suggest that this could occur in germ cells, leading to within-organism HGT with long-term consequences.

      Again the reviewer makes the same mistake. We do not claim that the internalized cfChPs are incorporated into the chromosomes. We have addressed this issue above.

      We have a feeling that the reviewer has not understood our work – which is the discovery of “satellite genomes” which function autonomously outside the nuclear genome.

      Weaknesses:

      The claims of massive HGT between cells through internalization of cfChPs are not well supported because they are only based on evidence from one type of methodological approach: immunofluorescence and fluorescent in situ hybridization (FISH) using protein antibodies and DNA probes. Yet, such strong claims require validation by at least one, but preferably multiple, additional orthogonal approaches. This includes, for example, whole genome sequencing (to validate concatemerization, integration in receiving cells, transposition in receiving cells), RNA-seq (to validate expression), ChIP-seq (to validate chromatin state).

      We disagree with the reviewer that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous “satellite genomes” that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate approach. Rather, the reviewer should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed on Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism. Yet, telomere-to-telomere genomes have been produced for many eukaryote species, calling into question the conclusions of this study.

      The reviewer has raised a related issue below and we have responded to both of them together.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I thank the authors for taking my comments and those of the other reviewer into account and for adding new material to this new version of the manuscript. Among other modifications/additions, they now mention that they think that NIH3T3 cells treated with cfChPs die out after 250 passages because of genomic instability which might be caused by horizontal transfer of cfChPs DNA into the genome of treated cells (pp. 45-46, lines 725-731). However, no definitive formal proof of genomic instability and horizontal transfer is provided.

      We mention that the NIH3T3 cells treated with cfChPs die out after 250 passages in response to the reviewer’s earlier comment “Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed in Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism”.

      We have agreed with the reviewer and have simply speculated that the cells may die because of extreme genomic instability. We have left it as a speculation without diverting our paper in a different direction to prove genomic instability.

      The authors now refer to an earlier study they conducted in which they Illumina-sequenced NIH3T3 cells treated with cfChPs (pp. 48, lines. 781-792). This study revealed the presence of human DNA in the mouse cell culture. However, it is unclear to me how the authors can conclude that the human DNA was inside mouse cells (rather than persisting in the culture medium as cfChPs) and it is also unclear how this supports horizontal transfer of human DNA into the genome of mouse cells. Horizontal transfer implies integration of human DNA into mouse DNA, through the formation of phosphodiester bonds between human nucleotides and mouse nucleotides. The previous Illumina-sequencing study and the current study do not show that such integration has occurred. I might be wrong but I tend to think that DNA FISH signals showing that human DNA lies next to mouse DNA does not necessarily imply that human DNA has integrated into mouse DNA. Perhaps such signals could result from interactions at the protein level between human cfChPs and mouse chromatin?

      With due respect, our earlier genome sequencing study that the reviewer refers to was done on two single cell clones developed following treatment with cfChPs. So, the question of cfChPs lurking in the culture medium does not arise.

      The authors should be commended for doing so many FISH experiments. But in my opinion, and as already mentioned in my earlier review of this work, horizontal transfer of human DNA into mouse DNA should first be demonstrated by strong DNA sequencing evidence (multiple long and short reads supporting human/mouse breakpoints; discarding technical DNA chimeras) and only then eventually confirmed by FISH.

      As mentioned earlier, we disagree with the reviewer that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous “satellite genomes” that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate approach. Rather, the reviewer should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Regarding my comment on the quantity of human cfChPs that has been used for the experiments, the authors replied that they chose this quantity because it worked in a previous study. Could they perhaps explain why they chose this quantity in the earlier study? Is there any biological reason to choose 10 ng and not more or less? Is 10 ng realistic biologically? Could it be that 10 ng is orders of magnitude higher than the quantity of cfChPs normally circulating in multicellular organisms and that this could explain, at least in part, the results obtained in this study?

      The reviewer again raises the same issue, which we have already addressed in our revised manuscript. To quote “We chose to use 10ng based on our earlier report in which we had obtained robust biological effects such as activation of DDR and activation of apoptotic pathways using this concentration of cfChPs (Mittra I et. al., 2015)”.

      It is also mentioned in the response that RNA-seq has been performed on mouse cells treated with cfChPs, and that this confirms human-mouse fusion (genomic integration). Since these results are not included in the manuscript, I cannot judge how robust they are and whether they reflect a biological process rather than technical issues (technical chimeras formed during the RNA-seq protocol is a well-known artifact). In any case, I do not think that genomic integration can be demonstrated through RNA-seq as junction between human and mouse RNA could occur at the RNA level (i.e. after transcription). RNA-seq could however show whether human-mouse chimeras that have been validated by DNA-sequencing are expressed or not.

      We did perform transcriptome sequencing as suggested earlier by the reviewer, but realized that the amount of material required to be incorporated into the manuscript to include “material and methods”, “results”, “discussion”, “figures” and “legends to figures” and “supplementary figures and tables” would be so massive that it will detract from the flow of our work and hijack it in a different direction. We have, therefore, decided to publish the transcriptome results as a separate manuscript.

      Given these comments, I believe that most of the weaknesses I mentioned in my review of the first version of this work still hold true.

      An important modification is that the work has been repeated in other cell lines, hence I removed this criticism from my earlier review.

      Additional changes made

      (1) We have now rewritten the “Abstract” to 250 words to fit in eLife’s instructions. (It was not possible to reduce the word count further.)

      (2) We have provided Video 1 as a separate file instead of a link.

      (3) Some of Figure Supplements (which were stand-alone) are now given as main figures. We have re-arranged Figures and Figure Supplements in accordance with eLife’s instructions.

      (4) We have now provided a list of the various cell lines used in this study, their tissue origin and procurement source in Supplementary File 3.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Horizontal gene transfer is the transmission of genetic material between organisms through ways other than reproduction. Frequent in prokaryotes, this mode of genetic exchange is scarcer in eukaryotes, especially in multicellular eukaryotes. Furthermore, the mechanisms involved in eukaryotic HGT are unknown. This article by Banerjee et al. claims that HGT occurs massively between cells of multicellular organisms. According to this study, the cell free chromatin particles (cfChPs) that are massively released by dying cells are incorporated in the nucleus of neighboring cells. These cfChPs are frequently rearranged and amplified to form concatemers, they are made of open chromatin, expressed, and capable of producing proteins. Furthermore, the study also suggests that cfChPs transmit transposable elements (TEs) between cells on a regular basis, and that these TEs can transpose, multiply, and invade receiving cells. These conclusions are based on a series of experiments consisting in releasing cfChPs isolated from various human sera into the culture medium of mouse cells, and using FISH and immunofluorescence to monitor the state and fate of cfChPs after several passages of the mouse cell line.

      Strengths:

      The results presented in this study are interesting because they may reveal unsuspected properties of some cell types that may be able to internalize free-circulating chromatin, leading to its chromosomal incorporation, expression, and unleashing of TEs. The authors propose that this phenomenon may have profound impacts in terms of diseases and genome evolution. They even suggest that this could occur in germ cells, leading to within-organism HGT with long-term consequences.

      Weaknesses:

      The claims of massive HGT between cells through internalization of cfChPs are not well supported because they are only based on evidence from one type of methodological approach: immunofluorescence and fluorescent in situ hybridization (FISH) using protein antibodies and DNA probes. Yet, such strong claims require validation by at least one, but preferably multiple, additional orthogonal approaches. This includes, for example, whole genome sequencing (to validate concatemerization, integration in receiving cells, transposition in receiving cells), RNA-seq (to validate expression), ChIP-seq (to validate chromatin state).

      We have responded to this criticism under “Reviewer #1 (Recommendations for the authors, item no. 1-4)”.

      Another weakness of this study is that it is performed only in one receiving cell type (NIH3T3 mouse cells). Thus, rather than a general phenomenon occurring on a massive scale in every multicellular organism, it could merely reflect aberrant properties of a cell line that for some reason became permeable to exogenous cfChPs. This begs the question of the relevance of this study for living organisms.

      We have responded to this criticism under “Reviewer #1 (Recommendations for the authors, item no. 6)”.

      Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed in Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism. Yet, telomere-to-telomere genomes have been produced for many eukaryote species, calling into question the conclusions of this study.

      The reviewer is right in expecting that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome. This is indeed the case, and we find that beyond ~250 passages the cfChP-treated NIH3T3 cells begin to die out, apparently because their genomes have become too unstable for survival. This point will be highlighted in the revised version (pp. 45-46, lines 725-731).

      Reviewer #2 (Public review):

      I must note that my comments pertain to the evolutionary interpretations rather than the study's technical results. The techniques appear to be appropriately applied and interpreted, but I do not feel sufficiently qualified to assess this aspect of the work in detail.

      I was repeatedly puzzled by the use of the term "function." Part of the issue may stem from slightly different interpretations of this word in different fields. In my understanding, "function" should denote not just what a structure does, but what it has been selected for. In this context, where it is unclear if cfChPs have been selected for in any way, the use of this term seems questionable.

      We agree. We have removed the term “function” wherever we felt we had used it inappropriately.

      Similarly, the term "predatory genome," used in the title and throughout the paper, appears ambiguous and unjustified. At this stage, I am unconvinced that cfChPs provide any evolutionary advantage to the genome. It is entirely possible that these structures have no function whatsoever and could simply be byproducts of other processes. The findings presented in this study do not rule out this neutral hypothesis. Alternatively, some particular components of the genome could be driving the process and may have been selected to do so. This brings us to the hypothesis that cfChPs could serve as vehicles for transposable elements. While speculative, this idea seems to be compatible with the study's findings and merits further exploration.

      We agree with the reviewer’s viewpoint. We have replaced the term “predatory genome” with a more realistic term “satellite genome” in the title and throughout the manuscript. We have also thoroughly revised the discussion section and elaborated on the potential role of LINE-1 and Alu elements carried by the concatemers in mammalian evolution. (pp. 46-47, lines 743-756).

      I also found some elements of the discussion unclear and speculative, particularly the final section on the evolution of mammals. If the intention is simply to highlight the evolutionary impact of horizontal transfer of transposable elements (e.g., as a source of new mutations), this should be explicitly stated. In any case, this part of the discussion requires further clarification and justification.

      As mentioned above, we have revised the “discussion” section taking into account the issues raised by the reviewer and highlighted the potential role of cfChPs in evolution by acting as vehicles of transposable elements.

      In summary, this study presents important new findings on the behavior of cfChPs when introduced into a foreign cellular context. However, it overextends its evolutionary interpretations, often in an unclear and speculative manner. The concept of the "predatory genome" should be better defined and justified or removed altogether. Conversely, the suggestion that cfChPs may function at the level of transposable elements (rather than the entire genome or organism) could be given more emphasis.

      As mentioned above, we have replaced the term “predatory genome” with “satellite genome” and revised the “discussion” section taking into account the issues raised by the reviewer.

      Reviewer #1 (Recommendations for the authors):

      (1) I strongly recommend validating the findings of this study using other approaches. Whole genome sequencing using both short and long reads should be used to validate the presence of human DNA in the mouse cell line, as well as its integration into the mouse genome and concatemerization. Breakpoints between mouse and human DNA can be searched in individual reads. Finding these breakpoints in multiple reads from two or more sequencing technologies would strengthen their biological origin. Illumina and ONT sequencing are now routinely performed by many labs, such that this validation should be straightforward. In addition to validating the findings of the current study, it would allow performance of an in-depth characterization of the rearrangements undergone by both human cfChPs and the mouse genome after internalization of cfChPs, including identification of human TE copies integrated through bona fide transposition events into the mouse genome. New copies of LINE and Alu TEs should be flanked by target site duplications. LINE copies should be frequently 5' truncated, as observed in many studies of somatic transposition in human cells.

      (2) Furthermore, should the high level of cell-to-cell HGT detected in this study occur on a regular basis within multicellular organisms, validating it through a reanalysis of whole genome sequencing data available in public databases should be relatively easy. One would expect to find a high number of structural variants that for some reason have so far gone under the radar.

      (3) Short and long-read RNA-seq should be performed to validate the expression of human cfChPs in mouse cells. I would also recommend performing ChIP-seq on routinely targeted histone marks to validate the chromatin state of human cfChPs in mouse cells.

      (4) The claim that fused human proteins are produced in mouse cells after exposing them to human cfChPs should be validated using mass spectrometry.

      The reviewer has suggested a plethora of techniques to validate our findings. Clearly, it is neither possible to undertake all of them nor to incorporate them into the manuscript. However, as suggested by the reviewer, we did conduct transcriptome sequencing of cfChPs treated NIH3T3 cells and were able to detect the presence of human-human fusion sequences (representing concatemerisation) as well as human-mouse fusion sequences (representing genomic integration). However, we realized that the amount of material required to be incorporated into the manuscript to include “material and methods”, “results”, “discussion”, “figures” and “legends to figures” and “supplementary figures and tables” would be so massive that it will detract from the flow of our work and hijack it in a different direction. We have, therefore, decided to publish the transcriptome results as a separate manuscript. However, to address the reviewer’s concerns we have now referred to results of our earlier whole genome sequencing study of NIH3T3 cells similarly treated with cfChPs wherein we had conclusively detected the presence of human DNA and human Alu sequences in the treated mouse cells. These findings have now been added as an independent paragraph (pp. 48, lines. 781-792).

      (5) It is unclear from what is shown in the paper (increase in FISH signal intensity using Alu and L1 probes) if the increase in TE copy number is due to bona fide transposition or to amplification of cfChPs as a whole, through mechanisms other than transposition. It is also unclear whether human TEs end up being integrated into the neighboring mouse genome. This should be validated by whole genome sequencing.

      Our results suggest that TEs amplify and increase their copy number due to their association with DNA polymerase and their ability to synthesize DNA (Figure 14a and b). Our study design cannot demonstrate transposition, which would require real-time imaging.

      The possibility of incorporation of TEs into the mouse genome is supported by our earlier genome sequencing work, referred to above, wherein we detected multiple human Alu sequences in the mouse genome (pp. 48, lines. 781-792).

      (6) In order to be able to generalize the findings of this study, I strongly encourage the authors to repeat their experiments using other cell types.

      We thank the reviewer for this suggestion. We have now used four different cell lines derived from four different species and demonstrated that horizontal transfer of cfChPs occurs in all of them, suggesting that it is a universal phenomenon (pp. 37, lines 560-572; Supplementary Fig. S14a-d).

      We have also mentioned this in the abstract (pp. 3, lines 52-54).

      (7) Since the results obtained when using cfChPs isolated from healthy individuals are identical to those shown when using cfChPs from cancer sera, I wonder why the authors chose to focus mainly on results from cancer-derived cfChPs and not on those from healthy sera.

      Most of the experiments were conducted using cfChPs isolated from cancer patients because of our especial interest in cancer, and because our earlier results (Mittra et al., 2015) had shown that cfChPs isolated from cancer patients had significantly greater activity in terms of DNA damage and activation of apoptotic pathways than those isolated from healthy individuals. We have now incorporated the above justification (pp. 6, lines 124-128).

      (8) Line 125: how was the 10-ng quantity (of human cfChPs added to the mouse cell culture) chosen and how does it compare to the quantity of cfChPs normally circulating in multicellular organisms?

      We chose to use 10 ng based on our earlier report in which we had obtained robust biological effects such as activation of DDR and apoptotic pathways using this concentration of cfChPs (Mittra et al., 2015). We have now incorporated the justification for using this dose in our manuscript (pp. 51-52, lines 867-870).

      (9) Could the authors explain why they repeated several of their experiments in metaphase spreads, in addition to interphase?

      We conducted experiments on metaphase spreads in addition to those on chromatin fibres because of the current heightened interest in extrachromosomal DNA in cancer, studies of which have largely been based on metaphase spreads. We were interested to see how the cfChP concatemers might relate to the characteristics of cancer extrachromosomal DNA and whether the latter in fact represent cfChP concatemers acquired from surrounding dying cancer cells. We have now mentioned this on pp. 7, lines 150-155.

      (10) Regarding negative controls consisting in checking whether human probes cross-react with mouse DNA or proteins, I suggest that the stringency of washes (temperature, reagents) should be clearly stated in the manuscript, such that the reader can easily see that it was identical for controls and positive experiments.

      We were fully aware of these issues and were careful to ensure that washing steps were conducted meticulously. The careful washing steps have been repeatedly emphasized under the section on “Immunofluorescence and FISH” (pp. 54-55, lines. 922-944).

      (11) I am not an expert in Immuno-FISH and FISH with ribosomal probes but it can be expected that ribosomal RNA and RNA polymerase are quite conserved (and thus highly similar) between humans and mice. A more detailed explanation of how these probes were designed to avoid cross-reactivity would be welcome.

      We were aware of this issue and conducted a negative control experiment to ensure that the human ribosomal RNA probe and RNA polymerase antibody did not cross-react with mouse. Please see Supplementary Fig. S4c.

      (12) Finally, I could not understand why the cfChPs internalized by neighboring cells are called predatory genomes. I could not find any justification for this term in the manuscript.

      We agree, and this criticism has also been made by Reviewer #2. We have now replaced the term “predatory” genomes with “satellite” genomes.

      Reviewer #2 (Recommendations for the authors):

      (1) P2 L34: The term "role" seems to imply "what something is supposed to do" (similar to "function"). Perhaps "impact" would be more neutral. Additionally, "poorly defined" is vague-do you mean "unknown"?

      We thank the reviewer for this suggestion. We have now rephrased the sentence to read “Horizontal gene transfer (HGT) plays an important evolutionary role in prokaryotes, but it is thought to be less frequent in mammals.” (pp. 2, lines. 26-27).

      (2) P2 L35: It seems that the dash should come after "human blood."

      Thank you, we have changed the position of the dash (pp. 2, line. 29).

      (3) P2 L37: Must we assume these structures have a function? Could they not simply be side effects of other processes?

      We think this is a matter of semantics, especially since we show that cfChPs once inside the cell perform many functions such as replication, DNA synthesis, RNA synthesis, protein synthesis etc. We, therefore, think the word “function” is not inappropriate.

      (4) Abstract: After reading the abstract, I am unclear on the concept of a "predatory genome." Based on the summarized results, it seems one cannot conclude that these elements provide any adaptive value to the genome.

      We agree. We have now replaced the term “predatory” genomes with a more realistic term viz. “satellite” genomes.

      (5) Video abstract: The video abstract does not currently stand on its own and needs more context to be self-explanatory.

      Thank you for pointing this out. We have now created a new and much more professional video with more context which we hope will meet with the reviewer’s approval.

      (6) P4 L67: Again, I am uncertain that HGT should be said to have "a role" in mammals, although it clearly has implications and consequences. Perhaps "role" here is intended to mean "consequence"?

      We have now changed the sentence to read as follows “However, defining the occurrence of HGT in mammals has been a challenge” (pp. 4, line. 73).

      (7) P6 L111: The phrase "to obtain a new perspective about the process of evolution" is unclear. What exactly is meant by this statement?

      We have replaced this sentence altogether which now reads “The results of these experiments are presented in this article which may help to throw new light on mammalian evolution, ageing and cancer” (pp. 5-6, lines 116-118).

      (8) P38 L588: The term "predatory genome" has not been defined, making it difficult to assess its relevance.

      This issue has been addressed above.

      (9) P39 L604: The statement "transposable elements are not inherent to the cell" suggests that some TEs could originate externally, but this does not rule out that others are intrinsic. In other words, TEs are still inherent to the cell.

      This part of the discussion section has been rewritten and the above sentence has been deleted.

      (10) P39 L609: The phrase "may have evolutionary functions by acting as transposable elements" is unclear. Perhaps it is meant that these structures may serve as vehicles for TEs?

      This sentence has disappeared altogether in the revised discussion section.

      (11) P41 L643: "Thus, we hypothesize ... extensively modified to act as foreign genetic elements." This sentence is unclear. Are the authors referring to evolutionary changes in mammals in general (which overlooks the role of standard mutational processes)? Or is it being proposed that structural mutations (including TE integrations) could be mediated by cfChPs in addition to other mutational mechanisms?

      We have replaced this sentence which now reads “Thus, “within-self” HGT may occur in mammals on a massive scale via the medium of cfChP concatemers that have undergone extensive and complex modifications resulting in their behaviour as “foreign” genetic elements” (pp. 47, lines 763-766).

      (12) P41 L150: The paragraph beginning with "It has been proposed that extreme environmental..." transitions too abruptly from HGT to adaptation. Is it being proposed that cfChPs are evolutionary processes selected for their adaptive potential? This idea is far too speculative at this stage and requires clarification.

      We agree. This paragraph has been removed.

      (13) P43 L681: This summary appears overly speculative and unclear, particularly as the concept of a "predatory genome" remains undefined and thus cannot be justified. It suggests that cfChPs represent an alternative lifestyle for the entire genome, although alternative explanations seem far more plausible at this point.

      We have now replaced the term “predatory” genome with “satellite” genome. The relevant part of the summary section has also been partially revised (pp. 49-50, lines 817-831).

      Changes independent of reviewers’ comments.

      We have made the following additions / modifications.

      (1) The abstract has been modified and its “conclusion” section has been rewritten.

      (2) Section 1.14 has been newly added together with accompanying Figures 15 a,b and c.

      (3) The “Discussion” section has been greatly modified and parts of it have been rewritten.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this study, the authors explore a novel mechanism linking aging to chromosome mis-segregation and aneuploidy in yeast cells. They reveal that, in old yeast mother cells, chromosome loss occurs through asymmetric partitioning of chromosomes to daughter cells, a process coupled with the inheritance of an old Spindle Pole Body. Remarkably, the authors identify that remodelling of the nuclear pore complex (NPC), specifically the displacement of its nuclear basket, triggers these asymmetric segregation events. This disruption also leads to the leakage of unspliced pre-mRNAs into the cytoplasm, highlighting a breakdown in RNA quality control. Through genetic manipulation, the study demonstrates that removing introns from key chromosome segregation genes is sufficient to prevent chromosome loss in aged cells. Moreover, promoting pre-mRNA leakage in young cells mimics the chromosome mis-segregation observed in old cells, providing further evidence for the critical role of nuclear envelope integrity and RNA processing in aging-related genome instability. 

      Strengths: 

      The findings presented are not only intriguing but also well-supported by robust experimental data, highlighting a previously unrecognized connection between nuclear envelope integrity, RNA processing, and genome stability in aging cells, deepening our understanding of the molecular basis of chromosome loss in aging. 

      We thank the reviewer for this very positive assessment of our work

      Weaknesses: 

      Further analysis of yeast aging data from microfluidic experiments will provide important information about the dynamic features and prevalence of the key aging phenotypes, e.g. pre-mRNA leakage and chromosome loss, reported in this work. 

      We thank the reviewer for bringing up this point, which we have addressed in the revised version of the manuscript. In short, chromosome loss is an abrupt, late event in the lifespan of the cells. To examine its prevalence, we have quantified the combined loss frequency of two chromosomes when both are labelled in the same cell. Whereas single chromosomes are lost at a frequency of 10-15% per cell, less than 5% of the cells lose both at the same time. Thus, the different chromosomes are lost largely but not fully independently from each other. Based on these data, and on the fact that yeast cells have 16 chromosomes, we evaluate that about half of the cells lose at least one chromosome in their final cell cycle.

      We also tried to estimate the prevalence of the pre-mRNA leakage phenotype, based on the increased mCherry to GFP ratio observed between 0h and 24 hours of aging for 146 individual cells. For this analysis, we compared the mCherry/GFP ratio at 0 and 24h for the same individual cell. This analysis indicates that 81% of the cells show a fold change strictly above 1 as they age. Furthermore, the data appear to be unimodal. Thus, we can conservatively conclude that a majority of the cells show pre-mRNA leakage at 24 hours. Since not all cells are at the end of their life at that time, this is possibly an underestimate.
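
      As an illustration only (this is not the authors' analysis code), the per-cell comparison described above can be sketched as follows. The function name, its parameters, and the example ratios are hypothetical placeholders, not measurements from the study.

```python
def fraction_above(ratios_0h, ratios_24h, threshold=1.0):
    """Fraction of cells whose 24h/0h mCherry/GFP fold change is strictly above `threshold`.

    Both lists must hold one ratio per cell, in the same cell order.
    """
    if len(ratios_0h) != len(ratios_24h) or not ratios_0h:
        raise ValueError("need two non-empty lists of equal length")
    # Fold change is computed within each individual cell, not across the population.
    fold_changes = [late / early for early, late in zip(ratios_0h, ratios_24h)]
    return sum(fc > threshold for fc in fold_changes) / len(fold_changes)


# Hypothetical ratios for five cells (placeholders, not data from the manuscript).
example_0h = [0.20, 0.25, 0.30, 0.22, 0.28]
example_24h = [0.30, 0.24, 0.60, 0.35, 0.29]

frac_any_increase = fraction_above(example_0h, example_24h)         # threshold 1.0
frac_strong_increase = fraction_above(example_0h, example_24h, 2.5) # stricter cut-off
```

      Raising `threshold` gives a more conservative estimate of how many cells show leakage.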

      In addition, a discussion would be needed to clarify the relationship between "chromosome loss" in this study and "genomic missegregation" reported previously in yeast aging. 

      Genomic mis-segregation is characterized by the entry of both SPBs and all the chromosomes into the daughter cell compartment (PMID: 31714209).  We have observed these events in our movies as well.  However, the chromosome loss phenotype that we are focusing on affects only some chromosomes (as discussed above) and takes place under proper elongation of the spindle, with one SPB remaining in the mother cell whereas the other one goes to the bud, as shown in the manuscript’s Figure 2.  In our movies, chromosome loss is at least three-fold more frequent (for a single chromosome) than full genome mis-segregation (Sup Fig 1A-B). Furthermore, whereas chromosome loss is alleviated by the removal of the introns of MCM21, NBL1 and GLC7, genomic mis-segregation is not (Sup Fig 1B).  Thus, genomic mis-segregation mentioned by the reviewer is a process distinct from the chromosome loss that we report.  This discussion and the relevant data have been added to the manuscript.

      We thank the reviewer for bringing up the possible confusion between these two phenotypes, allowing us to clarify this point.

      Reviewer #2 (Public review): 

      Summary: 

      The authors make the interesting discovery of increased chromosome non-disjunction in aging yeast mother cells. The phenotype is quite striking and well supported with solid experimental evidence. This is quite significant to a haploid cell (as used here) - loss of an essential chromosome leads to death soon thereafter. The authors then work to tie this phenotype to other age-associated phenotypes that have been previously characterized: accumulation of extrachromosomal rDNA circles that then correlate with compromised nuclear pore export functions, which correlates with "leaky" pores that permit unspliced mRNA messages to be inappropriately exported to the cytoplasm. They then infer that three intron-containing mRNAs that encode portions in resolving sister chromatid separation during mitosis, are unspliced in this age-associated defect and thus lead to the non-disjunction problem. 

      Strengths: The discovery of age-associated chromosome non-disjunction is an interesting discovery, and it is demonstrated in a convincing fashion with "classic" microscopy-based single cell fluorescent chromosome assays that are appropriate and seem robust. The correlation of this phenotype with other age-associated phenotypes - specifically extrachromosomal rDNA circles and nuclear pore dysfunction - is supported by in vivo genetic manipulations that have been well-characterized in the past. 

      In addition, the application of the single cell mRNA splicing defect reporter showed very convincingly that general mRNA splicing is compromised in aged cells. Such a pleiotropic event certainly has big implications. 

      We thank the reviewer for this assessment of our work. To avoid confusion, we would like to stress, however, that our data do not show that splicing per se is defective in old cells. Actually, we specifically show that the cells are unlikely to have a splicing defect (last figure of the original and the revised version of the manuscript). Our data specifically show that unspliced mRNAs tend to leak out of the nucleus of old cells.

      Weaknesses: 

      The biggest weakness is "connecting all the dots" of causality and linking the splicing defect to chromosome disjunction. I commend the authors for making a valiant effort in this regard, but there are many caveats to this interpretation. While the "triple intron" removal suppressed the non-disjunction defect in aged cells, this could simply be a kinetic fix, where a slowdown in the relevant aspects of mitosis could give the cell time to resolve the syntelic attachment of the chromatids.  

      The possibility that intron-removal leads to a kinetic fix is an interesting idea that we have now considered.  In the revised manuscript, we now provide measurements of mitotic duration in the “triple intron” mutant compared to wild type cells and the duration of their last cell cycle (See supplementary figure 3A-D). There is no evidence that removing these introns slows down mitosis.  Thus, the kinetic fix hypothesis is unlikely to explain our observation about the effect of intron removal.

      To this point, I note that the intron-less version of GLC7, which affects the most dramatic suppression of the three genes, is reported by one of the authors to have a slow growth rate (Parenteau et al, 2008 - https://doi.org/10.1091/mbc.e07-12-1254)

      The reviewer is right, removing the intron of GLC7 reduces the expression levels of the gene product (PMID: 16816425) to about 50% of the original value and causes a slow growth phenotype.  However, the cells revert fairly rapidly through duplication of the GLC7-∆i gene (see supplementary Figure 3EF).  As a consequence, neither the GLC7-∆i nor the 3x∆i mutant strains show noticeable growth phenotypes by spot assays.  We now document these findings in supplementary figure 3.  

      Lastly, the Herculean effort to perform FISH of the introns in the cytoplasm is quite literally at the statistical limit of this assay. The data were not as robust as the other assays employed through this study. The data show either "no" signal for the young cells or a signal of 0, 1, or 2 FISH foci in the aged cells. In a Poisson distribution, which this follows, it is improbable to distinguish between these differences. 

      This is correct, this experiment was not the easiest of the manuscript. However, despite the limitations of the assay, the data presented in Figure 7B are very clear. 300 cells aged by MEP were analysed, divided into three cohorts of 100 each, and the distribution of foci (nuclear vs cytoplasmic) in these aged cells was compared to the distribution in three cohorts of young cells. For all 3 aged cohorts, over 70% of the visible foci were cytoplasmic, while in the young cells, this figure was around 3%. A t-test was conducted to compare these frequencies between young and old cells (Figure 7B). The difference is highly significant. Therefore, we are clearly not at the statistical limit.

      What the reviewer refers to is supplementary Figure 4, where we were simply asking i) whether the signal is lost in cells lacking the intron of GLC7 (the answer is unambiguously yes) and ii) what the general number of dots per cell is in young versus old wild type cells (without distinguishing between nuclear and cytoplasmic). The information to be taken from this last quantification is indeed that there is no clearly distinguishable difference between these two populations of cells, as the reviewer rightly concludes. In other words, the reason why there are more dots in the cytoplasm of the old cells in Figure 7B is not that the old cells have many more dots in general (see supplementary Figure 4C). We hope that these clarifications help understand the data better. We have edited the manuscript to avoid confusion.
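
      For readers who want to see the shape of such a cohort-level comparison, here is a minimal sketch. It assumes an unpaired, unequal-variance (Welch) t-test on per-cohort cytoplasmic fractions; the response does not state which t-test variant was used, and the cohort values below are hypothetical placeholders chosen only to resemble "over 70%" and "around 3%".

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each sample needs at least two values and the two samples must not both
    have zero variance, otherwise the standard error is zero.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical fraction of cytoplasmic foci per cohort (placeholders, not study data).
old_cohorts = [0.72, 0.75, 0.78]
young_cohorts = [0.02, 0.03, 0.04]

t_stat, df = welch_t(old_cohorts, young_cohorts)
```

      The t statistic and degrees of freedom would then be looked up against a t distribution to obtain a p-value; that step is omitted here to keep the sketch dependency-free.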

      Reviewer #3 (Public review): 

      Summary: 

      Mirkovic et al explore the cause underlying development of aneuploidy during aging. This paper provides a compelling insight into the basis of chromosome missegregation in aged cells, tying this phenomenon to the established Nuclear Pore Complex architecture remodelling that occurs with aging across a large span of diverse organisms. The authors first establish that aged mother cells exhibit aberrant error correction during mitosis. As extrachromosomal rDNA circles (ERCs) are known to increase with age and lead to NPC dysfunction that can result in leakage of unspliced pre-mRNAs, Mirkovic et al search for intron-containing genes in yeast that may be underlying chromosome missegregation, identifying three genes in the aurora B-dependent error correction pathway: MCM21, NBL1, and GLC7. Interestingly, intron-less mutants in these genes suppress chromosome loss in aged cells, with a significant impact observed when all three introns were deleted (3x∆i). The 3x∆i mutant also suppresses the increased chromosome loss resulting from nuclear basket destabilization in a mlp1∆ mutant. The authors then directly test if aged cells do exhibit aberrant mRNA export, using RNA FISH to identify that old cells indeed leak intron-containing pre-mRNA into the cytoplasm, as well as a reporter assay to demonstrate translation of leaked pre-mRNA, and that this is suppressed in cells producing less ERCs. Mutants causing increased pre-mRNA leakage are sufficient to induce chromosome missegregation, which is suppressed by the 3x∆i. 

      Strengths: 

      The finding that deleting the introns of 3 genes in the Aurora B pathway can suppress age-related chromosome missegregation is highly compelling. Additionally, the rationale behind the various experiments in this paper is well-reasoned and clearly explained. 

      We thank the reviewer for their very positive assessment of our work

      Weaknesses:  

      In some cases, controls for experiments were not presented or were depicted in other figures. 

      We are sorry about this confusion. We have improved our presentation of the controls, bringing them back each time they are relevant. We have also added those that were missing (such as those mentioned by reviewer 2, see above). Note that the frequencies of centromeric plasmid loss at 0h in Figure 1C are not meaningful and therefore not presented. Since the cells were grown on selective medium before loading on to the ageing chip, we cannot report a plasmid loss frequency here. The ageing experiments themselves were subsequently conducted in full medium, to allow for centromeric plasmid loss without killing the cell. We explain this in the materials and methods section.

      High variability was seen in chromosome loss data, leading to large error bars. 

      We thank the reviewer for this comment. The variance in those two figures (3A and 5D) comes from the suboptimal plotting of this data. This is now corrected as follows.  We divided the available data into 4 cohorts and then plotted the average loss frequency across these cohorts for the indicated age groups.  This filters out much of the noise and improves the statistical resolution.
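
      The cohort-averaging step can be sketched roughly as below. This is an assumption-laden illustration: how cells were assigned to cohorts is not stated in the response, so the interleaved split, the function name, and the example events are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohort_loss_frequency(events, n_cohorts=4):
    """Split per-cell loss events (1 = loss, 0 = no loss) into cohorts and
    return (mean loss frequency across cohorts, standard error of that mean)."""
    if n_cohorts < 2 or len(events) < n_cohorts:
        raise ValueError("need at least two cohorts and one cell per cohort")
    # Interleaved assignment: cell i goes to cohort i % n_cohorts (an assumption).
    cohorts = [events[i::n_cohorts] for i in range(n_cohorts)]
    frequencies = [sum(cohort) / len(cohort) for cohort in cohorts]
    return mean(frequencies), stdev(frequencies) / sqrt(n_cohorts)


# Hypothetical age group: 40 cells, 4 of which lost the labelled chromosome.
example_events = [0] * 36 + [1] * 4
avg_freq, sem = cohort_loss_frequency(example_events)
```

      Plotting `avg_freq` with `sem` as the error bar for each age group gives one point per group instead of one noisy estimate per small sample.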

      The text could have been more polished. 

      Thank you for this comment.  We have gone through the manuscript again in detail.

      Reviewer #1 (Recommendations for the authors):

      (1) A previous study (PMID: 31714209) showed that aging yeast cells undergo genomic missegregation in which material was abnormally segregated to the daughter cells, leading to cell cycle arrest. After that, the missegregation is either corrected by returning aberrantly segregated genetic material to the mother cells so that they can resume cell cycles, or if not corrected, the mother cells will terminally exit the cell cycle and eventually die. That paper also showed that this age-dependent genomic missegregation is related to rDNA instability. Is the chromosome loss in this work related to the genomic missegregation reported before? Is it partially reversible like genomic missegregation? Are all the chromosomes lost in one cell division, like in the case of genomic missegregation? Some additional characterization and a discussion would be helpful. 

      As mentioned above, indeed the phenotype of full genome mis-segregation described by Crane et al. (2019) is observable in our data as well. At 24h ~3% of the cells segregate both SPBs to the bud, as they previously described (Supp Figure 1A and B).  This phenomenon is clearly distinct from asymmetric chromosome partition, where cells undergo anaphase, separate the SPBs and segregate one to the mother cell and one to the bud (Figure 2A).  Also, asymmetric chromosome partitioning affects only a subset of the chromosomes (see below), not the entire genome. Finally, unlike asymmetric chromosome partitioning, the frequency of genome mis-segregation in ageing was not alleviated by intron removal (Supp Figure 1B). Thus, these two processes are clearly distinct and driven by different mechanisms. Note that asymmetric chromosome partitioning appears 3 to 5 times more frequently than genomic mis-segregation.

      Supporting further the notion that these two processes are distinct, chromosome loss seals the end of the life of the cell, as we reported, indicating that this is not a reversible event. Also, it does not involve all chromosomes at once. In cells that contain the labelled versions of both chromosome II and IV at the same time, the frequency of losing both chromosomes is less than 5%, whereas each chromosome is lost in 10-15% of the cells (Figure 1C). Thus, most cells lose one and keep the other. Furthermore, this indicates that there are many more cells losing at least one chromosome than the 15% that lose chromosome IV for example, probably 50% or more. Thus, chromosome loss by asymmetric segregation is much more frequent than the partly transient transfer of the entire nucleus to the bud.
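
      The arithmetic behind "most cells lose one and keep the other" can be made explicit with inclusion-exclusion. The sketch below is illustrative only: the function names are hypothetical, and the inputs are the bounds quoted in the text (10-15% per chromosome, less than 5% for both), not exact measured values.

```python
def at_least_one_lost(p_a, p_b, p_both):
    """P(A or B) = P(A) + P(B) - P(A and B) for two labelled chromosomes."""
    if p_both > min(p_a, p_b):
        raise ValueError("joint loss cannot exceed either single-chromosome loss")
    return p_a + p_b - p_both


def joint_if_independent(p_a, p_b):
    """Joint loss frequency expected if the two losses were fully independent."""
    return p_a * p_b


# Bounds from the quoted ranges: single loss 10-15%, joint loss below 5%.
lower_bound = at_least_one_lost(0.10, 0.10, 0.05)  # fewest cells losing II or IV
upper_bound = at_least_one_lost(0.15, 0.15, 0.0)   # most cells losing II or IV
independent_joint = joint_if_independent(0.15, 0.15)
```

      Under these bounds, somewhere between roughly 15% and 30% of cells lose at least one of the two labelled chromosomes, which is already at or above the single-chromosome frequency; extending the count to all 16 chromosomes can only raise the figure.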

      (2) What percentage of aging WT cells undergo pre-mRNA leakage (using the GFP/mCherry reporter) during their entire lifespan? Is it a sporadic, reversible process or an accumulative, one-way deterioration? Previous studies (PMID: 32675375; PMID: 24332850; PMID: 36194205; PMID: 31291577) showed that only a fraction of yeast cells age with rDNA instability and ERC accumulation, as indicated by excessive rRNA transcription and nucleolar enlargement. Are they the same fraction of aging cells that undergo pre-mRNA leakage and chromosome loss? This information will indicate the prevalence of the key aging phenotypes reported in this work and should be readily obtainable from microfluidic experiments. In addition, a careful discussion would be helpful. 

      Pre-mRNA leakage is relatively widespread in the population, but it is difficult to put a precise number on it. Analysis of how the mCherry/GFP ratio changes in 146 individual cells between 0 and 24 hours of imaging in our microfluidics platform indicates that ~80% show an increase and 50% of the cells show an increase above 1.5-fold. Therefore, the frequencies of pre-mRNA leakage and chromosome loss are probably similar.  This would be in the same range as the frequency of aging by ERC accumulation (mode 1) estimated by PMID: 32675375.  We have modified the discussion to account for these considerations.

      Reviewer #2 (Recommendations for the authors)

      The manuscript could use a bit of editing in places - please go through it once more. 

      Editing suggestions: 

      Line 80 – irrespective

      Corrected.

      Line 97 - these are not "rates" but frequencies. Please correct this error throughout. 

      Replaced “rate” with “frequency” throughout the manuscript and the figures, when pertaining to chromosome loss.

      Line 328 - increase in chromosome... 

      Corrected.

      Line 379 - tampering 

      Reviewer #3 (Recommendations for the authors):

      Specific Feedback to Authors 

      (a) Major Points 

      (i) While the proposed connection between ERC-mediated nuclear basket removal and erroneous error correction was clearly stated, this connection is correlative and was not directly tested. Specifically, although mutants impacting ERC levels were tested for missegregation, it was not directly tested if increased missegregation levels occurred due to ERC tethering to the NPC and subsequent nuclear basket removal. It is possible that the increased ERCs may be driving missegregation via a different pathway. Authors should consider experiments to strengthen this idea, such as looking at chromosome loss frequency in a sir2∆ 3x∆i double mutant, or a sir2∆ sgf73∆ double mutant. 

      This connection is addressed in the original version of the manuscript, where we show that preventing attachment of ERCs to the NPC, by removing the linker protein Sgf73, alleviates chromosome loss.  The link is further substantiated by the fact that removing the basket on its own promotes chromosome loss and that in both cases, namely during normal aging, i.e., upon ERC accumulation, and upon basket removal, the mechanism of chromosome loss is the same.  In both cases, it depends on the introns of the GLC7, MCM21 and NBL1 genes.  

      However, we acknowledge that the mutants tested have pleiotropic effects, making interpretation somewhat difficult, even when examining chromosome loss in multiple mutants that affect ERC formation and NPC remodelling, as we have done.  As recommended by the reviewer, we have characterized the phenotype of the sir2∆ 3x∆i mutant strain. Intron removal in the sir2∆ mutant cells largely rescued the elevated chromosome loss frequency of these cells and slightly extended their replicative lifespan (Figure 6D-E). We conclude that intron removal can remedy the chromosome loss phenotype of the sir2∆ mutant. Although clearly significant, the effect on the replicative lifespan was not very strong, likely because the sir2∆ mutation affects other ageing processes.

      Touching on this question, we added a new set of experiments asking whether any accumulating DNA circle causes chromosome loss in an intron-dependent manner.  Thus, we have introduced a noncentromeric replicative plasmid in wild type and 3x∆i mutant strains carrying the labelled version of chromosome II (Figure 6A-C).  These studies show that these cells age much faster than wild type cells, as expected, and lose chromosomes at a higher frequency than non-transformed cells.  Finally, the effect is at least in part alleviated by removing the introns of NBL1, MCM21 and GLC7.

      Therefore, after adding this new and more direct test of the role of DNA circles in chromosome loss, we confidently conclude that ERC-mediated basket removal is the trigger of chromosome loss in old cells.

      (b) Minor Points 

      (i) In Figure 1C, the text (lines 91-92) argues that chromosome loss happens abruptly as cells age; however the data only show loss at young and old time points, not an intermediate, which leaves open the possibility that chromosome loss is occurring gradually. While cells that lost chromosomes should fail to divide further, we don't know if these events happened and were simply excluded.

      We agree with the reviewer that formally the conclusion drawn in lines 91-92 (of the original manuscript), namely that chromosome loss takes place abruptly as cells age, cannot be drawn from Figure 1C alone but only from subsequent observations. However, since chromosome loss is lethal in haploids, as we mention in the text and the reviewer notes as well, it is difficult to envision how cells could lose chromosomes before the end of their lifespan; the loss frequency must therefore increase abruptly as the cells reach that point.  This is now underlined in the revised version of the manuscript. Accordingly, the frequency of chromosome loss per age group, which is depicted in Figure 3A, shows that wild type cells that have budded fewer than 10 times show no chromosome loss. The chromosome loss frequency starts to ramp up only past that point. Therefore, chromosome loss does not increase linearly with age.

      Additionally, cells that lost minichromosome should not arrest. We suggest that the interpretation of these data should be softened in the text, or that chromosome loss fraction could be more effectively portrayed as a Kaplan-Meier survival curve depicting cells that have not lost chromosomes, if these data are easily available. Or, chromosome loss at an intermediate time point could be depicted. 

      Since we cannot visualize more than 2 chromosomes at a time, it is not possible to plot the Kaplan-Meier curve of cells that have not lost chromosomes. However, as mentioned above, the chromosome loss frequencies at intermediate time points are depicted in Figure 3A and Figure 4B and show that chromosome loss increases with age.

      (ii) Also regarding Figure 1, it would be helpful to expound on the purpose of the minichromosomes, as well as how the Ubi-GFP minichromosome is constructed. 

      We now explain why we tested the loss of the minichromosome, namely, as a means to test whether the centromere is necessary and sufficient to drive the loss of the genetic material linked to it, i.e., chromosomes, in old cells.  Concerning the Ubi-GFP minichromosome, the Materials and methods section is now updated and reports the plasmid construction, the backbone used and the primers; the plasmid sequence is available in the supplementary data.

      The purpose of the minichromosome initially appears to be the engineering of an eccDNA (ERC) with a CEN to demonstrate distinct behaviour, but it is unclear whether this was actually conducted or if the minichromosomes are simply CEN plasmids and/or if this was the intended goal. Furthermore, lines 102-103 state that the presence of a centromere was necessary and sufficient for minichromosome loss. However, since no constructs lacking a centromere were tested, necessity cannot be concluded. Please clarify this in the text and include experimental details to help readers understand what was tested. 

      We apologize for having been too brief here. The behaviour of the CEN-less version of this plasmid has been characterized in detail in previous studies (Shcheprova et al., 2008; Denoth-Lippuner 2014, Meinema et al 2022). Here we focused on the behaviour of the CEN+ version of an otherwise identical plasmid.  We now clarify in the text that this plasmid is retained in the mother cell when CEN-less and cite the relevant literature. 

      (iii) It is unclear how cells at 0-3 budding events were identified in assays using the microfluidics platform. Can the authors clarify the known "age" of the cells once captured, i.e. how do the authors know how many divisions a cell has undergone prior to capture? 

      The reviewer is right; we do not know the exact age of these cells.  However, in any asynchronous population of yeast cells, which is what we start from, 50% of the cells are newborn daughters, 25% have budded once, 12.5% have budded twice, 6.25% have budded three times…  Therefore, at the time of loading, roughly 94% of the cells have budded between 0 and 3 times.  For this reason, we refer to this population as cells aged 0-3 CBE. We acknowledge that this is an approximation, but it remains a relatively safe one.  
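
      The arithmetic behind this approximation can be checked with a short sketch. It assumes an idealized asynchronous population in which the fraction of cells that have completed k budding events halves with each additional event and cell death is ignored; the function names are illustrative and are not taken from the manuscript.

```python
def fraction_with_k_buds(k: int) -> float:
    """Fraction of an idealized asynchronous population that has
    completed exactly k budding events (assumed: 1/2, 1/4, 1/8, ...)."""
    return 0.5 ** (k + 1)


def fraction_up_to(max_buds: int) -> float:
    """Cumulative fraction of cells with 0..max_buds completed budding events."""
    return sum(fraction_with_k_buds(k) for k in range(max_buds + 1))
```

      Under these assumptions, fraction_up_to(3) evaluates to 0.9375, i.e. the 0-3 CBE class covers about 94% of freshly loaded cells.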

      (iv) While the schematic in Figure 2D is generally helpful, a different depiction of the old and new SPBs would be beneficial: in cases where the new SPB and TetR-GFP are depicted as colocalized, it is difficult to see that the red is fainter for the new SPB. 

      We have corrected this issue by completely separating the SPB and the Chromosome signals in the Figure 2D.

      (v) In Figure 2F, the grey colour of the 12h Ipl1-321 data bar did not have high enough contrast when the manuscript was printed; would recommend changing this to a darker shade. 

      We have corrected this issue by using a darker shade of grey.

      (vi) In Figure 3A, 'Budding' is misspelled on X-axis label  

      We have corrected this error.

      (vii) In Figure 4, the authors should clarify the differences between the analyses in panels B and C. The distinction is not immediately clear and may be difficult to grasp upon initial reading. 

      We have corrected this issue in the main text as well as figure legend.

      (viii) In Figure 5, It would aid comparisons to depict the 3x∆i only as well on panels B, D, and E. 

      We have added 3x∆i data to Figures 5, 6 and 8.

      (ix) In Figure 6D, it is unclear why there was an appreciable level of unspliced RNA in the wild-type and sir2∆ young cells. Additionally, it is unclear why there is so much signal observed in the Merge image for the old wild-type cell, especially regarding the apparent bright spot. Is that nuclear signal? Please clarify. 

      The pre-mRNA processing reporter is not very efficiently spliced. It was selected as such during design (Sorenson et al 2014; DOI: 10.1261/rna.042663.113) to provide sensitivity. As for the bright spot occurring, translation of the unspliced reporter produces the N-terminal part of a ribosomal protein, a fraction of which forms some sort of nuclear aggregate in a fraction of the population. 

      (x) In Figure 6E, why does the sir2∆ exhibit higher mCherry/GFP than the wild-type and fob1∆ at "young age"? Is this due to disrupted proteostasis in the sir2∆, or a different pleiotropic effect of sir2∆? Please comment on this observation in the text.

      Indeed, as we have stated in the text, the sir2∆ mutation already perturbs pre-mRNA processing in young cells. We do not know the reason for this, but it most probably reflects the pleiotropic function of Sir2. Following the reviewer’s request, we now state this in the text. For example, Sir2 may regulate the acetylation state of the basket itself.  The genetic interactions observed between sir2∆ and quite a few nucleoporin mutations seem to support this possibility. 

      (xi) Throughout, the authors switch between depicting aging in Completed Budding Events versus hours, which made it difficult to compare data across figures

      Ideally, all the data in this manuscript should be plotted according to the CBE age of the cell. To ensure that the major findings are plotted in such a way, we have done so for ~3000 combined cells and thousands of replicative divisions in Figures 3 and 5-7. All the measurements of chromosome loss at a specific CBE had to be done manually, due to the absence of algorithms that would be able to accurately detect chromosome loss and replicative age. Therefore, doing this for the entirety of our dataset, encompassing well over 50 ageing chips and tens of thousands of cells, is not easily doable at this stage. 

      (xii) Typo on line 12 (Sindle Pole Body) 

      We have corrected this error.

      (xiii) The phrase should be 'chromosome partitioning' rather than 'chromosome partition' throughout; for example, line 17 

      Replaced “chromosome partition” with “chromosome partitioning” throughout the text.

      (xiv) There are inconsistencies between plural and singular references throughout sentences-example, lines 35-37, and lines 44-45. 

      We carefully combed through the manuscript again and hope that we caught all inconsistencies.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The present work studies the coevolution of HIV-1 and the immune response in clinical patient data. Using the Marginal Path Likelihood (MPL) framework, they infer selection coefficients for HIV mutations from time-series data of virus sequences as they evolve in a given patient.

      Strengths:

      The authors analyze data from two human patients, consisting of HIV population sequence samples at various points in time during the infection. They infer selection coefficients from the observed changes in sequence abundance using MPL. Most beneficial mutations appear in viral envelope proteins. The authors also analyze SHIV samples in rhesus macaques, and find selection coefficients that are compatible with those found in the corresponding human samples.

      Weaknesses:

      The MPL method used by the authors considers only additive effects of mutations, thus ignoring epistasis.

      As suggested, we have now addressed this limitation by inferring epistatic fitness landscapes for CH505, CH848, SHIV.CH505, and SHIV.CH848. Indeed, the computational burden of the epistasis inference procedure was one constraint that motivated us to consider only additive fitness in the previous version of our paper. The original approach developed by Sohail et al. (2022) tested only sequences with <50 sites due to this limitation, far smaller than the ones we consider. Beyond this computational constraint, we also believed that 1) an additive fitness model may suffice to capture local fitness landscapes, and practically, 2) epistatic interactions are more challenging to validate than the effects of individual mutations, making the interpretation of the model more complex.

      However, after performing the analyses described in this paper, we developed a new approach for identifying epistatic interactions that can scale to much longer sequences (Shimagaki et al., Genetics, in press). We therefore applied this method to infer an epistatic fitness landscape for the HIV and SHIV data sets that we studied. As in that work, we focused on short-range (<50 bp) interactions which we could more confidently estimate from data. We have added a section in the SI describing the epistatic fitness model and our analysis. 

      Overall, we found substantial agreement between the epistatic and purely additive models in terms of the estimated fitness effects of individual mutations (new Supplementary Fig. 8) and overall fitness (Supplementary Fig. 9). Consistent with our prior work, we did not find substantial evidence for very strong epistatic interactions (Supplementary Fig. 10). This does not necessarily mean that strong epistatic interactions do not exist; rather, this shows that strong interactions don’t substantially improve the fit of the model to data, and thus many are regularized toward zero. While the biological validation of epistatic interactions is challenging, we found that the largest epistatic interactions, which we defined as the top 1% of all short-range interactions, were modestly but significantly enriched in the CD4 binding site, V1 and V5 regions for CH505 and in the CD4 binding site, V4, and V5 for CH848. In addition, mutation pairs N280S/V281A and E275K/V281G, which confer resistance to CH235, ranked in the top 15% of all epistatic interactions in CH505.

      We have now included an additional section in the Results, “Robustness of inferred selection to changes in the fitness model and finite sampling”, which discusses our epistatic analyses (page 6, lines 415-464), along with the above Supplementary Figures and a technical section in the SI summarizing the epistasis inference approach.

      Although the evolution of broadly neutralizing antibodies (bnAbs) is a motivating question in the introduction and discussion sections (and the title), the relevance of the analysis and results to better understanding how bnAbs arise is not clear. The only result presented in direct connection to bnAbs is Figure 6.

      It is true that, while bnAb development is a major motivator of our study, our analysis focuses on HIV-1 and does not directly consider antibody evolution. We have now brought attention to this point as a limitation directly in the Discussion. Following the suggestion below in the “Recommendations for the authors,” we have edited our manuscript to place more emphasis on viral fitness and somewhat reduce the emphasis on bnAbs, though this remains an important motivating factor. Specifically, the Abstract now begins:

      Human immunodeficiency virus (HIV)-1 evolves within individual hosts to escape adaptive immune responses while maintaining its capacity for replication. Coevolution between the HIV-1 and the immune system generates extraordinary viral genetic diversity. In some individuals, this process also results in the development of broadly neutralizing antibodies (bnAbs) that can neutralize many viral variants, a key focus of HIV-1 vaccine design. However, a general understanding of the forces that shape virus-immune coevolution within and across hosts remains incomplete. Here we performed a quantitative study of HIV-1 evolution in humans and rhesus macaques, including individuals who developed bnAbs.

      We have similarly modified the Discussion to focus first on viral fitness. In response to comments from Reviewer 3, we have also more clearly articulated how our work might contribute to the understanding of bnAb development in the Discussion.

      Questions or suggestions for further discussion:

      I list here a number of points for which I believe the paper would benefit if additional discussion/results were included.

      The MPL method used by the authors considers only additive effects of mutations, thus ignoring epistasis. In Sohail et al (2022) MBE 39(10), p. msac199  (https://doi.org/10.1093/molbev/msac199) an extension of MPL is developed allowing one to infer epistasis. Can the authors comment on why this was not attempted here?

      I presume one possible reason is that epistasis inference requires considerably more computational effort (and more data). However, since the authors find most beneficial mutations occurring in Env, perhaps restricting the analysis to Env genes only (e.g. the trimer shown in Figure 2) can lead to tractable inference of epistasis within this segment (instead of the full genome).

      As described above, we have now addressed this comment by inferring epistatic fitness landscapes for the data sets that we consider. Our overall results using the epistatic fitness model are consistent with the ones that we previously obtained with an additive model.

      Do the authors find correlations in the inferred selection coefficients of the two samples CH505 and CH848? I could not find any discussion of this in the manuscript. Only correlations between Humans and RM are discussed.

      To address this question, we compared the fitness values and individual selection coefficients across CH505 and CH848 data sets. We found little correlation between CH505 and CH848 fitness values (shown in a new Supplementary Fig. 6) or selection coefficients. We found only 199 common mutations between HIV-1 amino acid sequences from CH505 and CH848 out of 868 and 1,406 total mutations, respectively. Thus, we were not surprised to find no strong relationship between fitness estimates from CH505 and CH848 data sets. 

      Reviewer #2 (Public review):

      Summary:

      This paper combines a biological topic of interest with the demonstration of important theoretical/methodological advances. Fitness inference is the foundation of the quantitative analysis of adapting systems. It is a hard and important problem and this paper highlights a compelling approach (MPL) first presented in (1) and refined in (2), roughly summarized in equation 12.

      (1) Sohail, M. S., Louie, R. H., McKay, M. R. & Barton, J. P. Mpl resolves genetic linkage in fitness inference from complex evolutionary histories. Nature biotechnology 39, 472-479 (2021).

      (2) Shimagaki, K. & Barton, J. P. Bézier interpolation improves the inference of dynamical models from data. Physical Review E 107, 024116 (2023).

      The authors find that positive selection shapes the variable regions of env in shared patterns across two patient donors. The patterns of positive selection are interesting in and of themselves, they confirm the intuition that hyper-variation in env is the result of immune evasion rather than a broadly neutral landscape (flatness). They show that the immune evasion patterns due to CD8 T and naive B-cell selection are shared across patients. Furthermore, they suggest that a particular evolutionary history (larger flux to high fitness states) is associated with bNAb emergence. Mimicking this evolutionary pattern in vaccine design may help us elicit bNAbs in patients in the future.

      There is a lot of information to be found in the full fitness landscape of env. The enormous strength of reversion-to-consensus in the patterns is a known pattern of HIV post-infection populations but they are nicely quantified here. Agreement between SHIV and HIV evolution is shown. They find selection is larger for autologous antibodies than the bNAbs themselves (perhaps bNAbs are just too small a component of the host response to drive the bulk of selection?), and that big fitness increases precede antibody breadth in rhesus macaques, suggesting that this fitness increase is the immune challenge required to draw forth a bNAb. This is all of high interest to HIV researchers.

      Strength of evidence:

      One limitation is, of course, that the fitness model is constant in time when the immune challenge is variable and changing. This simplification may complicate some interpretations.

      We agree that this is a limitation of our current approach. In prior work, we have found that the constant fitness effects of mutations that we infer typically reflect the time-averaged fitness effect when the selection changes over time (Gao and Barton, PNAS 2025; Lee et al., Nat Commun 2025). It could be difficult, however, to capture changes in selection that fluctuate rapidly with underlying immune responses. We have added a new paragraph in the Discussion that more clearly sets out some of the limitations of our analysis, including our assumption of constant selection coefficients.

      There are additional methodological and technical limitations that should be considered in the interpretation of our results. Most notably, we assume that the viral fitness landscape is static in time. While we do not expect selection for effective replication (“intrinsic” fitness) to change substantially over time, pressure for immune escape could vary along with the immune responses that drive them. In prior work, we have found that constant selection coefficients typically reflect the average fitness effect of a mutation when its true contribution to fitness is time-varying [42,43]. This may not adequately describe mutational effects that undergo large or rapid shifts in time. Future work should also examine temporal patterns in selection for individual mutations.

      Equation 12 in the methods is really a beautiful tool because it is so simple, but accounts for linkage and can be solved precisely even in the presence of detailed mutational and selection models. However, the reliance on incomplete observations of the frequency leads to complications that must be carefully (re)addressed here.

      For instance, the consistent finding of strong selection in hypervariable regions is biologically intuitive but so striking, that I worry that it might be the result of a bias for selection in high entropy regions. 

      Thank you for this suggestion. We agree that it is important to carefully interrogate these results. To assess the effects of general sequence variability on inferred selection, we computed a position-specific entropy measure, $H_i$, for each site $i$. We first defined the time-dependent entropy $H_i(t) = -\sum_a x_i(a, t) \log x_i(a, t)$, where $x_i(a, t)$ represents the frequency of amino acid/nucleotide $a$ at position $i$ and time $t$, at each sample time. We then computed $H_i$ as the average of $H_i(t)$ across all sample times. A new Supplementary Fig. 1 plots the entropy against the inferred selection coefficients. Although some sequence variation must be observed in order for us to infer that a mutation is beneficial, we did not find a systematic bias toward larger (more beneficial) selection coefficients at more variable sites. Overall, we found only a modest correlation between inferred selection coefficients and entropy (Pearson’s r = 0.33 and 0.29 for CH505 and CH848, respectively), which appears to be partly driven by the tendency for mutations inferred to be significantly deleterious to occur at sites with low entropy. In addition to the new Supplementary Figure, we have added a reference to this analysis in the main text:

      To test whether our results might be biased by overall sequence variability, we examined the relationship between our inferred selection coefficients and entropy, a common measure of sequence variability. Overall, we found only a modest correlation between selection and entropy, suggesting that the signs of selection that we observe are not due to increased sequence variability alone (Supplementary Fig. 1).
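
      As an illustration of the entropy measure described in this response, the following minimal sketch computes the entropy of one site at each sample time and averages it over sample times. The data layout (one dictionary of allele frequencies per sample time), the use of the natural logarithm, and the function names are assumptions made for the example; they do not come from the authors' code.

```python
import math


def site_entropy(freqs):
    """Shannon entropy of one site at one sample time.

    freqs maps each amino acid/nucleotide to its frequency at that site;
    alleles with zero frequency contribute nothing to the sum.
    """
    return -sum(x * math.log(x) for x in freqs.values() if x > 0)


def time_averaged_entropy(freqs_over_time):
    """Average the per-time entropies of one site over all sample times."""
    entropies = [site_entropy(freqs) for freqs in freqs_over_time]
    return sum(entropies) / len(entropies)
```

      A site that is fixed at every sample time has zero entropy under this definition, whereas a site split evenly between two alleles contributes log 2 at that time point.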

      Mutational and covariance terms in equation 12 might be underestimated, due to finite sampling effect in highly diverse populations. Sampling effects lead to zeros in x(t) when actual frequency zeros might be rare at the population sizes of HIV viral loads and mutation rates. Both mutational flux and C underestimation will bias selection upward in eq. 12. 

      The prior papers (1) and (2) seem to show robustness to finite sampling effects, but, again, more care needs to be shown that this robustness transfers to the amino acid inference under these conditions. That synonymous sites are rarely selected for in the nucleotide level is a good sign, and it may be a matter of simply fully explaining the amino-acid level model.

      As above, we agree that these tests are important. To assess the robustness of our results to finite sampling, we performed bootstrap sampling on the viral sequences and inferred selection coefficients using the resampled sequences. Specifically, we resampled the same number of sequences as in the original data at each time point and repeated this for all time points across all HIV-1 and SHIV data sets. A new Supplementary Fig. 11 shows a typical comparison of the original selection coefficients vs. those obtained through bootstrap resampling. Overall, we observe a high degree of consistency between the selection coefficients in each case, which is surely aided by the long time series in these data sets. As pointed out by the reviewer, uncertainty in low-frequency mutations is a particular concern, though the effects on inferred selection are mitigated by regularization. 

      We have added a section in the Results, “Robustness of inferred selection to changes in the fitness model and finite sampling”, which includes this analysis:

      Finite sampling of sequence data could also affect our analyses. To further test the robustness of our results, we inferred selection coefficients using bootstrap resampling, where we resample sequences from the original ensemble, maintaining the same number of sequences for each time point and subject. The selection coefficients from the bootstrap samples are consistent with the original data (see Supplementary Fig. 11), with Pearson’s r values of around 0.85 for HIV-1 data sets and 0.95 for SHIV data sets, respectively.
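
      The resampling scheme described here can be sketched as follows. This is a hedged illustration only: the dictionary layout (sample time mapped to a list of sequences for one subject) and the function names are hypothetical, and the selection-inference step that would be rerun on each resampled data set is not shown.

```python
import math
import random


def bootstrap_resample(sequences_by_time, rng):
    """Resample sequences with replacement within each time point,
    keeping the number of sequences at each time point unchanged."""
    return {
        t: [rng.choice(seqs) for _ in seqs]
        for t, seqs in sequences_by_time.items()
    }


def pearson_r(xs, ys):
    """Pearson correlation, e.g. between original and bootstrap estimates."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

      In such a workflow, selection coefficients would be inferred again from each resampled data set and compared with the original estimates using pearson_r; passing a seeded random.Random instance as rng makes the resampling reproducible.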

      Uncertainty propagates to the later parts of the paper, e.g., HIV and SIV shared patterns might be the result of shared biases in the method application. However, this worry does not extend to the apples-to-apples comparison of fitness trajectories across individuals (Figures 5 and 6), which I think are robust (for these sample sizes). 

      One way to address this uncertainty is to compare the fitness values and individual selection coefficients across CH505 and CH848 data sets, which was also requested by Reviewer 1. Overall, we found little correlation between CH505 and CH848 fitness values (shown in a new Supplementary Fig. 6) or selection coefficients. This suggests that similarities between HIV-1 and SHIV landscapes are not solely determined by potential biases in the inference approach. We have now added a reference to this point in the main text:

      In contrast, the inferred fitness landscapes of CH505 and CH848, which share few mutations in common, are poorly correlated (Supplementary Fig. 6). This suggests that the similarities between viral fitness values in humans and RMs are not artifacts of the model, but rather stem from similarities in underlying evolutionary drivers.

      The timing evidence is slightly weakened by the fact that bNAb detection is different from bNAb presence and the possibility that fitness increases occurred after the bNAbs appeared remains. Still, their conclusion is plausible and fits in with the other observations which form a coherent and compelling picture.

      Yes, we agree that this is a limitation of our analysis — bNAbs may have been present at low levels before they were detected, and we cannot definitively reject selection by bNAbs. Nonetheless, in at least one case (RM5695), rapid fitness gains were substantially separated in time from bNAb detection (roughly 2 weeks after infection vs. 16 weeks, respectively). We have now added this point in a new paragraph in the Discussion:

      While we found a strong relationship between viral fitness dynamics and the emergence of bnAbs, it may not be true that the former stimulates the latter. For example, bnAbs may have been present within each host before they were experimentally detected. Rapid viral fitness gains within hosts that developed broad antibody responses could then have been driven by undetected bnAb lineages. However, we did not find strong selection for known bnAb resistance mutations, and in at least one case (RM5695), rapid fitness gains (roughly 2 weeks after infection) substantially preceded bnAb detection (16 weeks). Still, given the limited size of the data set that we studied, it is unclear the extent to which our results will transfer to larger and broader data sets.

      Overall this is a convincing paper, part of a larger admirable project of accurately inferring complete fitness landscapes.

      Reviewer #3 (Public review):

      Summary:

      Shimagaki et al. investigate the virus-antibody coevolutionary processes that drive the development of broadly neutralizing antibodies (bnAbs). The study's primary goal is to characterize the evolutionary dynamics of HIV-1 within hosts that accompany the emergence of bnAbs, with a particular focus on inferring the landscape of selective pressures shaping viral evolution. To assess the generality of these evolutionary patterns, the study extends its analysis to rhesus macaques (RMs) infected with simian-human immunodeficiency viruses (SHIV) incorporating HIV-1 Env proteins derived from two human individuals.

      Strengths:

      A key strength of the study is its rigorous assessment of the similarity in evolutionary trajectories between humans and macaques. This cross-species comparison is particularly compelling, as it quantitatively establishes a shared pattern of viral evolution using a sophisticated inference method. The finding that similar selective pressures operate in both species adds robustness to the study's conclusions and suggests broader biological relevance.

      Weaknesses:

      However, the study has some limitations. The most significant weakness is that the authors do not sufficiently discuss the implications of the observed similarities. While the identification of shared evolutionary patterns (e.g., Figure 5) is intriguing, the study would benefit from a more explicit discussion of what these findings mean, for instance, in the context of HIV vaccine design, immunotherapy, or fundamental viral-host interactions. Even speculative interpretations could provide valuable insights into the broader significance of these results.

      Thank you for this suggestion. We have now clarified the potential implications of our work in several areas. While speculative, one possible application is in vaccine design: it may be beneficial to design sequential immunogens to mimic the patterns of viral evolution associated with rapid fitness gains. This “population-based” design principle is different from typical approaches, which have focused on molecular details of virus surface proteins. 

      We have extended our discussion of our results in the context of viral evolution within and across hosts and related host species. Overall, our work suggests that there may be relatively few paths to significantly higher viral fitness in vivo. Evolutionary “contingencies” such as shifting immune pressure or epistatic interactions could influence the direction of evolution, but not so dramatically that the dynamics that we see in different hosts are not comparable. We have also connected our work more broadly to the literature in evolutionary parallelism in HIV-1 in different contexts.

      A secondary, albeit less critical, limitation is the placement of methodological details in the Supplementary Information. While it is understandable that the authors focus on results in the main text - especially since the methodology is not novel and has been previously described in earlier publications - some readers might benefit from a more thorough presentation of the method within the main paper.

      We have now modified the main text to add a new section, “Model overview,” that lays out the key steps of our approach. While we reserve technical details for the Methods, we believe that this new section provides more intuition about how our results were obtained (including a discussion of the important Eq. 12, now Eq. 3 in the main text) and our underlying assumptions.

      Conclusions:

      Overall, the study presents a compelling analysis of HIV-1 evolution and its parallels in SHIV-infected macaques. While the quantitative comparison between species is a notable contribution, a deeper discussion of its broader implications would strengthen the paper's impact.

      Reviewer #1 (Recommendations for the authors):

      I suggest de-emphasizing bnAbs and focusing on selection landscape inference, which seems to be the actual focus of the paper.

      While we do not directly study antibody development in this work, bnAb development is certainly an important motivating factor. As described in the responses above, we have now modified the Abstract and Discussion to place relatively more emphasis on fitness comparisons and relatively less on bnAb development.

      Reviewer #2 (Recommendations for the authors):

      Please make sure that the MPL method is defined in this paper and its limitations are at least partially repeated.

      As noted in responses above, we have now included more methodological details in the main text of the paper, which we hope will make the intuition and assumptions involved in our analysis clearer.

      I'd like the code to better show or describe the model, I could not figure out the model details by looking at the code. It seems mostly just to be csv exporting for use with preexisting MPL code. A longer code readme would be helpful.

      We have now updated the README on GitHub to include a conceptual overview of our inference approach, which references how each step is implemented in the code.

      Reviewer #3 (Recommendations for the authors):

      Try to give some more details (not necessarily giving the full mathematical derivation) on the statistical method utilized.

      As noted above, we have now expanded our discussion of the statistical methods and assumptions in the main text.

      Figures 3 and 4 are somewhat 'messy'. Although I do not have a constructive suggestion here, I feel that with a little more effort maybe the authors could come up with something more clean.

      It is true that the mutation frequency dynamics are somewhat “choppy” and difficult to follow intuitively. To attempt to make these figures easier to parse visually, we have increased the transparency on the lines and added exponential smoothing to the mutation frequencies, resulting in smoother trajectories. The trajectories without smoothing are retained in Supplementary Fig. 3. Here we also note that this smoothing is for visual purposes only; we use the original frequency trajectories for inference, rather than the smoothed ones.
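For readers unfamiliar with the technique, the following is a minimal sketch of simple exponential smoothing applied to a frequency trajectory. The function name, the smoothing factor `alpha`, and the example values are assumptions for illustration; the response does not state which parameters or implementation were used for the figures.

```python
def exponential_smoothing(freqs, alpha=0.5):
    """Return an exponentially smoothed copy of a frequency trajectory.

    Illustrative only: alpha is an assumed smoothing factor in (0, 1].
    Smaller alpha gives smoother curves; alpha = 1 returns the input values.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in freqs:
        if not smoothed:
            # The first point is kept as-is to initialise the recursion.
            smoothed.append(float(x))
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Made-up "choppy" trajectory of a single mutation's frequency over time.
raw = [0.0, 0.4, 0.1, 0.6, 0.5, 1.0]
plotted = exponential_smoothing(raw, alpha=0.5)
```

The function returns a new list and leaves `raw` untouched, mirroring the point made above that smoothed trajectories serve only for display while the original frequencies are used for inference.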

    1. Author response:

      Reviewer #1 (Public review)

      Summary:

      Ever since the surprising discovery of the membrane-associated Periodic Skeleton (MPS) in axons, a significant body of published work has been aimed at trying to understand its assembly mechanism and function. Despite this, we still lack a mechanistic understanding of how this amazing structure is assembled in neuronal cells. In this article, the authors report a "gap-and-patch" pattern of labelled spectrin in iPSC-derived human motor neurons grown in culture. The mid-sections of these axons exhibit patches with reasonably well-organized MPS that are separated by gaps lacking any detectable MPS and having low spectrin content. Further, they report that the intensity modulation of spectrin is correlated with intensity modulations of tubulin as well. However, neurofilament fluorescence does not show any correlation. Using DIC imaging, the authors show that often the axonal diameter remains uniform across segments, showing a patch-gap pattern. Gaps are seen more abundantly in the midsection of the axon, with the proximal section showing continuous MPS and the distal segment showing continuous spectrin fluorescence but no organized MPS. The authors show that spectrin degradation by caspase/calpain is not responsible for gap formation, and the patches are nascent MPS domains. The gap and patch pattern increases with days in culture and can be enhanced by treating the cells using the general kinase inhibitor staurosporine. Treatment with the actin depolymerizing agent Latrunculin A reduces gap formation. The reasons for the last two observations are not well understood/explained.

      We thank the reviewer for the detailed and accurate description of the data shown and its relevance to further our understanding of MPS assembly mechanism and function.

      Strengths:

      The claims made in the paper are supported by extensive imaging work and quantification of MPS. Overall, the paper is well written and the findings are interesting. Although much of the reported data are from axons treated with staurosporine, this may be a convenient system to investigate the dynamics of MPS assembly, which is still an open question.

      We thank the reviewer for the positive comments on the manuscript, the techniques used and the proposed model.

      Weaknesses:

      Much of the analysis is on staurosporine-treated cells, and the effects of this treatment can be broad. The increase in patch-gap pattern with days in culture is intriguing, and the reason for this needs to be checked carefully. It would have been nice to have live cell data on the evolution of the patch and gap pattern using a GFP tag on spectrin. The evolution of individual patches and possible coalescence of patches can be observed even with confocal microscopy if live cell super-resolution observation is difficult.

      We will consider the inclusion of live imaging experiments using the expression of C-terminus-tagged human beta2-spectrin in the revised version of the manuscript. We are familiar with live-imaging and FRAP experiments and we will explore how to develop these experiments to generate data for inclusion in a revised submission.

      Some more comments:

      (1) Axons can undergo transient beading or regularly spaced varicosity formation during media change if changes in osmolarity or chemical composition occur. Such shape modulations can induce cytoskeletal modulations as well (the authors report modulations in microtubule fluorescence). The authors mention axonal enlargements in some instances. Although they present DIC images to argue that the axons showing gaps are often tubular, possible beading artefacts need to be checked. Beading can be transient and can be checked by doing media changes while observing the axons on a microscope.

      We do not rule out the presence of “nano beads” in these axons. It was recently suggested that the normal morphology of axons indeed resembles “pearls-on-a-string” (Griswold et al., 2025), with “nano beads” separated by thin tubular "connectors" (also referred to as NSVs, for non-synaptic varicosities). However, it is unlikely that the gap-patch pattern of beta2-spectrin can be attributed to such a morphology, given that we used formaldehyde as a fixative, and Griswold and colleagues show that aldehyde-based fixatives do not preserve NSVs. We are able to see scattered axonal enlargements (“micro beads”), as we described in distal portions in Fig. 1C(C2) and E. However, the number, appearance and staining of these are not compatible with the gap-patch pattern in beta2-spectrin. Moreover, we would have expected to see these NSVs in our extensive STED imaging, yet we did not. We will discuss this further in the resubmission.

      (2) Why do microtubules appear patchy? One would imagine the microtubule lengths to be greater than the patch size and hence to be more uniform.

      Our stainings are for tubulin protein isoforms beta-III and alpha-II. That is, they label microtubules, but free tubulin as well. The slight decrease in intensity for tubulin within gaps is indeed something to investigate, but we don't interpret this as “patchy microtubules”. If the Reviewer refers to Fig. 2C-D, it is actually difficult to appreciate the slight decrease in intensity by the naked eye. To further support this, we will consider including stainings and quantitative analyses for microtubules in the resubmission. We are familiar with the use of permeabilizing conditions during fixation (in protocols known as “cytoskeletal fixation”) to label microtubules (and not free tubulin).

      (3) Why do axons with gaps increase with days in culture? If patches are nascent MPS that progressively grow, one would have expected fewer gaps with increasing days in culture. Is this indicative of some sort of degeneration of axons?

      We agree that there is an apparent discrepancy. However, one has to take into account that these axons are still elongating even at 2 weeks in culture. Hence, at any time point, there is a recently added axonal compartment with low beta2-spectrin and no MPS. Also, the dynamical evolution of the MPS has to take into account beta2-spectrin supply. If supply is somehow lower than a given threshold, it is expected that there will be more gaps, given that the new, more distant parts of the axons have a lower supply of beta2-spectrin. To explore this formally, we are working on simulations of these multifactorial dynamic systems which, together with key experimental observations, should improve our understanding of overall MPS assembly in growing axons. However, the findings of this project will be the subject of another manuscript.

      (4) It is surprising that Latrunculin A reduces gap formation induced by staurosporine (also seems to increase MPS correlation) while it decreases actin filament content. How can this be understood? If the idea is to block actin dynamics, have the authors tried using Jasplakinolide to stabilize the filaments?

      The results with the co-treatment with Latrunculin A and Staurosporine are indeed intriguing, and provide clear evidence that the gap-and-patch pattern arises from local assembly of the MPS, requiring new actin filaments. However, the fact that F-actin within the pre-formed MPS seems unaffected is not surprising. There are many different populations of F-actin in axons (i.e. MPS rings, longitudinal filaments, actin patches, actin trails). Latrunculin A affects filaments indirectly. The target of Latrunculin A is not actin filaments, but free monomers. It ultimately affects actin filaments as they end up losing monomers, and devoid of new monomers, filaments get shorter and eventually disappear. The drastic decrease in F-actin in our axons reflects that. The fact that F-actin in the MPS is preserved only speaks to the fact that these filaments are stable: if they are not losing monomers in the time frame of the treatment, the filaments remain unaffected. We will support this with more observations and imaging and with a more extensive discussion summarizing the literature on the matter in the resubmission.

      On the other hand, the use of F-actin stabilizing drugs (like Jasplakinolide) would have a different effect. We will study how an experiment with these drugs could be informative of the process under investigation for the resubmission.

      (5) The authors speculate that the patches are formed by the condensation of free spectrins, which then leaves the immediate neighborhood depleted of these proteins. This is an interesting hypothesis, and exploring this in live cells using spectrin-GFP constructs will greatly strengthen the article. Will the patch-gap regions evolve into continuous MPS? If so, do these patches expand with time as new spectrin and actin are recruited and merge with neighboring patches, or can the entire patch "diffuse" and coalesce with neighboring patches, thus expanding the MPS region?

      We agree with the reviewer's interpretation. A virtue of our experimental model and our interpretations of the observations in fixed cells is that it gives rise to informative questions such as the ones posed by the reviewer. As stated above, we will consider the inclusion of live imaging experiments using the expression of C-terminus-tagged human beta2-spectrin in the revised version of the manuscript. We are familiar with live-imaging and FRAP experiments and we think we can provide the evidence suggested.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gazal et al. describe the presence of unique gaps and patches of BetaII-spectrin in medial sections of long human motor neuron axons. BII-spectrin, along with Alpha-spectrin, forms horizontal linkers between 180nm spaced F-actin rings in axons. These F-actin rings, along with the spectrin linkers, form membrane periodic structures (MPS) which are critical for the maintenance of the integrity, size, and function of axons. The primary goal of the authors was to address whether long motor axons, particularly those carrying familial mutations associated with the neurodegenerative disorder ALS, show defects in gaps and patches of BetaII-spectrin, ultimately leading to degradation of these neurons.

      We thank the reviewer for the detailed and accurate description of the data shown.

      Strengths:

      The experiments are well-designed, and the authors have used the right methods and cutting-edge techniques to address the questions in this manuscript. The use of human motor neurons and the use of motor neurons with different familial ALS mutations is a strength. The use of isogenic controls is a positive. The induction of gaps and patches by the kinase inhibitor staurosporine and their rescue by Latrunculin A is novel and well-executed. The use of biochemical assays to explore the role of calpains is appropriate and well-designed. The use of STED imaging to define the periodicity of MPS in the gaps and patches of spectrin is a strength.

      We thank the reviewer for the positive comments on the manuscript, the techniques used and the proposed model.

      Weaknesses:

      The primary weakness is the lack of rigorous evaluation to validate the proposed model of spectrin capture from the gaps into adjacent patches by the use of photobleaching and live imaging. Another point is the lack of investigation into how gaps and patches change in axons carrying the familial ALS mutations as they age, since 2 weeks is not a time point when neurodegeneration is expected to start.

      We will consider the inclusion of live imaging experiments using the expression of tagged human beta2-spectrin in the revised version of the manuscript. We are familiar with live-imaging and FRAP experiments and we believe we can provide the evidence suggested. We do not rule out the possibility that axons carrying familial ALS mutations will show defects in MPS formation and/or stability when observed at longer culture times, or under culture conditions that promote neuronal aging (Guix et al., 2021). Thus, we will continue to work with these cells, but the goal of that project lies well beyond the primary message of the present manuscript, and we anticipate that the revised version will not include new data on this matter.

      Reviewer #3 (Public review):

      Summary:

      Gazal et al present convincing evidence supporting a new model of MPS formation where a gap-and-patch MPS pattern coalesces laterally to give rise to a lattice covering the entire axon shaft.

      Strengths:

      (1) This is a very interesting study that supports a change in paradigm in the model of MPS lattice formation.

      (2) Knowledge on MPS organization is mainly derived from studies using rat hippocampal neurons. In the current manuscript, Gazal et al use human IPS-derived motor neurons, a highly relevant neuron type, to further the current knowledge on MPS biology.

      (3) The quality of the images provided, specifically of those involving super-resolution, is of a high standard. This adequately supports the conclusions of the authors.

      We thank the reviewer for the positive comments on the manuscript, the techniques used and the proposed model.

      Weaknesses:

      (1) The main concern raised by the manuscript is the assumption that staurosporine-induced gap and patch formation recapitulates the physiological assembly of gaps and patches of betaII-spectrin.

      We will further explore the inclusion of more measurements of other parameters and variables towards establishing whether these gaps-and-patches patterns are equivalent structures in control and staurosporine-treated cells. 

      (2) One technical challenge that limits a more compelling support of the new model of MPS formation is that fixed neurons are imaged, which precludes the observation of patch coalescence.

      As stated before regarding similar comments by other reviewers, we will consider the inclusion of live imaging experiments in the revised version of the manuscript.

      Nicolas Unsain, PhD, and Thomas Durcan, PhD.

      References

      Griswold, J.M., Bonilla-Quintana, M., Pepper, R. et al. Membrane mechanics dictate axonal pearls-on-a-string morphology and function. Nat Neurosci 28, 49–61 (2025). https://doi.org/10.1038/s41593-024-01813-1

      Guix F.X., Marrero Capitán A., Casadomé-Perales A., Palomares-Pérez I., López Del Castillo I., Miguel V., Goedeke L., Martín M.G., Lamas S., Peinado H., Fernández-Hernando C., Dotti C.G. Increased exosome secretion in neurons aging in vitro by NPC1-mediated endosomal cholesterol buildup. Life Sci Alliance. 2021 Jun 28;4(8):e202101055. doi: 10.26508/lsa.202101055. Print 2021 Aug.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      “In their current study, Cummings et al have approached this fundamental biochemical problem using a combination of purified enzyme-substrate reactions, MS/MS, and microscopy in vitro to provide key insights into the hierarchy of generating polyglycylation in cilia and flagella. They first establish that TTLL8 is a monoglycylase, with the potential to add multiple mono glycine residues on both α- and β-tubulin. They then go on to establish that monoglycylation is essential for TTLL10 binding and catalytic activity, which progressively reduces as the level of polyglycylation increases. This provides an interesting mechanism of how the level of polyglycylation is regulated in the absence of a deglycylase. Finally, the authors also establish that for efficient TTLL10 activity, it is not just monoglycylation, but also polyglutamylation that is necessary, giving a key insight into how both these modifications interact with each other to ensure there is a balanced level of PTMs on the axonemes for efficient cilia function.”

      Strengths: 

      “The manuscript is well-written, and experiments are succinctly planned and outlined. The experiments were used to provide the conclusions to what the authors were hypothesising and provide some new novel possible mechanistic insights into the whole process of regulation of tubulin glycylation in motile cilia.”

      We thank the reviewer for their support of our study and recognition of its importance to understanding microtubule glycylation and its regulation.  

      “The initial part of the manuscript where the authors discuss the requirement of monoglycylation by TTLL8 is not new. This was established back in 2009 when Rogowski et al (2009) showed that polyglycylation of tubulin by TTLL10 occurs only when co-expressed in cells with TTLL3 or TTLL8. So, this part of the study adds very little new information to what was known.”

      Our study provides the first in vitro evidence with purified recombinant components that human TTLL8 is exclusively a monoglycylase (Figure 1) and that polyglycylation by TTLL10 requires previous priming with monoglycylation (Figure 2). Studies with purified recombinant components are the gold standard for establishing the activity of an enzyme, as cellular work can be obfuscated by the activity of other regulators. We did cite in our original submission the work by Rogowski, Gaertig and Janke from 2009 (reference 15 in the original submission) as well as the Ikegami and Setou 2009 work (reference 26 in the original submission) that established that TTLL10 polyglycylase activity requires co-expression with TTLL8 in cells. Specifically, we stated in our original submission and in the revised manuscript:

      “Cellular overexpression studies coupled with the use of antibodies that recognize mono- and polyglycylation indicate that TTLL8 is also a glycyl-initiase, while TTLL10 a glycyl-elongase (15, 26).  However, direct biochemical evidence with purified enzymes for segregated initiation and elongation activity for glycylases is still lacking as does knowledge of their substrate specificity and regulation.” 

      In addition to citing the Setou study, we now cite again the Rogowski, Gaertig and Janke 2009 study later in the manuscript when the cellular data are mentioned again.  Specifically, we state in the revised manuscript: 

      “This is consistent with cellular overexpression data which showed that polyglycylation signal was detected via antibody only in tubulin from cells that co-expressed TTLL8 and TTLL10, but not TTLL10 alone (15, 26).”

      “The study also fails to discuss the involvement of the other monoglycylase, TTLL3 in the entire study, which is a weakness as in vivo, in cells, both the monoglycylases act in concert and so, may play a role in regulating the activity of TTLL10.”

      We previously showed that purified recombinant TTLL3, like TTLL8, adds only monoglycines, with a preference for the β-tubulin tail (Garnham et al., PNAS 2017). Given that TTLL10 requires priming by monoglycylation, we expect that, similarly to TTLL8, TTLL3 will allow elongation of the initial monoglycine branches by TTLL10.

      “(1) From the mass spec data, it appears that the Xenopus laevis TTLL10 can add up to 18 residues. However, the numbers indicated in Figure 2E seem to suggest that it is a maximum of 2-3 residues only at a particular position. Does this mean that the 13-18 residues observed are a collection of multiple short-chain polyglycylations or are there positions that the authors observed where there were chains of longer than 3 glycine residues? This would be an interesting point to note as when it was discovered in Paramecium, the polyglycyl chains were reported to be up to 34 residues (Redeker et al., Science 1994). If the authors could test the TTLL10 from Paramecium to observe if this is a consistent phenomenon across evolution or is there a biologically significant difference that is being developed, would be interesting to know.”

      Figure 2E shows a subset of the modified tails that we identified and where the position of the posttranslationally added glycine can be mapped to a specific position, or range of positions. Additional species exist. We note that the mass spectra in Figure 2B are intact LC/MS, while those in Figure 2E are MS/MS. The ionization of tubulin tail peptides with a larger number of glycines is not as efficient as for shorter glycine chains, reducing the sensitivity of detection of species that have a higher number of glycines. This is not as pronounced when the mass spectra are obtained from the intact protein (Figure 2B). In summary, our data support the conclusion that TTLL10 elongates polyglycine chains at multiple positions in the tubulin tail (shown in Figure 2E); however, we cannot ascertain the maximum polyglycine chain length, only the total number of glycines added.

      Testing the enzyme from Paramecium is an interesting proposal but outside the scope of this manuscript. 

      “(2) While it is interesting to know that the TTLL10 binds to TTLL8-modified tubulin with a much higher affinity than unmodified tubulin, in vivo, the microtubules will be a mixture of both TTLL3- and TTLL8-modified tubulin. It would be good to see the binding of the enzyme to a tubulin that is modified by both TTLL3 and TTLL8 if the two have a greater influence on TTLL10 binding.”

      Our previous work showed that purified recombinant TTLL3 has purely monoglycylase activity, with a preference for β-tubulin (Garnham et al., PNAS 2017). The sites of monoglycylation by TTLL3 overlap with those introduced by TTLL8 on β-tubulin (the difference being mainly that TTLL3 is more selective towards β-tubulin and thus has lower activity on α-tubulin). TTLL8 introduces additional monoGlys on the α-tubulin tail. Therefore, it is unlikely that TTLL10 will have a different response to microtubules that carry similar numbers of Gly residues, regardless of whether introduced by TTLL8 or TTLL3 and 8. Our data show that TTLL10 binding increases with Gly number, but that the gains in affinity plateau as the density of glycine residues on the tails increases above a certain threshold, likely because one TTLL10 molecule recognizes one monoGly branch, and steric hindrance on the tubulin tail prevents further recruitment of additional TTLL10 molecules.

      “(3) The authors have always increased the number of monoglycines in beta-tubulin more than in alpha-tubulin. Is there a rationale for this? Since TTLL8 is known to predominantly modify alpha-tubulin (Rogowski et al., 2009; Gadadhar et al., 2017) why did the authors not check for the increased binding of the TTLL10 on dimers where the number of monoglycines on alpha-tubulin is higher than 1.1? Especially when they themselves observe in their mass spec that even on alpha-tubulin there are 1, 2, and 3 glycines added. I would like to see what happens if the ratio is high alpha-G + low beta-G”

      As our spectra in Figure 1 show, we find that TTLL8 is able to robustly modify both α- and β-tubulin in vitro but that it shows a slight preference for β-tubulin (Figure 1B). The work from the Janke group that the reviewer is referring to (Rogowski et al., 2009 and Gadadhar et al., 2017) did not use recombinant, purified enzymes and unmodified microtubules as substrates; it used axonemal tubulin (which carries many modifications). It is therefore possible that the α-tubulin preference observed in that system when TTLL8 is overexpressed is due to other factors that do not reflect the biochemical properties of the enzyme alone (for example, β-tubulin sites may be unavailable because they are already glutamylated). As can be seen from Figure 3D, the gain in affinity when increasing the number of glycines beyond one is small compared to that from the initial monoglycine added to the α- and the β-tubulin tail, likely reflecting that one tail cannot bind more than one TTLL10 at a time because of steric hindrance. Moreover, it is important to note here that glutamylases and glycylases compete for the same sites on the tubulin tails, as we have shown, for example, for TTLL3 and TTLL7 (Garnham et al., 2017); the activity of these enzymes in vivo or with non-naïve substrates is therefore context dependent, and this also influences which sites are available for TTLL10 to modify. In conclusion, by using recombinant enzymes and naïve tubulin we gain insight into the intrinsic properties of these enzymes and therefore provide a framework for the interpretation of in vitro and in vivo observations.

      “(4) I wonder why the authors did not use the human TTLL10 to test if this also shows similar binding to the glycylated tubulin despite the fact that it is enzymatically inactive. If it does, then it would be interesting to see the kinetics of binding of this enzyme to see if the fall off of the enzyme from the tubulin is solely driven by the level of polyglycylation only, or if it has any other mechanism involved as well.”

      Work with human recombinant TTLL10, a TTLL10 homolog which was proposed to be inactive, will be an interesting future direction but is outside the scope of this manuscript. We did note in our previous manuscript (Garnham et al., 2017, Figure S5) that the residues which are mutated in the human enzyme compared to other mammals are on the dorsal face of the enzyme, far away from the active site, raising an interesting question of how they inactivate the enzyme. We need, however, to emphasize that our work clearly shows that it is polyglycylation on the microtubules that reduces binding of TTLL10 to microtubules. Experiments done in the absence of glycylating activity, i.e. with enzyme that was incubated with microtubules that were pre-modified with polyglycine chains but in the absence of glycine substrate (precluding any glycylation activity during the binding assay), show that binding decreases monotonically with the number of polyglycines on the microtubule (Figures 4A, B).

      (5) In Figure 5, the authors use monoglycylated tubulin that is either glutamylated or not to show that the activity of TTLL10 is enhanced by the extent of polyglutamylation present on the tubulin. However, there is no evidence of the enzyme binding to microtubules that are only glutamylated. It would be good to test this to determine if the binding is also dependent on both monoglycylation and glutamylation or is it only the enzyme activity.

      Figure 5E shows that TTLL10 binding increases with monoglycylation alone and that glutamylation is additive, and Figures 4A, B show that it is not the enzyme activity that affects the binding, but the glycylation state of the microtubule. We did not determine binding to microtubules that were only glutamylated, because TTLL10 would not be able to elongate polyglycine chains on those microtubules, even if it bound.

      (6) The level of polyglycylation used in Figure 5 is quite low. It would be good to see how the length of the polyglycine chain impacts TTLL10 activity in the presence of polyglutamylation, and whether this has any cooperative effect leading to longer chain polyglycylation than what is seen with only monoglycylated tubulin.

      We expect longer chain polyglycylation to have an inhibitory effect as we show in Figure 4. 

      “(7) In the overall study, the authors fail to discuss whether the activity of both the glycylases at different sites on tubulin is sequential, or modifications at different residues happen all at once. If the authors were to do a sequential time course of the modification followed by MS/MS analysis, they could get some indications about this.”

      As the data in Figure 3D show, adding more mono-Gly sites on a tubulin tail has a muted effect on binding, indicating that the additional mono-Gly branches do not lead to more TTLL10 recruitment because of steric hindrance, i.e. multiple TTLL10 enzymes cannot be accommodated on the same tail at the same time efficiently. This is consistent with the overall dimensions of the enzyme and the positions of its active site, which were modeled initially in our previous publication (Garnham et al., PNAS 2017). The site of TTLL10 action is pre-determined by the position of the mono-Gly branch introduced by TTLL3 or TTLL8. The length of the tubulin tail and the proximity of mono-Gly sites to each other preclude TTLL10 acting at multiple positions at once on the same tail.

      “(8) Do the modifications have any cooperative effect with respect to the sites of modification? Does modifying a particular site enhance the kinetics of modification of the other sites? Can the authors test this?”

      This would be an interesting line of future investigations.  

      “Minor points:

      (1’) The authors opine that the level of polyglycylation is regulated by the decreased binding of the TTLL10 to the polyglycylated tubulin. While this is an interesting argument, which could be a possibility based on the data they present, it would still not answer if this is a mechanism followed by TTLL10 of all species or not. If they could test the efficacy of TTLL10 from another species, to see the binding efficiency of that enzyme, it could potentially strengthen their argument of this possible mechanism.”

      The differences between the properties of TTLL10 from different organisms will be an interesting focus of future investigations, but are outside the scope of this present study. However, we would like to point out that the level of sequence conservation among TTLL10 homologs makes it unlikely that other TTLL10 enzymes do not follow a similar mechanism, albeit with possible differences in the extent of the response. We also note that we have shown that polyglycylation also inhibits binding to the microtubule of the severing enzyme katanin (Szczesna et al., Dev. Cell 2022). Therefore, these studies suggest that polyglycylation might be a more general mechanism for reducing microtubule binding affinity, since glycylation reduces the negative charge on the tubulin tails, which frequently interact with positively charged domains or interfaces in microtubule associated proteins.

      “(2) The authors indicate that glycylases act on pre-glutamylated microtubules. However, in their assays, they use unmodified tubulin, which I would presume is also not glutamylated. If this is the case, how can they justify that the enzymes prefer pre-glutamylated microtubules? This is a bit unclear. Do they mean that their tubulin is already pre-glutamylated? Have they tested this?”

      The statement regarding the action of these enzymes on glutamylated microtubules refers to the in vivo situation where polyglycylated microtubules appear in cilia biogenesis after the microtubules in the axoneme are already glutamylated. In vitro, by using microtubules that are only monoglycylated and microtubules that are both glutamylated and monoglycylated, we show that glutamylation further increases recruitment of TTLL10 to microtubules that are monoglycylated. Therefore, glutamylated microtubules will be polyglycylated preferentially over those that are not glutamylated.

      We state: “Axonemal microtubules are abundantly glutamylated. Glutamylation appears during cilia development first, followed by glycylation (12, 13), indicating that in this scenario glycylases act on pre-glutamylated microtubule substrates.”

      “(3) In continuation with the previous point, an immunoblot of their purified tubulin showing no reactivity to anti-glycylation or anti-glutamylation antibodies, which upon treatment with TTLL8 reacts to the anti-glycylation antibody would be confirmatory evidence to show that the isolated tubulin was indeed unmodified.”

      We have now included a Western blot of our TOG-purified tubulin as Figure S3 in our revised manuscript. This shows a faint signal with the pep-G1 antibody and a very strong signal after TTLL8 treatment. We are not sure whether the low signal with the pep-G1 antibody for the unmodified tubulin is due to low bona fide monoglycylation-specific signal or a low affinity nonspecific interaction of this antibody (raised against mono-glycylated tubulin tail peptides) with the unmodified tubulin. We note that this signal is clearly visible only when loading at least 0.2 micrograms of the purified tubulin. At this loading level the signal for the glycylated species is saturated. It is also important to note that we have not detected glycylated species in this tubulin either by LC-MS or MS/MS. Therefore, our data strongly indicate that the tubulin purified from tsA201 cells is not glycylated or has at most extremely low levels of glycylation. Importantly, this potential trace level of monoglycylated tubulin does not affect any of the conclusions in this study. The Western blot also shows no detectable signal with the polyglycylation antibody in the unmodified tubulin and a very strong, saturated signal after the tubulin was treated with both TTLL8 and TTLL10. We also added an additional Figure S8 that shows that the tsA201 tubulin does not give a detectable signal for glutamylation. Please see also Figure 3 from Vemu et al., Methods in Enzymology 2017, where we also published a Western blot from our TOG-purified tubulin using anti-glutamylation antibodies.

      “(4) In their study, the authors have used polyglycylation of up to 10-13 residues. This brings me to my first point that in the case of Paramecium, the number was identified to be up to 34, which would mean that this enzyme has higher binding or catalytic activity. I would like to know the authors' perspective on this, as to what could potentially determine the difference in the activities of TTLL10 across species.”

      The Xenopus TTLL10 enzyme can add more glycines than the 10-13 range that we show here if the enzyme is incubated for longer periods. The fact that glycine numbers as high as 34 were detected in Paramecium does not necessarily mean that the Paramecium enzyme is more active since there is no equivalent data to compare it with from Xenopus. The only way to address potential species differences in enzyme specific activity is to purify enzymes from different species and compare their activity side-by-side.  

      (5) How was the completion of the reaction of monoglycylation and polyglycylation determined? If the enzymes were left for more than 20 minutes, did TTLL8/ TTLL10 add more glycines? What is the reason for using less tubulin (1:20 enzyme:tubulin molar ratio) for monoglycylation by TTLL8, and more tubulin (1:50 enzyme:tubulin molar ratio) for polyglycylation by TTLL10?

      Yes, if the enzymes were incubated longer, they added more glycines. The extent of glycylation was determined from the LC-MS and the incubation time was varied to obtain samples with fewer or more glycines.   The lower ratio used for TTLL10 is because of the higher specific activity of that enzyme compared to TTLL8.  

      (6) Figure S2 A, b2 ion is not indicated in the peptide sequence, while it is shown in the m/z graph.

      We thank the reviewer for the careful reading. We have corrected this in our MS/MS spectrum. 

      Reviewer #2 (Public review):

      “In their manuscript, Cummings et al. focus on the enzymatic activities of TTLL3, TTLL8, and TTLL10, which catalyze the glycylation of tubulin, a crucial posttranslational modification for cilia maintenance and motility. The experiments are beautifully performed, with meticulous attention to detail and the inclusion of appropriate controls, ensuring the reliability of the findings. The authors utilized in vitro reconstitution to demonstrate that TTLL8 functions exclusively as a glycyl initiase, adding monoglycines at multiple positions on both α- and β-tubulin tails. In contrast, TTLL10 acts solely as a tubulin glycyl elongase, extending existing glycine chains. A notable finding is the differential substrate recognition between TTLL glycylases and TTLL glutamylases, highlighting a broader substrate promiscuity in glycylases compared to the more selective glutamylases. This observation aligns with the greater diversification observed among glutamylases. The study reveals a hierarchical mechanism of enzyme recruitment to microtubules, where TTLL10 binding necessitates prior monoglycylation by TTLL8. This binding is progressively inhibited by increasing polyglycine chain length, suggesting a self-regulatory mechanism for polyglycine chain length control. Furthermore, TTLL10 recruitment is enhanced by TTLL6-mediated polyglutamylation, illustrating a complex interplay between different tubulin modifications. In addition, they uncover that polyglutamylation stimulates TTLL10 recruitment without necessarily increasing glycylation on the same tubulin dimer, due to the potential for TTLLs to interact with neighboring tubulin dimers. This mechanism could lead to an enrichment of glycylation on the same microtubule, contributing to the complexity of the tubulin code. The article also addresses a significant challenge in the field: the difficulty of generating microtubules with controlled posttranslational modifications for in vitro studies. By identifying the specific modification sites and the interplay between TTLL activities, the authors provide a valuable tool for creating differentially glycylated microtubules. This advancement will facilitate further studies on the effects of glycylation on microtubule-associated proteins and the broader implications of the tubulin code. In summary, this study substantially contributes to our knowledge of posttranslational enzymes and their regulation, offering new insights into the biochemical mechanisms underlying microtubule modifications. The rigorous experimental approach and the novel findings presented make this a pivotal addition to the field of cellular and molecular biology.”

      We thank the reviewer for their support of our work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Minor:

      (1) In Figure 2, only the right or left selective neurons are presented for the comparison, it would be helpful to also compare these with the neurons that are not selective for any of the sides and maybe include them in the supplemental materials

      We have included all non-selective neurons in Figure 2D and supplemental Figure 2B. Their differences in firing rate between left and right sides are quantified by their selective indices (SIs). 

      (2) The authors should provide controls of speed during NMDA infusion and vehicle.

      We have quantified and compared the duration of running laps, which is equivalent to speed.

      (3) In Figure 1d, the trend shows that even during NMDA infusion, the animals learn as shown by a higher proportion of correct trials in the 3rd compared to the 1st trial

      We thank the reviewer for pointing that out. We noticed that NMDA-lesioned ACC animals showed a trend of improved performance on the track, and we believe this is due to re-learning of the task, which we point out in the main text. However, we emphasize that, compared to the Vehicle control, the overall performance of NMDA-lesioned animals was significantly impaired.

      (4) Clarify the implications of the NMDA experiments, as it is not straightforward to interpret that an interplay between ACC-CA1 is involved in this task as per this experiment.

      Rather than stating the involvement of ACC-CA1 interplay, we use the results of NMDA lesion experiment to demonstrate that ACC is also required, besides CA1, for the task.

      (5) In Figure 4b, there seems to be a lag between CA1 and ACC correlations; the authors could provide a quantification of this temporal delay between CA1 and ACC.

      Figure 4B shows the cross-correlation between one example ACC cell and its associated CA1 ensembles on the left and opposite sides. There was a broad peak around time lag 0. Our further investigation did not identify a significant, systematic delay across ACC cells, which led us to quantify the correlation at time lag 0 in Figure 4C and D.
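
      As an illustration of how a lag between two binned firing-rate traces can be read off a cross-correlogram, the following is a minimal sketch only. It assumes equal-length, equally binned rate vectors and uses a plain Pearson correlation at each lag; the function names, the bin width, and the toy data are hypothetical and are not taken from the manuscript's analysis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def cross_correlogram(acc_rate, ca1_rate, max_lag):
    """Correlate acc_rate[t] with ca1_rate[t + lag] for lag in [-max_lag, max_lag] bins.

    With this convention, a peak at a negative lag means CA1 activity
    precedes ACC activity; a peak at lag 0 means no systematic delay.
    """
    if len(acc_rate) != len(ca1_rate):
        raise ValueError("rate vectors must have the same number of bins")
    n = len(acc_rate)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = acc_rate[: n - lag], ca1_rate[lag:]
        else:
            x, y = acc_rate[-lag:], ca1_rate[: n + lag]
        result[lag] = pearson(x, y)
    return result


# Toy example (hypothetical data): the ACC trace is the CA1 trace delayed by 2 bins.
ca1 = [0, 1, 0, 0, 2, 0, 1, 0, 0, 3, 0, 0, 1, 0]
acc = [0, 0] + ca1[:-2]
cc = cross_correlogram(acc, ca1, max_lag=3)
peak_lag = max(cc, key=cc.get)
```

      In this sketch, a population-level test for a systematic delay would compare the distribution of per-cell peak lags against zero; the choice of test is left open here.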

      (6) The example correlation provided in 5c for the opposite, doesn't seem representative of the population trend as shown in 5d, since both the Same and the Opposite for the demo show a positive trend. It would be best to choose an example that represents the population better.

      Following the reviewer’s suggestion, we have replaced the original plot with another ACC cell in Figure 5C.

      (7) Almost the same can be applied to Figure 6.

      Following the reviewer’s suggestion, we have replaced the original plot with another ACC cell in Figure 6E.

      (8) The results in Figure 7 are convincing, in my opinion, as they show that the trend is lost for the opposite side (contrary to the coactivation shown in Figures 5 and 6 that showed the same trends for the same and opposite during Demo). Do the authors have any interpretation of this? Is it due to co-activity reflecting other task-relevant features different than the spatial trajectory being observed?

      The correlation on the opposite side between CA1 and ACC shown in Figure 5C-D and Figure 6E-F is likely due to a general interaction between CA1 activities around SWRs and prefrontal cortical areas including ACC, as shown in previous studies (Jadhav et al., 2016; Remondes and Wilson, 2015). We would like to point out that this correlation only quantifies the coactivation between CA1 ensemble firing rates and individual ACC cells’ firing rates. This raw correlation does not consider the content of spikes generated by CA1 ensembles, neglecting the sequential firing patterns of CA1 cells. The replay analysis in Fig. 7 examines the order of spikes generated by individual CA1 cells. The result in Fig. 7 shows that the sequential activation of CA1 place cells more accurately reflects the distinction between the same- and opposite-side trajectories. We consider the analysis in Fig. 7 more refined than those in Figs. 5 and 6.

      (9) For all the figures regarding SWR activities, the authors should provide average PSTH for CA1 as well as ACC, perhaps also examples of neurons that are selectively active during one side or the opposite side runs.

      Following the reviewer’s suggestion, we have added data to show PSTH for CA1 and ACC cells surrounding SWR peaks (Figure S5E, F). 

      Reviewer #2 (Recommendations For The Authors):

      Below are additional notes for improvements.

      (1) Figure 1C. Unclear what Time 0 indicates.

      We specify it (OB's poke time) in the figure legend. 

      (2) Figure 2C. Unclear what the numbers above datapoints mean.

      Those numbers are selection indices (SIs), as specified in the legend. 

      (3) Figure 5: Line 374-375. Given the repetitive nature of the task, it is unclear whether SWRs are encoding upcoming or past spatial trajectories or whether they are encoding trajectories at all. The authors would need to show that SWRs-ACC communication is predictive of task outcome to claim it is specifically necessary for future outcomes rather than consolidating past trajectories.

      We agree with the reviewer and have made changes to reflect that the ACC-CA1 correlation in Fig. 5 is specific to the same side of their selectivity, not exactly to future trajectories. Regarding the repetitive nature of the task (same-side rule), we have specifically addressed the advantage and limitation of this task design in the discussion. Regarding the observer's own past vs. future trajectories, our past publication (Mou et al., 2022) shows that the CA1 replay in SWRs more likely encodes the correct, future trajectories.

      (4) Figure 7. It appears that the correlation was conducted between ACC activity and CA1 replays recorded at distinct time windows (delay period vs. water consumption). It is unclear how ACC activity could influence CA1 replays when they occur hundreds of milliseconds apart or even longer.

      We thank the reviewer for raising this important question. We have shown that the higher same-side ACC activity during observation continues during water consumption. However, our added data in Fig.S5E show that this enhancement did not occur precisely within SWRs. We thus propose a possibility that the overall enhanced activity of same-side ACC cells during water consumption provides an overall, background excitation boost to same-side CA1 cells to enhance their replay within SWRs. We have revised the discussion section to present this model. 

      (5) Abstract: lines 24-25 Discussion: lines 475-476 Based on the data there is no certainty whether ACC biases or coordinates CA1 replays. The data simply shows that they are correlated with one another.

      We have modified those sentences to clarify the non-causal nature of the interaction.

      Reviewer #3 (Recommendations For The Authors):

      Please see below for the list of minor corrections and suggestions:

      (1) Line 136-143: On the data shown in Figure 1D, I recommend using two-way mixed ANOVA with sessions as a within-subjects factor and groups as a between-subjects factor.

      We thank the reviewer for this point. We indeed use two-way ANOVA for those comparisons. We have now specified this in the text.

      (2) Line 219-228: I recommend expanding the explanation of two control conditions here. It was written in the method section, but the readers would appreciate the gist of these conditions in the result section. In particular, it was unclear how box SI was calculated in the Empty condition. Also, the plots of poke rates in the control conditions will be useful to show that rats did not learn the correct choice from observation in these control conditions.

      We have added more explanation of the two control conditions in the text. The quantifications of poke rates for Demo and two control conditions (Object, Empty) are provided in our previous publication (Mou et al., 2022).

      (3) Line 610: Please specify the number of three types of sessions each rat underwent and the order of these session types.

      We have revised the text in the Methods section and provided the numbers.

      (4) In Figure 2c legend, please specify what the number (e.g., -0.41) indicates.

      Those numbers are selection indices (SIs), as specified in the legend.

    1. Author response:

      We would like to thank the editors and reviewers for the time spent on our work, their fair assessments and their constructive criticism. We plan to address their concerns in a future revision as follows, detailed by topic.

      (1) Limitations of focusing on CDR3β only

      In its current state, our work tested the proposed pipeline of data augmentation for binding prediction on benchmark datasets limited to peptide+CDR3β sequence pairs only. As pointed out by all the reviewers, the TCR-peptide interaction is more complex and also involves other regions of the receptor (such as the CDR3α chain) as well as the MHC presenting the peptide. To investigate how the inclusion of additional information impacts results, we plan to apply our pipeline in a setting where the generative protocol is extended to generate paired α and β chains. The supervised classifier will then receive a concatenation of α+β chains as inputs. We will compare the performance of this classifier with the one using β chains only, and add this analysis to the revised manuscript.

      (2) Validation of generated sequences and interpretation of the features learned by the generative model

      The reliability of the generative model in augmenting the training set with biologically sensible sequences is a crucial assumption of our approach, and we agree with the reviewers raising this as a main concern. Before stating our strategy to improve the soundness of the method, let us first point out a few aspects already considered in the present manuscript:

      • The test set of the classifier is always composed of real sequences: in this way, an increase in performance due to data augmentation cannot be due to overfitting to synthetic, possibly unrealistic, sequences.

      • The generative protocol is initialized from real sequences, and used to generate sequences not too far from them. In this respect, it could be taken as a way to “regularize” the simplest strategy of data augmentation, random oversampling (taking multiple copies of sequences at random to rebalance the data). This procedure avoids generating “wildly hallucinated” sequences with unreliable models. We will better quantify this statement (see below).

      • The training protocol is tailored to push the generative model towards learning binding features between peptide and CDR3β sequences (and not merely fitting their local statistics separately). For example, in the pan-specific setting, during training of the generative model on peptide+CDR3β sequences, the masked language modeling task is modified to force the model to recover the missing amino acid using only the other sequence context.

      We will better stress these points in the revised manuscript. To further validate the generative protocol in the future revision, we will carry out additional sanity checks on the generated data to confirm that the synthetic sequences remain biologically plausible and comparable to real ones.
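
      For readers unfamiliar with the baseline mentioned above, random oversampling can be sketched as follows. This is a minimal illustration only: the function name, the seed handling, and the toy CDR3β-like strings are hypothetical, and the sketch shows the plain baseline (duplicating real minority-class sequences), not the generative augmentation protocol itself.

```python
import random
from collections import defaultdict


def random_oversample(sequences, labels, seed=0):
    """Rebalance classes by duplicating randomly chosen minority-class sequences.

    Every class is topped up to the size of the largest class. Only copies of
    existing sequences are added; no new sequence is generated.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for seq, lab in zip(sequences, labels):
        by_class[lab].append(seq)

    target = max(len(members) for members in by_class.values())
    out_seqs, out_labels = [], []
    for lab, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for seq in members + extra:
            out_seqs.append(seq)
            out_labels.append(lab)
    return out_seqs, out_labels


# Toy example (made-up sequences): one binder (label 1) and three non-binders (label 0).
seqs = ["CASSLGQAYEQYF", "CASSPGTGELFF", "CASRDRGNTEAFF", "CASSQETQYF"]
labs = [1, 0, 0, 0]
bal_seqs, bal_labs = random_oversample(seqs, labs)
```

      Because the duplicated examples carry no new information, a classifier can overfit to them; this is the behavior the generative step is meant to regularize.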

      (3) Assessment of the performance of the pan-specific protocol for out-of-distribution data:

      To better clarify how the degradation in performance of a classifier tested on out-of-distribution data is impacted by the dissimilarity between test and training data distributions, we will improve the synthetic analysis currently reported in Table 1 by adding confidence intervals for accuracy, quantifying thresholds on the distance for the method to work, and providing t-SNE embeddings of in- and out-of-distribution data.
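
      One common way to attach a confidence interval to a classification accuracy is the Wilson score interval for a binomial proportion. The sketch below is an illustration under that assumption; the response does not state which interval will be used, and the function name and example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(n_correct, n_total, confidence=0.95):
    """Wilson score interval for an accuracy of n_correct / n_total."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    p = n_correct / n_total
    denom = 1 + z * z / n_total
    center = (p + z * z / (2 * n_total)) / denom
    half = z * sqrt(p * (1 - p) / n_total + z * z / (4 * n_total * n_total)) / denom
    return center - half, center + half


# Hypothetical example: 80 correct predictions out of 100 test sequences.
low, high = wilson_interval(80, 100)
```

      Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] and behaves reasonably for small test sets or accuracies near 0 or 1.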

      (4) Quantification of the threshold for the number of examples per class in order to train the generative model and obtain a performance increase

      In the paper, we adopted an operative common-sense threshold of at least 100 sequences per class in order to apply our data augmentation pipeline. We will quantify this effect by testing this threshold in the revised manuscript, in order to (i) emphasize the limits of this two-step generative protocol in the low-data regime and (ii) assess whether the generative model falls back to a random oversampling strategy (due to strong overfitting) when few data are available for training.

      (5) Motivation for the use of RBMs:

      While RBMs have known limitations, their use in our pipeline (together with the more modern TCR-BERT, which we also test) is mainly motivated by the fact that they provide measurable increases in performance with data augmentation despite their simple 2-layer architecture. We stress that simpler generative (profile) models are unable to show this increase (see Appendix 3). In this respect, the RBM provides a minimal generative model allowing us to augment data successfully, and a lower bound to the increase of performance with respect to more complex architectures trained on more data. We will report this point of view in the text.

      (6) Clarification on the role of lattice proteins as an oversimplified toy model for protein interaction

      We agree with the points raised by Reviewer #2 on the limitations of lattice proteins as a model for protein interaction. Indeed, we used it merely as a toy model for phenomenology, a strategy whose validity has been fairly acknowledged by the reviewer. We will report in the main text all the drastic simplifications and reasons why the reader should take the comparison to real data with great care.

    1. Author response:

      The eLife assessment states that our manuscript is important only as a source of data for others to use in the future. Our methods and analysis of wave 1 follicles were said to be "incomplete" because one of two reviewers claimed we did not prove that 80% of wave 1 oocytes turn over by 5 wk.

      We believe that this assessment is simply wrong because critical supporting data already present in the existing manuscript were not understood by one reviewer. Wave 1 follicular oocyte turnover was said to be unproven and to remain uncertain because evidence of death was based only on a lack of Ddx4 staining. New experiments documenting expression of cell death markers were said to be needed to show the oocytes died. However, our work was not based on the analysis of sectioned material, but used whole mount 3D reconstruction microscopy of cleared ovary preparations. Oocyte death was determined by the absence of an oocyte in fully reconstructed follicles and its replacement with an empty cavity, not just the absence of antibody staining. We included images and complete 3D reconstruction movies documenting these methods. The paper also documents that the holes frequently still contained zona pellucida remnants indicating the former presence of an oocyte. Moreover, we observed many intermediates of oocyte death (shrunken and deformed oocytes) and deformations of follicle structures due to the presence of the empty cavities. Controls showed that Ddx4 staining in the context of 3D imaging always revealed an obvious giant labeled oocyte in 100% of wave 1 follicles prior to death, and in wave 1.5 and wave 2 follicles at all stages. Thus, our methodology is already fully reliable. The reviewer is correct that the entire program of wave 1 development, including their programmed turnover, would be interesting to explore further. We already provided a large amount of new gene expression information, and documented the first examples of wave 1-specific gene expression. Further studies are not needed for the major conclusions of the paper and can wait for a follow-up study.

      Secondly, the existence of wave 1.5 is not "speculative," as stated by the reviewing editor. We extensively validated and quantified the existence of wave 1.5 primordial follicles following Foxl2-cre activation at E16.5, and analysis at 2 wks in multiple experiments. Additionally, we showed wave 1.5 follicles were present at the medulla/cortex border at 2 wks even after activation of Foxl2-cre at E14.5. Our paper also connected for the first time wave 1.5 follicles to a population of non-growing, "poised" primordial follicles described at this identical location near the medulla/cortex boundary by Meinsohn et al. in 2021. These follicles had not started to develop yet, and their ultimate fate was not known. We followed the development of these follicles and determined several differences in wave 1.5 follicle gene expression compared to wave 1. As noted in the assessment, our findings on wave 1.5 are now already being extended to other systems such as primate ovaries (adopting our name "wave 1.5" from our bioRxiv manuscript). The simultaneous claims that the existence of wave 1.5 is speculative and that other people are finding wave 1.5 follicles in the species they are studying are logically incompatible.

      Response to reviewer 2:

      Line 239-245: Please note that Zhang et al. 2013 also show that lineage-labeled primordial follicles can be found at the cortex-medulla boundary (see their Figure 1B).

      The single image in the Zheng et al. 2014 paper may or may not show mosaic primordial follicles, but it would not be surprising since the experiment was identical to experiments in the paper. However, that single picture is only meaningful in the context of our subsequent work reported in the current manuscript. There was no mention of these follicles in the text of Zheng et al. 2014, no documentation or quantitation of their numbers, and no discussion or understanding of their significance. The incorrect conclusion of the paper was that wave 1 follicles (meaning rapidly developing follicles in the medulla) give rise to most early offspring. This conclusion reversed the previously accepted (and essentially correct) view that wave 1 follicles did not contribute significantly to fertility.

      "Finally, this study does not directly assess fertility outcomes and should therefore refrain from drawing conclusions about the fertility potential of wave 1 follicles." 

      We showed by lineage marking that only about 25 of about 200 wave 1 follicles survive even to wk 5. This clearly does prove our conclusion that the great majority of wave 1 follicles do not contribute to fertility.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Weaknesses:

      The lack of pleiotropy is an unconfirmable assumption of MR, and the addition of those models is therefore quite important, as this is a primary weakness of the MR approach. Given that concern, I read the sensitivity analyses using pleiotropy-robust models as the main result, and in that case, they can't test their hypotheses as these models do not show a BMI instrumental variable association. The other weakness, which might be remedied, is that the power of the tests here is not described. When a hypothesis is tested with an under-powered model, the apparent lack of association could be due to inadequate sample size rather than a true null. Typically, when a statistically significant association is reported, power concerns are discounted as long as the study is not so small as to create spurious findings. That is the case with their primary BMI instrumental variable model - they find an association so we can presume it was adequately powered. But the primary models they share are not the pleiotropy-robust methods MR-Egger, weighted median, and weighted mode. The tests for these models are null, and that could mean a couple of things: (1) the original primary significant association between the BMI genetic instrument was due to pleiotropy, and they therefore don't have a robust model to explore the effects of the tobacco genetic instrument. (2) The power for the sensitivity analysis models (the pleiotropy-robust methods) is inadequate, and the authors share no discussion about the relative power of the different MR approaches. If they do have adequate power, then again, there is no need to explore the tobacco instrument.

      Reviewing Editor Comments:

      We suggest that the authors add power estimates to assess whether the sample size is sufficient, given the strength and variability of the genetic instruments. It would also be helpful to present effect estimates for the tobacco instruments alone, to clarify their independent contribution and improve the interpretation of the joint models. In addition, the role of pleiotropy should be addressed more clearly, including which model is considered primary. Stratified analyses by smoking status are encouraged, as prior studies indicate that BMI-HNC associations may differ between smokers and non-smokers. Finally, the comparison with previous studies should be revised, as most reported null findings without accounting for tobacco instruments. If this study finds an association, it should not be framed as a replication.

      We would like to highlight that post-hoc power calculations are often considered redundant since the statistical power estimated for an observed association is directly related to its p-value[1]. In other words, the uncertainty of the association is already reflected in its 95% confidence interval. However, we understand power calculations may still be of interest to the reader, so we have incorporated them in the revised manuscript. We have edited the text as follows (lines 151-155): “Consequently, we used the total R² values to examine the statistical power in our study[42]. However, we acknowledge that the value of post-hoc power calculations is limited, since the statistical power estimated for an observed association is already reflected in the 95% confidence interval presented alongside the point estimate[43].” We have also added supplementary figures 1 and 2.

      We can see that when using the latest HEADSpAcE data we were able to detect BMI-HNC ORs as small as 1.16 with 80% power, while the GAME-ON dataset only permitted the detection of ORs as small as 1.26 using the same BMI instruments (Figure B). We have explained these figures in the results section as follows (lines 257-263): “Using the BMI genetic instruments (total R² = 4.8%) and an α of 0.05, we had 80% statistical power to detect an OR as small as 1.16 for HNC risk (Supplementary Figure 1). For WHR (total R² = 3.1%) and WC (total R² = 4.4%), we could detect odds ratios (ORs) as small as 1.20 and 1.17, respectively. This is an improvement in terms of statistical power compared to the GAME-ON analysis published by Gormley et al.[28], for which there was 80% power to detect an OR as small as 1.26 using the same BMI genetic instruments (Supplementary Figure 2).”
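      As an illustration of how a figure such as "80% power to detect an OR of 1.16" can arise, the sketch below applies a commonly used asymptotic approximation for Mendelian randomization with a binary outcome: power ≈ Φ(|ln OR| · √(R² · N · ρ(1 − ρ)) − z), where ρ is the proportion of cases and z is the two-sided critical value. It is an assumption that this approximation corresponds to the method cited as [42]; the function name and its parameters are hypothetical. The inputs are the total R² of 4.8% and the sample size (N=31,523, including 12,264 cases) quoted in this response.

```python
from math import log, sqrt
from statistics import NormalDist


def mr_power_binary(odds_ratio, r2, n_total, n_cases, alpha=0.05):
    """Approximate power of a two-sided MR test with a binary outcome.

    Hypothetical helper: uses the asymptotic normal approximation, which
    may differ from the calculation performed in the manuscript.
    """
    case_fraction = n_cases / n_total
    # Expected z-statistic: log-OR per SD scaled by the information in the sample
    expected_z = abs(log(odds_ratio)) * sqrt(
        r2 * n_total * case_fraction * (1 - case_fraction)
    )
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(expected_z - z_crit)


# Inputs taken from the text: total R2 = 4.8%, N = 31,523, 12,264 cases
power = mr_power_binary(odds_ratio=1.16, r2=0.048, n_total=31_523, n_cases=12_264)
print(f"Approximate power: {power:.2f}")
```

      Under these inputs the approximation gives a power of roughly 0.8, in line with the value reported above. Weaker instruments (smaller R²) or a smaller case fraction reduce the expected z-statistic and therefore the power.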

      The reason we use inverse variance weighted (IVW) Mendelian randomization (MR) to obtain our main results rather than the pleiotropy-robust methods mentioned by the reviewer/editors (i.e., MR-Egger, weighted median and weighted mode) is that the former has greater statistical power than the latter[2]. Hence, instead of focussing on the statistical significance of the pleiotropy-robust analyses, we consider it is of more value to compare the consistency of the effect sizes and direction of the effect estimates across methods. Any evidence of such consistency increases our confidence in our main findings, since each method relies on different assumptions. As we cannot be sure about the presence and nature of horizontal pleiotropy, it is useful to compare results across methods even though they are not equally powered. It is true that our results for the genetically predicted effects of body mass index (BMI) on the risk of head and neck cancer (HNC) differ across methods. This is precisely what led us to question the validity of our main finding (suggesting a positive effect of BMI on HNC risk). We have now clarified this in the methods section of the revised manuscript as advised. Lines 165-171:

      “Because the IVW method assumes all genetic variants are valid instruments[44], which is unlikely the case, three pleiotropy-robust two-sample MR methods (i.e., MR-Egger[45], weighted median[46] and weighted mode[47]) were used in sensitivity analyses. When the magnitude and direction of effect estimates are consistent across methods that rely on different assumptions, the main findings are more convincing. As we cannot be sure about the presence and nature of horizontal pleiotropy, it is useful to compare results across methods even if they are not equally powered.”
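      To make the IVW method concrete, here is a minimal sketch of the fixed-effect inverse variance weighted estimate computed from summary statistics: each variant contributes a Wald ratio (outcome effect divided by exposure effect), weighted by the squared exposure effect over the squared outcome standard error. The function name and the three-variant data set are hypothetical and purely illustrative; this is not the authors' analysis code.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and standard error from summary statistics."""
    # First-order weights: inverse variance of each Wald ratio
    weights = [bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome)]
    # Per-variant Wald ratios (causal estimate implied by each instrument)
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    return estimate, sqrt(1 / total_weight)


# Hypothetical summary statistics for three variants (not real data)
beta_x = [0.10, 0.20, 0.05]   # variant-exposure associations
beta_y = [0.02, 0.04, 0.01]   # variant-outcome associations (log-odds)
se_y = [0.01, 0.01, 0.01]     # standard errors of beta_y

estimate, se = ivw_estimate(beta_x, beta_y, se_y)
print(f"IVW log-OR: {estimate:.3f} (SE {se:.3f})")
```

      Because every variant enters the weighted average, a single pleiotropic variant can shift the IVW estimate. This is why the pleiotropy-robust methods are compared for consistency of direction and magnitude rather than for statistical significance.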

      We understand that the reviewer/editors are concerned that we do not have a robust model to explore the role of tobacco consumption in the link between BMI and HNC. However, we have a different perspective on the matter. If indeed, the main IVW finding for BMI and HNC is due to pleiotropy (since some of the pleiotropy-robust methods suggest conflicting results), then the IVW multivariable MR method is a way to explore the potential source of this bias[3]. We were particularly interested in exploring the role of smoking in the observed association because smoking and adiposity are known to influence each other [4-9] and share a genetic basis[10, 11].

      We agree that it would be useful to present the univariable MR effect estimates for smoking behaviour and HNC risk alongside those obtained using multivariable MR. We have now included the univariable MR estimates for both smoking behaviour variables as a note under Supplementary Table 11 and in the manuscript (lines 316-318): “In univariable IVW MR, both CSI and SI were linked to an increased risk of HNC (CSI OR=4.47 per 1-SD higher CSI, 95%CI 3.31–6.03, p<0.001; SI OR=2.07 per 1-SD higher SI 95%CI 1.60–2.68, p<0.001) (Additional File 2: note in Supplementary Table 11).”

      We understand the appeal of conducting stratified MR analyses by smoking status. However, we anticipate such analyses would hinder the interpretation of our findings as they can induce collider bias which could spuriously lead to different effect estimates across strata[12, 13].

      We thank the reviewer/editors for their comment regarding the way we frame our findings. We have now edited the discussion section to highlight that our study results differ from those obtained in studies that do not account for smoking behaviour. Lines 398-401: “With a much larger sample (N=31,523, including 12,264 cases), our IVW MR analysis suggested BMI may play a role in HNC risk, in contrast to previous studies. However, our sensitivity analyses implied that causality was uncertain.”

      Reviewer #1 (Recommendations for the authors):

      The authors do share a table of the percent variance explained of the different genetic instruments, which vary widely, and that table is very welcome because we can get some sense of their utility. The problem is that they don't translate that into a power estimate for the case-control study size that they use. They say that it is the biggest to date, which is good, but without some formal power estimate, it is not particularly reassuring. A framework for MR study power estimates was reported in PMID: 19174578, but that was using very simple MR constructs in use in 2009, and it isn't clear to me if that framework can be used here. That power paper suggests that weak genetic instruments need very large sample sizes, far larger than what is used in the current manuscript. I am unable to estimate the true strength of the instruments used here, and so I am unsure of whether power is an issue or not.

      We have now included power calculations in our manuscript to address the reviewer’s concerns. Nevertheless, as mentioned above, post-hoc power calculations are of limited value, as statistical power is already reflected in the uncertainty around the point estimates (the 95% confidence intervals). Hence, it is important to avoid drawing conclusions regarding the likelihood of true effects or false negatives based on these calculations.
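      The point that post-hoc power adds nothing beyond the p-value can be checked numerically. The sketch below assumes a two-sided z-test and treats the observed effect as if it were the true effect, which is the usual definition of "observed power". The function name is hypothetical.

```python
from statistics import NormalDist


def observed_power(p_value, alpha=0.05):
    """Post-hoc ("observed") power of a two-sided z-test, given only its p-value."""
    norm = NormalDist()
    z_obs = norm.inv_cdf(1 - p_value / 2)   # |z| implied by the p-value
    z_crit = norm.inv_cdf(1 - alpha / 2)    # critical value at level alpha
    # Probability of rejecting in either tail if the true effect equals the observed one
    return norm.cdf(z_obs - z_crit) + norm.cdf(-z_obs - z_crit)


for p in (0.5, 0.05, 0.001):
    print(f"p = {p}: observed power = {observed_power(p):.3f}")
```

      The mapping is one-to-one and decreasing in p: a result with p = 0.05 always has an observed power of about 50%, whatever the sample size, so a non-significant result will always appear "under-powered" by this measure.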

      Although the hypothesis here is that smoking accounts for the apparent BMI association previously reported for HNC, it would have been preferable to see the estimates for their 2 genetic instruments for tobacco alone. The current results only show the BMI instruments alone and then with the tobacco instruments. I would like to see what the risk estimates are for the tobacco instrument alone, so that I can judge for myself what happens in the joint models. As presented, one can only do that for the BMI instruments.

      We thank the reviewer for this comment. The univariable IVW MR estimate of smoking initiation was OR=2.07 (95%CI 1.60 to 2.68, p<0.001), while the one for comprehensive smoking index was OR=4.47 (95%CI 3.31 to 6.03, p<0.001). We have included this information in the manuscript as requested (please see response to reviewing editor above).

      On line 319, they write that "We did not find evidence against bias due to correlated pleiotropy..." I find this difficult to parse, but I think it means that they should believe that correlated pleiotropy remains a problem. So again, they seem to see their primary model as compromised, and so do I. This limitation is again stated by the authors on lines 351-352.

      We apologise if the wording of the sentence was not easy to understand. When using the CAUSE method, we did not find evidence to reject the null hypothesis that the sharing (correlated pleiotropy) model fits the data at least as well as the causal model. In other words, our CAUSE finding and the inconsistencies observed across our other sensitivity analyses led us to believe that our main IVW MR estimate for BMI-HNC was likely biased by correlated pleiotropy. We believe it is important to explore the source of this bias, which is why we used multivariable MR to investigate the direct effect of BMI on HNC risk while accounting for smoking behaviour.

      In the following paragraphs (lines 358-369), the authors state that their findings are consistent with prior reports, but that doesn't seem to be the case if we take their primary BMI instrument as representing the outcome of this manuscript. Here, they find an association between the BMI instrument and HNC risk, but in each of the other papers they present the primary finding was null without the extensive model changes or the aim of accounting for tobacco with another instrument. I don't see that as replication.

      This is a good point. We have now edited the discussion of our manuscript to avoid giving the impression that our findings replicate those from studies that do not account for smoking behaviour in their analyses. We have edited lines 384-401 as follows:

      “Previous MR studies suggest adiposity does not influence HNC risk[27-29]. Gormley et al.[28] did not find a genetically predicted effect of adiposity on combined oral and oropharyngeal cancer when investigating either BMI (OR=0.89 per 1-SD, 95% CI 0.72–1.09, p=0.26), WHR (OR=0.98 per 1-SD, 95% CI 0.74–1.29, p=0.88) or waist circumference (OR=0.73 per 1-SD, 95% CI 0.52–1.02, p=0.07) as risk factors. Similarly, a large two-sample MR study by Vithayathil et al.[29] including 367,561 UK Biobank participants (of which 1,983 were HNC cases) found no link between BMI and HNC risk (OR=0.98 per 1-SD higher BMI, 95% CI 0.93–1.02, p=0.35). Larsson et al.[27] meta-analysed Vithayathil et al.’s[29] findings with results obtained using FinnGen data to increase the sample size even further (N=586,353, including 2,109 cases), but still did not find a genetically predicted effect of BMI on HNC risk (OR=0.96 per 1-SD higher BMI, 95% CI 0.77–1.19, p=0.69). With a much larger sample (N=31,523, including 12,264 cases), our IVW MR analysis suggested BMI may play a role in HNC risk, in contrast to previous studies. However, our sensitivity analyses implied that causality was uncertain.”

      We also deleted part of a sentence in the discussion section, so lines 416-418 now look as follows: “An important strength of our study was that the HEADSpAcE consortium GWAS used had a large sample size which conferred more statistical power to detect effects of adiposity on HNC risk compared to previous MR analyses[27-29].”

      On lines 384-386 they note a strength is that this is the largest study to date, but I would reiterate that larger and more powerful does not equate to adequately powered.

      This is true. We have included power calculations in the manuscript as requested.

      It's well known that different HNC subsites have different etiologies, as they mention on lines 391-392, and it is implicit in their use of data on HPV positive and negative oropharyngeal cancer. They say that they did not find evidence for heterogeneity in this study, but that would only be true for the null BMI instrument. The effect sizes for their smoking instruments are strikingly different between the subsites.

      We agree and are sorry for the confusion we may have caused by the way we worded our findings. We have edited the text to clarify that the lack of subsite heterogeneity only applied to our results for BMI/WHR/WC-HNC risk. Lines 418-424 now read as follows:

      “Furthermore, the availability of data on more HNC subsites, including oropharyngeal cancers by HPV status, allowed us to investigate the relationship between adiposity and HNC risk in more detail than previous MR studies which limited their subsite analyses to oral cavity and overall oropharyngeal cancers[28, 68]. This is relevant because distinct HNC subsites are known to have different aetiologies[69], although we did not find evidence of heterogeneity across subsites in our analyses investigating the genetically predicted effects of BMI, WHR and WC on HNC risk.”

      Finally, the literature on mutational patterns gives us strong reason to believe that HNC caused by tobacco are biologically distinct from tumors not caused by tobacco. The authors report in the introduction that traditional observational studies of BMI and HNC have reported different findings in smokers versus never smokers, so I would assume there is a possibility that the BMI instrument could have different associations with tumors of the tobacco-induced phenotype and tumors with a non-tobacco induced phenotype. I would assume that authors have access to the data on self-reported tobacco use behavior, even if they can't separate these tumors by molecular types. Stratifying their analysis by tobacco users or not might reveal different results with the BMI instrument.

      We appreciate the reviewer’s comment. We agree that it would have been interesting to present stratified analyses by smoking status alongside our main findings. However, we decided against this because of the risk of inducing collider bias in our MR analyses, i.e., stratifying on smoking status may induce spurious associations between the adiposity instruments and confounding factors. Multivariable MR is considered a better way of investigating the direct effects of an exposure (adiposity) on an outcome (HNC) accounting for a third variable (smoking)[14], which is why we opted for this method instead.

      References:

      (1) Heinsberg LW, Weeks DE: Post hoc power is not informative. Genet Epidemiol 2022, 46(7):390-394.

      (2) Burgess S, Butterworth A, Thompson SG: Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol 2013, 37(7):658-665.

      (3) Burgess S, Davey Smith G, Davies NM, Dudbridge F, Gill D, Glymour MM, Hartwig FP, Kutalik Z, Holmes MV, Minelli C et al: Guidelines for performing Mendelian randomization investigations: update for summer 2023. Wellcome Open Res 2019, 4:186.

      (4) Morris RW, Taylor AE, Fluharty ME, Bjorngaard JH, Asvold BO, Elvestad Gabrielsen M, Campbell A, Marioni R, Kumari M, Korhonen T et al: Heavier smoking may lead to a relative increase in waist circumference: evidence for a causal relationship from a Mendelian randomisation meta-analysis. The CARTA consortium. BMJ Open 2015, 5(8):e008808.

      (5) Taylor AE, Morris RW, Fluharty ME, Bjorngaard JH, Asvold BO, Gabrielsen ME, Campbell A, Marioni R, Kumari M, Hallfors J et al: Stratification by smoking status reveals an association of CHRNA5-A3-B4 genotype with body mass index in never smokers. PLoS Genet 2014, 10(12):e1004799.

      (6) Taylor AE, Richmond RC, Palviainen T, Loukola A, Wootton RE, Kaprio J, Relton CL, Davey Smith G, Munafo MR: The effect of body mass index on smoking behaviour and nicotine metabolism: a Mendelian randomization study. Hum Mol Genet 2019, 28(8):1322-1330.

      (7) Asvold BO, Bjorngaard JH, Carslake D, Gabrielsen ME, Skorpen F, Smith GD, Romundstad PR: Causal associations of tobacco smoking with cardiovascular risk factors: a Mendelian randomization analysis of the HUNT Study in Norway. Int J Epidemiol 2014, 43(5):1458-1470.

      (8) Carreras-Torres R, Johansson M, Haycock PC, Relton CL, Davey Smith G, Brennan P, Martin RM: Role of obesity in smoking behaviour: Mendelian randomisation study in UK Biobank. BMJ 2018, 361:k1767.

      (9) Freathy RM, Kazeem GR, Morris RW, Johnson PC, Paternoster L, Ebrahim S, Hattersley AT, Hill A, Hingorani AD, Holst C et al: Genetic variation at CHRNA5-CHRNA3-CHRNB4 interacts with smoking status to influence body mass index. Int J Epidemiol 2011, 40(6):1617-1628.

      (10) Thorgeirsson TE, Gudbjartsson DF, Sulem P, Besenbacher S, Styrkarsdottir U, Thorleifsson G, Walters GB, Consortium TAG, Oxford GSKC, consortium E et al: A common biological basis of obesity and nicotine addiction. Transl Psychiatry 2013, 3(10):e308.

      (11) Wills AG, Hopfer C: Phenotypic and genetic relationship between BMI and cigarette smoking in a sample of UK adults. Addict Behav 2019, 89:98-103.

      (12) Coscia C, Gill D, Benitez R, Perez T, Malats N, Burgess S: Avoiding collider bias in Mendelian randomization when performing stratified analyses. Eur J Epidemiol 2022, 37(7):671-682.

      (13) Hamilton FW, Hughes DA, Lu T, Kutalik Z, Gkatzionis A, Tilling K, Hartwig FP, Davey Smith G: Non-linear Mendelian randomization: evaluation of effect modification in the residual and doubly-ranked methods with simulated and empirical examples. Eur J Epidemiol 2025.

      (14) Sanderson E, Davey Smith G, Windmeijer F, Bowden J: An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. Int J Epidemiol 2019, 48(3):713-727.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors use high-throughput gene editing technology in larval zebrafish to address whether microexons play important roles in the development and functional output of larval circuits. They find that individual microexon deletions rarely impact behavior, brain morphology, or activity, and raise the possibility that behavioral dysregulation occurs only with more global loss of microexon splicing regulation. Other possibilities exist: perhaps microexon splicing is more critical for later stages of brain development, perhaps microexon splicing is more critical in mammals, or perhaps the behavioral phenotypes observed when microexon splicing is lost are associated with loss of splicing in only a few genes.

      A few questions remain:

      (1) What is the behavioral consequence for loss of srrm4 and/or loss-of-function mutations in other genes encoding microexon splicing machinery in zebrafish?

      It has been established that srrm4 mutants exhibit no overt morphological phenotypes and are not visually impaired (Ciampi et al., 2022). We are coordinating our publication with Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860), which shows that srrm4 mutants also have minimal behavioral phenotypes. In contrast, srrm3 mutants have severe vision loss, early mortality, and numerous neural and behavioral phenotypes (Ciampi et al., 2022; Lopez-Blanch et al., 2024). We now point out the phenotypes of srrm3/srrm4 mutants in the manuscript.

      We chose not to generate and characterize the behavior and brain activity of srrm3/srrm4 mutants for two reasons: 1) we were aware of two other labs in the zebrafish community that had generated srrm3 and/or srrm4 mutants (Ciampi et al., 2022 and Gupta et al., 2024, https://doi.org/10.1101/2024.11.29.626094; Lopez-Blanch et al., 2024, https://doi.org/10.1101/2024.10.23.619860), and 2) we were far more interested in determining the importance of individual microexons to protein function, rather than loss of the entire splicing program. Microexon inclusion can be controlled by different splicing regulators, such as srrm3 (Ciampi et al., 2022) and possibly other unknown factors. Genetic compensation in srrm4 mutants could also result in microexons still being included through actions of other splicing regulators, complicating the analysis of these regulators. We mention srrm4 in the manuscript to point out that some selected microexons are adjacent to regulatory elements expected of this pathway. We did not, however, choose microexons to mutate based on whether they were regulated by Srrm4, making the characterization of srrm3/srrm4 mutants disconnected from our overarching project goal.

      We have edited the Introduction as follows to clarify our goal: “Studies of splicing regulators such as srrm4 impact the entire splicing program, making it impossible to determine the importance of individual microexons to protein function. Further, microexons could still be differentially included in a regulatory mutant via compensation by other splicing factors ...”

      (2) What is the consequence of loss-of-function in microexon splicing genes on splicing of the genes studied (especially those for which phenotypes were observed).

      We are unclear whether “microexon splicing genes” refers to the splicing regulators srrm3/srrm4, which we chose not to study in this work (see response to point #1 above), or the genes that contain microexons. The severe visual phenotypes of srrm3 mutants confound the study of microexon splicing in this line because altered splicing levels could be due to downstream changes in this significantly different developmental context. A detailed discussion of splicing consequences on removal of microexons from microexon-containing genes is in the response to point #4 below.

      (3) For the microexons whose loss is associated with substantial behavioral, morphological, or activity changes, are the same changes observed in loss-of-function mutants for these genes?

      In the first version of the manuscript, we had included two explicit comparisons of microexon loss with a standard loss-of-function allele, one with a phenotype and one without, in Figure S1 (now Figures S3 and S4) of this manuscript. Beyond the two pairs we had included, Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860) described mild behavioral phenotypes for a microexon removal for kif1b, and we showed developmental abnormalities for the kif1b loss-of-function allele (now Figure S3). We have now added a predicted protein-truncating allele for ppp6r3. This new line has phenotypes that are similar but slightly stronger in brain activity and structure than the mutant that lacks only the microexon. The prior Figure S1 (now Figures S3 and S4) was only briefly mentioned in the first version of the manuscript, and we now clarify this point in the Results: “Protein-truncating mutations in eleven additional genes that contain microexons revealed developmental and neural phenotypes in zebrafish (Figure S3, Figure S4), indicating that the genes themselves are involved in biologically relevant pathways. Three of these genes– tenm4, sptan1, and ppp6r3 – are also in our microexon line collection.”

      Additionally, we can draw expected conclusions from the literature, as some genes with our microexon mutations have been studied as typical mutants in zebrafish or mice. We have modified our manuscript to include a discussion of both loss-of-function zebrafish and mouse mutants. See the response to point #4 below.

      (4) Do "microexon mutations" presented here result in the precise loss of those microexons from the mRNA sequence? E.g. are there other impacts on mRNA sequence or abundance?

      We acknowledge that unexpected changes to the mRNA of the tested mutants could occur following microexon removal. In particular, all regulatory elements should be removed from the region surrounding the microexon, as any remaining elements could drive the inclusion of unexpected exons that result in premature stop codons.

      First, we have clarified our generated mutant alleles by adding a figure (Figure S1) that details the location of the gRNA cut sites in relation to the microexon, its predicted regulatory elements, and its neighboring exons.

      Second, we have experimentally determined whether the mRNA was modified as expected for a subset of mutants with phenotypes. In all eight tested lines (Figure S2), the microexon was precisely eliminated without causing any other effects on the sequence of the transcript in the neighboring region. We did, however, observe an effect on transcript abundance for one homozygous mutant (vav2). It is possible that complex forms of genetic regulation are occurring that are not induced by unexpected isoforms or premature stop codons. Interestingly, Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860) eliminated a different microexon in vav2 and also observed a subtle well center preference. If their allele from an entirely different intronic region also results in transcript downregulation, it would support the hypothesis of genetic compensation through atypical pathways. If not, it is likely this phenotype is due specifically to removal of the microexon protein sequence. Not all mutants with phenotypes could be assessed with qRT-PCR because some were no longer present in the lab. All lines were generated in a similar way, however, removing both the microexon and neighboring regulatory elements while avoiding the neighboring exons. Accordingly, we now also explicitly point out those where the clean loss of the microexon was confirmed (eif4g3b, ppp6r3, sptan1, vti1a, meaf6, nrxn1a, tenm3) and those with possibly interesting phenotypes that were not confirmed (ptprd-1, ptprd-2, rapgef2, dctn4, dop1a, mapk8ip3).

      Third, we have further emphasized in the manuscript that these observed phenotypes are extremely mild compared to those observed in over one hundred protein-truncating mutations we have assessed in previous (Thyme et al., 2019; Capps et al., 2024) and unpublished ongoing work. We showed data for one mutant, tcf7l2, which we consider to have moderately strong neural phenotypes, and we have extended this comparison in the revision (new Figure 3G). Additionally, loss-of-function alleles for some microexon-containing genes have strong developmental phenotypes, as we showed in Figure S1 (now Figures S3 and S4) of this manuscript in addition to our published work (Thyme et al., 2019; Capps et al., 2024). It is known from the literature that the loss-of-function mutants for mapk8ip3 are stronger than we observed here (Tuttle et al., 2019), suggesting that only the microexon is removed in our line. The microexons in Ptprd are also well-studied in mice, and we expect that only the microexon was removed in our lines. Both Dctn4 and Rapgef2 are completely lethal prior to weaning in mice (the International Mouse Phenotyping Consortium).

      (5) Microexons with a "canonical layout" (containing TGC / UC repeats) were selected based on the likelihood that they are regulated by srrm4. Are there other parallel pathways important for regulating the inclusion of microexons? Is it possible to speculate on whether they might be more important in zebrafish or in the case of early brain development?

      The microexons were not selected based on the likelihood that they were regulated by Srrm4. We have clarified the manuscript regarding this point. There are parallel pathways that can control the inclusion of microexons, such as Srrm3 (Ciampi et al., 2022). It is well known that loss of srrm3 has a stronger impact on zebrafish development than srrm4 (Ciampi et al., 2022). The goal of our work was not to investigate these splicing regulators but instead to determine the individual importance of these highly conserved protein changes.

      Strengths:

      (1) The authors provide a qualitative analysis of splicing plasticity for microexons during early zebrafish development.

      (2) The authors provide comprehensive phenotyping of microexon mutants, addressing the role of individual microexons in the regulation of brain morphology, activity, and behavior.

      We thank the reviewer for their support. The pErk brain activity mapping method is highly sensitive, significantly minimizing the likelihood that the field has simply not looked hard enough for a neural phenotype in these microexon mutants. In our published work (Thyme et al., 2019), we show that brain activity can be drastically impacted without manifesting in differences in those behaviors assessed in a typical larval screen (e.g., tcf4, cnnm2, and more).

      Weaknesses:

      (1) It is difficult to interpret the largely negative findings reported in this paper without knowing how the loss of srrm4 affects brain activity, morphology, and behavior in zebrafish.

      See response to point 1.

      (2) The authors do not present experiments directly testing the effects of their mutations on RNA splicing/abundance.

      See response to point 4.

      (3) A comparison between loss-of-function phenotypes and loss-of-microexon splicing phenotypes could help interpret the findings from positive hits.

      See response to points 3 and 4.

      Reviewer #2 (Public review):

      Summary:

      The manuscript from Calhoun et al. uses a well-established screening protocol to investigate the functions of microexons in zebrafish neurodevelopment. Microexons have gained prominence recently due to their enriched expression in neural tissues and misregulation in autism spectrum disease. However, screening of microexon functionality has thus far been limited in scope. The authors address this lack of knowledge by establishing zebrafish microexon CRISPR deletion lines for 45 microexons chosen in genes likely to play a role in CNS development. Using their high throughput protocol to test larval behaviour, brain activity, and brain structure, a modest group of 9 deletion lines was revealed to have neurodevelopmental functions, including 2 previously known to be functionally important.

      Strengths:

      (1) This work advances the state of knowledge in the microexon field and represents a starting point for future detailed investigations of the function of 7 microexons.

      (2) The phenotypic analysis using high-throughput approaches is sound and provides invaluable data.

      We thank the reviewer for their support.

      Weaknesses:

      (1) There is not enough information on the exact nature of the deletion for each microexon.

      To clarify the nature of our mutant alleles, we have added a figure (Figure S1) that details the location of the microexon in relation to its predicted neighboring exons, deletion boundaries, guide RNAs, and putative regulatory elements.

      (2) Only one deletion is phenotypically analysed, leaving space for the phenotype observed to be due to sequence modifications independent of the microexon itself.

      We have determined whether the mRNA is impacted in unanticipated ways for a subset of mutants with mild phenotypes (see point #4 responses to Reviewer 1 for details). Our findings for three microexon mutants (ap1g1, vav2, and vti1a) are corroborated by Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860). We have also already compared the microexon removal to a loss-of-function mutant for two lines (Figures S3 and S4), and we have made this comparison more obvious as well as increasing the discussion of the expected phenotypes from typical loss-of-function mutants (see point #3 response to reviewer 1).

      Unlike protein-coding truncations, clean removal of the microexon and its regulatory elements is unlikely to yield different phenotypic outcomes if independent lines are generated (with the exception of genetic background effects). When generating a protein-truncating allele, the premature stop codon can have different locations and a varied impact on genetic compensation. In previous work (Capps et al., 2024), we have observed different amounts of nonsense-mediated decay-induced genetic compensation (El-Brolosy et al., 2019) depending on the location of the mutation. As they lack variable premature stop codons (the expectation of a clean removal), two mutants for the same microexons should have equivalent impacts on the mRNA.

      We now address the concern of subtle genetic background effects in the Methods: “Even with using sibling controls and collecting multiple biological replicates from individual parents, the possibility remains that linked genetic variation may have contributed to the mild phenotypes we observed, as only a single line was generated.”

      Reviewer #3 (Public review):

      Summary:

      This paper sought to understand how microexons influence early brain function. By selectively deleting a large number of conserved microexons and then phenotyping the mutants with behavior and brain activity assays, the authors find that most microexons have minimal effects on the global brain activity and broad behaviors of the larval fish-- although a few do have phenotypes.

      Strengths:

      The work takes full advantage of the scale that is afforded in zebrafish, generating a large mutant collection that is missing microexons and systematically phenotyping them with high throughput behaviour and brain activity assays. The work lays an important foundation for future studies that seek to uncover the likely subtle roles that single microexons will play in shaping development and behavior.

      We thank the reviewer for their support.

      Weaknesses:

      The work does not make it clear enough what deleting the microexon means, i.e. is it a clean removal of the microexon only, or are large pieces of the intron being removed as well-- and if so how much? Similarly, for the microexon deletions that do yield phenotypes, it will be important to demonstrate that the full-length transcript levels are unaffected by the deletion. For example, deleting the microexon might have unexpected effects on splicing or expression levels of the rest of the transcript that are the actual cause of some of these phenotypes.

      To clarify the nature of our mutant alleles, we have added a figure (Figure S1) that details the location of the microexon in relation to its predicted neighboring exons, deletion boundaries, guide RNAs, and putative regulatory elements. We have determined whether the mRNA is impacted in unanticipated ways for a subset of mutants with mild phenotypes (see point #4 responses to Reviewer 1 for details).

      Reviewer #1 (Recommendations for the authors):

      (1) For most ME mutations, 4 guide sequences are provided. More description / a diagram could be helpful to interpret how ME mutations were generated.

      We have added diagrams to the Supplementary Materials (new Figure S1) to show where the guide RNAs, cut sites, and putative regulatory elements are in relationship to the microexon and its neighboring exons. We have also added the following point to the text: “Four guide RNAs were used, two on each side of the microexon (Table S2, Figure S1).”

      (2) Figure 1 indicates that there are 45 microexons (MEs) but the text initially indicates that there are 44 that exist in a canonical layout (the text later indicates there are 45). This could be made more clear.

      The 45 refers to the mutants that were generated, not the microexons with putative Srrm4 regulatory elements. We did not choose microexons to mutate based on whether they were regulated by Srrm4. We have clarified these points in the manuscript as follows: “Of these 95 microexons, 42 exist in a canonical layout in the zebrafish genome, with both a UGC and UC repeat – or similar polypyrimidine tract – directly upstream of the alternatively spliced exon (Gonatopoulos-Pournatzis et al., 2018) (Table S1), indicating that Srrm4 likely controls their inclusion. Of the remaining microexons, 44 are organized similarly to the canonical layout, typically with either a UGC or UC repeat. Thus, they may also be regulated by Srrm4.” and “Using CRISPR/Cas9, we generated lines that removed 45 conserved microexons (Table S2) and assayed larval brain activity, brain structure, and behavior (Figure 1A). Four guide RNAs were used, two on each side of the microexon (Table S2, Figure S1). For microexons with upstream regulatory elements that are likely important for splicing, these elements were also removed (Figure S1).”

      (3) The description of the "canonical layout" as containing TGC / UC repeats could be rewritten as either "containing a UGC motif and UC repeats" or "containing a TGC motif and TC repeats."

      This error has been corrected.

      (4) Why was tcf7l2 selected as a control for MAP mapping?

      The mutant for tcf7l2 is an example of a moderately strong phenotype from a recent study we completed (Capps et al., 2025). This mutant was selected because it has both increased and decreased activity and structure and is ideal for setting the range of the graph. We now include a comparison to additional mutants from this study of autism genes (Capps et al., 2025) to further demonstrate how mild the phenotypes are in the microexon removal mutants (new Figure 3G). We also include the activity and structure maps of tcf7l2 mutants in Supplementary Figures 9 and 11.

      (5) What does it mean that of the remaining microexons, most are similar to canonical layout?

      Typically, they would have one of the two regulatory elements instead of both, or the location of the possible elements would be slightly farther away than expected. We have clarified this point in the manuscript as follows: “Of these 95 microexons, 42 exist in a canonical layout in the zebrafish genome, with both a UGC and UC repeat – or similar polypyrimidine tract – directly upstream of the alternatively spliced exon (Gonatopoulos-Pournatzis et al., 2018) (Table S1), indicating that Srrm4 likely controls their inclusion. Of the remaining microexons, 44 are organized similarly to the canonical layout, typically with either a UGC or UC repeat. Thus, they may also be regulated by Srrm4.”

      (6) Figure 2A is very difficult to see - most are either up or down - suggest splitting into 2 figures - one = heat map, second can summarize values that were both up and down.

      We prefer to retain this information for accuracy. The bubble location is offset to effectively share the box between the orange (decreased) and purple (increased) measures. For example, and as noted in the methods and now expanded upon, a measure can change between 4 and 6 dpf or a measure such as bout velocity could be increased while the distance traveled is decreased (both are magnitude measures). The offset of the bubbles is consistently 0.2 data units in x and y from the center of the box.
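
      The bubble placement described above can be sketched as follows. This is a minimal illustration, not the plotting code used for Figure 2A: the function name and the choice of which corner each bubble moves toward are assumptions; only the 0.2 data-unit offset in x and y comes from the response.

```python
def bubble_centers(col, row, offset=0.2):
    """Return (decreased_xy, increased_xy) for the grid box centered at (col, row).

    Each bubble is shifted by `offset` data units in both x and y so that the
    two measures share one box. The direction of each shift is an assumption.
    """
    decreased = (col - offset, row - offset)  # orange bubble (decreased measure)
    increased = (col + offset, row + offset)  # purple bubble (increased measure)
    return decreased, increased


# Example: the box for the mutant in column 3 and the measure in row 5.
dec, inc = bubble_centers(3, 5)
```

      With both bubbles displaced symmetrically from the box center, a measure that is increased in one condition and decreased in another remains visible in a single cell.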

      (7) The authors apply rigorous approaches to testing the importance of microexons. I especially appreciate the inclusion of separate biological replicates in the main figures!

      We thank the reviewer for their positive feedback.

      (8) Page 5 line 5 - suggest "compared to homozygous mutants".

      The change has been made.

      (9) For Eif4g3b dark flash phenotype, it's not clear what "p-values are not calculated for response plots" means. A p-value is provided in the plot for ppp6r3 response freq.

      The eif4g3b plot is the actual response trace, measured through pixel changes, whereas the ppp6r3 plot shows the frequency of response. While informative, the response plot is time-based data with a wide dynamic range, making the average signal across the entire time window meaningless. We include the p-values for a related measure, the latency for the first 10 dark flashes in block 1 (day6dpfdf1a_responselatency), in the legend.

      (10) The ptprd phenotype in 2D is not described in the text.

      The change has been made.

      (11) Page 7 line 7: "mild" is repeated.

      This error has been corrected.

      Reviewer #2 (Recommendations for the authors):

      Specific points for needed improvement:

      (1) The title should be adjusted to more accurately describe the results. The term 'minimal' is under-representing the findings. 9/45 (20%) of targets in their screen have some phenotype, indicating that a significant number have indeed an important function. Moreover, the phenotypic analysis is limited, leaving space for missed abnormalities (as discussed by the authors). I would therefore suggest a more neutral title such as 'Systematic genetic deletion of microexons uncovers their roles in zebrafish brain development and larval behaviour'.

      While some microexon mutants do have repeatable phenotypes, these phenotypes are far milder than phenotypes observed in other mutant sets. We now include a comparison to additional mutants from this study of autism genes (Capps et al., 2025) to further demonstrate how mild the phenotypes are in the microexon removal mutants (new Figure 3G). The title states that these microexons have a minimal impact on larval zebrafish brain morphology and function, leaving room for the possibility of adult phenotypes. Thus, we prefer to retain this title.

      (2) Do the 45 chosen microexons correspond to the 44 with a canonical layout with TGC and UC repeats? If so, it needs to be explicitly stated in the text that exons were chosen for mutation based on the potential for SRRM4 regulation. If not, then the rationale for the choice of the 45 mutants from the 95 highly conserved events needs to be explained further.

      The 45 refers to the mutants that were generated, not the microexons with putative Srrm4 regulatory elements. We did not choose microexons to mutate based on whether they were regulated by Srrm4. We have clarified these points in the manuscript as follows: “Of these 95 microexons, 42 exist in a canonical layout in the zebrafish genome, with both a UGC and UC repeat – or similar polypyrimidine tract – directly upstream of the alternatively spliced exon (Gonatopoulos-Pournatzis et al., 2018) (Table S1), indicating that Srrm4 likely controls their inclusion. Of the remaining microexons, 44 are organized similarly to the canonical layout, typically with either a UGC or UC repeat. Thus, they may also be regulated by Srrm4.” and “Using CRISPR/Cas9, we generated lines that removed 45 conserved microexons (Table S2) and assayed larval brain activity, brain structure, and behavior (Figure 1A). Four guide RNAs were used, two on each side of the microexon (Table S2, Figure S1). For microexons with upstream regulatory elements that are likely important for splicing, these elements were also removed (Figure S1).”

      There was no clear rationale for those that were selected. We attempted to generate all 95 and some mutants were not successfully generated in our initial attempt. As we found minimal phenotypes, we elected to not continue to make the remaining ones on the list.

      (3) More detail regarding the design of guides for CRISPR is required in the text in the methods section. From Table S2, 4 guides were used per microexon. Were these designed to flank the microexon? How far into the intronic sequence were the guides designed? Were the splicing regulatory sequences (polypyrimidine tract, branchpoint) also removed? The flanking sequences of each of the 45 deletion lines need to be provided.

      We have added diagrams to the Supplementary Materials (new Figure S1) to show where the guide RNAs, cut sites, and putative regulatory elements are in relationship to the microexon and its neighboring exons. We removed the microexon and the surrounding area that contains the putative regulatory elements.

      (4) Following on from the previous point, to ascertain that the phenotype observed is truly due to lack of microexon (rather than other event linked to removed intronic sequences) - for the 7 exons newly identified as functionally important, at least one added deletion line has to be shown, presenting the same phenotype. If making 7 more lines can't be achieved in a reasonable time (we are aware this is a big ask), a MO experiment blocking microexon splicing needs to be provided (may not be ideal for analysis at 6 dpf). For the existing mutants and the new ones (or morphants), sequencing of the mRNAs for the 7 genes in mutants and siblings also needs to be added to check any possible change in other variants.

      Unlike protein-coding truncations, clean removal of the microexon and its regulatory elements is unlikely to yield different phenotypic outcomes if independent lines are generated (with the exception of genetic background effects). When generating a protein-truncating allele, the premature stop codon can have different locations and a varied impact on genetic compensation. In previous work (Capps et al., 2024), we have observed different amounts of nonsense-mediated decay-induced genetic compensation (El-Brolosy et al., 2019) depending on the location of the mutation. As they lack variable premature stop codons (the expectation of a clean removal), two mutants for the same microexons should have equivalent impacts on the mRNA. We acknowledge that we inadequately described the generation of these alleles, and we now provide Figure S1 to show the microexon’s relationship to possible regulatory elements that could impact splicing in unexpected ways if they remain.

      We now acknowledge the concern of subtle genetic background effects in the Methods: “Even with using sibling controls and collecting multiple biological replicates from individual parents, the possibility remains that linked genetic variation may have contributed to the mild phenotypes we observed, as only a single line was generated.”

      Given the caveats of MOs and transient microinjection for the study of 6 dpf phenotypes, we disagree that this suggested experiment would provide value. The phenotypic assays we use are highly sensitive, and we would not even trust CRISPANTs to yield reliable data. We have added an additional loss-of-function allele for ppp6r3 from the Sanger knockout project, which has a similar but stronger size change to the ppp6r3 microexon-removal line. In addition, our findings for three microexon mutants (ap1g1, vav2, and vti1a) are corroborated by Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860).

      To support that we generated clean removals of these microexons, we experimentally determined whether the mRNA is impacted in unanticipated ways for a subset of mutants with mild phenotypes (see the point #4 public response to Reviewer 1). We also have already compared the microexon removal to a loss-of-function mutant for two lines (Figure S1), and we have made that outcome more obvious as well as increasing the discussion of the expected phenotypes from typical loss-of-function mutants (see point #3 public response to Reviewer 1).

      (5) Figure 3: An image of control tcf7l2 mutant brain activity as a reference should be included.

      We now include the activity and structure maps of tcf7l2 mutants in Supplementary Figures 9 and 11.

      (6) Figure 3a/b. The gene names on the y-axis of the pERK and structure comparisons should be reordered to be alphabetical so that phenotypes can be compared by the reader for the same microexon across the two assays.

      These data are clustered so that any similarities between maps can be recognized. We prefer to retain the clustering to compare lines to each other.

      (7) Figure S6 legend. Including graph titles like "day3msdf_dpix_numberofbouts_60" is not comprehensible to the reader so should be replaced with more descriptive text. As should jargon such as "combo plot" and "habituation_day5dpfhab1post_responsefrequency_1_a1f1000d5p" etc.

      The legend has been edited to describe the experiments. Subsections of the prior names are maintained in parentheses to enable the reader to connect the plots in this figure to the specific image and underlying data in Zenodo.

      (8) Page 2 line 21 "to enable proper".

      The change has been made.

      (9) Page 7 line 7. Repeatable phenotypes were mild mild.

      This error has been corrected.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1B is confusingly laid out.

      We are unclear how to modify Figure 1B, as it is a bar plot. We have modified several figures to improve clarity.

      (2) Figure 1E-there are some pictures of zebrafish but to what end? They aren't labelled. The dark "no expression" looks really similar to the dark green, "high expression".

      The zebrafish images represent the ages assessed for microexon inclusion. We have added labels to clarify this point.

      (3) The main text says "microexons were removed by Crispr" but there is no detail in the main text about this at all-- and barely any in the methods. What does it mean to be removed? Cleanly? Or including part of the introns on either side? Etc. How selected, raised, etc? I can glean some of this from the Table S2 if I do a lot of extra work, but at least some notes about this would be important.

      We have added diagrams to the Supplementary Materials (new Figure S1) to show where the guide RNAs, cut sites, and putative regulatory elements are in relationship to the microexon and its neighboring exons. We removed the microexon and the surrounding area that contains the putative regulatory elements.

      (4) Figure 2 - There are no Ns, at least for the plots on the right. The reader shouldn't have to dig deep in Table S2 to find that. It is also unclear why heterozygous fish are not included in these analyses, since there are sibling data for all. Removed for readability of the plots might be warranted, but this should be made explicitly clear.

      The Ns for these plots have been added to the legend. The legend was also modified as follows: “Comparisons to the heterozygous larvae are removed for clarity and available in the Supplementary Materials, as they often have even milder phenotypes than homozygous.”

      (5) Needed data: for those with phenotypes, some evidence should be presented that the full-length transcripts that encode proteins without the microexons are still expressed at the same level and without splicing errors/NMD. Otherwise, some of these phenotypes that were found could be due to knockdown or LOF (or I suppose even overexpression) of the targeted gene.

      We have added a new Supplementary Figure S2 confirming clean removal of the microexons with RT-PCR for a subset of mutants with phenotypes. This figure also includes qRT-PCR for the same subset. We now discuss these findings: Results: “For eight mutant lines, we confirmed that the microexon was eliminated from the transcripts as expected (Figure S2). Although our genomic deletion did not yield unexpected isoforms, qRT-PCR on these eight lines revealed significant downregulation for the homozygous vav2 mutant (Figure S2), indicating possibly complex genetic regulation.”

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer # 1 (Public review)

      This study aims to elucidate the mechanisms by which stress-induced α2A-adrenergic receptor (α2A-AR) internalization leads to cytosolic noradrenaline (NA) accumulation and subsequent neuronal dysfunction in the locus coeruleus (LC). While the manuscript presents an interesting but ambitious model involving calcium dynamics, GIRK channel rundown, and autocrine NA signaling, several key limitations undermine the strength of the conclusions. 

      (1) First, the revision does not include new experiments requested by reviewers to validate core aspects of the mechanism. Specifically, there is no direct measurement of cytosolic NA levels or MAO-A enzymatic activity to support the link between receptor internalization and neurochemical changes. The authors argue that such measurements are either not feasible or beyond the scope of the study, leaving a significant gap in the mechanistic chain of evidence. 

      Reviewer #1 commented that “The authors argue that such measurements are either not feasible or beyond the scope of the study, leaving a significant gap in the mechanistic chain of evidence.” We believe that this comment may be unfair.

      It may be unfair for Reviewer #1 to neglect our responses to the original reviewer comments regarding the direct measurement of cytosolic NA levels. None of the recommended methods to directly measure cytosolic NA levels is feasible, as described in the original authors’ response (see the original authors’ response to the comment raised by Reviewer #1 as Recommendations for the authors (2)). To measure extracellular NA with GRAB-NE photometry, α2A-ARs must be expressed in the cell membrane. GRAB-NE photometry is therefore not applicable in our study, in which increases in cytosolic NA levels are caused by internalization of α2A-ARs.

      In our study, we detected changes in MAO-A protein by Western blot instead of examining MAO-A enzymatic activity. Because the relative quantification of active AEP and Tau N368 proteins by Western blot analysis should accurately reflect the change in MAO-A enzymatic activity, an enzymatic assay may not be strictly required, although we acknowledge that an enzymatic assay would better demonstrate MAO-A activity, as discussed in the previously revised manuscript (R1, page 10, lines 314-315).

      We used the phrase “beyond the scope of the current study” for “the mechanism how Ca<sup>2+</sup> activates MAO-A” as described in the original authors’ responses (see the original authors’ response to the comment raised by Reviewer #1 as Weakness (3)). We do not think that this mechanism must be investigated in the present study because the Ca<sup>2+</sup>-dependent nature of MAO-A activity is already known (Cao et al., 2007).

      On the other hand, because it is not possible to measure cytosolic NA levels with currently available methods, the quantification of the connection between α2A-AR internalization and increased cytosolic NA levels must be considered outside the scope of the study. However, our study demonstrated the qualitative relationship between α2A-AR internalization and active-AEP/TauN-368 reflecting increased cytosolic NA levels, leaving “a small gap in the mechanistic chain of evidence.” Therefore, it may be unreasonable to criticize our study as “leaving a significant gap in the mechanistic chain of evidence” with the phrase “beyond the scope of the current study.” 

      (2) Second, the behavioral analysis remains insufficient to support claims of cognitive impairment. The use of a single working memory test following an anxiety test is inadequate to verify memory dysfunction behaviors. Additional cognitive assays, such as the Morris Water Maze or Novel Object Recognition, are recommended but not performed.

      As described in the original authors’ response (see the original authors’ response to the comment raised by Reviewer #1 as Weakness (4)), we had already performed another behavioral test, the elevated plus maze (EPM) test. Combining the two tests may make it possible to evaluate the results of the Y-maze test more accurately by differentiating memory impairment from anxiety. However, the results of these behavioral tests showed that chronic RS mice displayed both anxiety-like and memory impairment-like behaviors. Accordingly, we have softened the implication of anxiety and memory impairment (page 13, lines 396-399) and revised the abstract (page 2, line 59) in the revised manuscript (R2).

      (3) Third, concerns regarding the lack of rigor in differential MAO-A expression in fluorescence imaging were not addressed experimentally. Instead of clarifying the issue, the authors moved the figure to supplementary data without providing further evidence (e.g., an enzymatic assay or quantitative reanalysis of Western blot, or re-staining of IF for MAO-A) to support their interpretation.

      Because MAO-A expression can be quantified with greater accuracy by Western blot than by immunohistochemistry, we have moved the immunohistochemical results (shown in Figure 5) to the supplemental data (Figure S8), following the suggestion made by Reviewer #3. As the relative quantification of active AEP and Tau N368 proteins by Western blot analysis may accurately reflect changes in MAO-A enzymatic activity, which is consistent with the result of the Western blot analysis of MAO-A, an enzymatic assay or re-staining of immunofluorescence for MAO-A may not be strictly required. We do not think that a new Western blot experiment is necessary to re-evaluate MAO-A merely because the less reliable immunohistochemical staining was not quantified.

      (4) Fourth, concerns regarding TH staining remain unresolved. In Figure S7, the α2A-AR signal appears to resemble TH staining, and vice versa, raising the possibility of labeling errors. It is recommended that the authors re-examine this issue by either double-checking the raw data or repeating the immunostaining to validate the staining.

      Reviewer #3 is misunderstanding Figure S7. In Figure S7, there are two types of α2A-AR-expressing neurons: one is the TH-positive LC neuron and the other is the TH-negative neuron in the mesencephalic trigeminal nucleus (MTN). This clearly indicates that TH staining is specific. Furthermore, α2A-AR staining was much more extensive in MTN neurons than in LC neurons. Thus, the α2A-AR signal is not similar to the TH signal and there are no labeling errors, which is also evident in the merged image (Figure S7C).

      (5) Overall, the manuscript offers a potentially interesting framework but falls short in providing the experimental rigor necessary to establish causality. The reliance on indirect reasoning and reorganizing of existing data, rather than generating new evidence, limits the overall impact and interpretability of the study.

      Overall, Reviewer #1 was not satisfied with our revision regardless of the authors’ responses. As detailed above in our responses to points (1)~(4), we believe that the original authors’ responses and the responses above effectively address the criticisms raised by Reviewer #1.

      Reviewer #2 (Public review): 

      Comments on revisions: 

      The authors have addressed all of the reviewers' comments.

      We appreciate the constructive and helpful comments made by Reviewer #2.

      Reviewer #3 (Public review): 

      Weaknesses:  

      Nevertheless, the manuscript currently reads as a sequence of discrete experiments rather than a single causal chain. Below, I outline the key points that should be addressed to make the model convincing.

      Please see the responses to the recommendations for the authors made by Reviewer #3.

      Reviewer #3 (Recommendations for the authors):

      (1) Causality across the pathway  

      Each step (α2A internalisation, GIRK rundown, Ca<sup>2+</sup> rise, MAO-A/AEP upregulation) is demonstrated separately, but no experiment links them in a single preparation. Consider in vivo Ca<sup>2+</sup> or GRAB NE photometry during restraint stress while probing α2A levels with i.p. clonidine injection or optogenetic over excitation coupled to biochemical readouts. Such integrated evidence would help to overcome the correlational nature of the manuscript to a more mechanistic study. 

      Authors’ response: It is not possible to measure free cytosolic NA levels with GRAB NE photometry when α2A AR is internalized, as described above (see the response to the comment made by Reviewer #1 as the recommendation for the authors).

      The core idea behind my comment, as well as that of Reviewer 1, was to encourage integrating your individual findings into a more cohesive in vivo experiment. Using GRAB-NE to measure extracellular NA could serve as an indirect readout of NA uptake via NAT, and ultimately, cytosolic NA levels. Connecting these experiments would significantly strengthen the manuscript and enhance its overall impact. 

      It may be true that the measurement of extracellular NA could serve as an indirect readout of NA uptake via NAT, and ultimately cytosolic NA levels. However, Reviewer #3 still misunderstands the applicability of the GRAB-NE method to detect NE in our study. As described in the original authors’ response, there appears to be no fluorescent probe to label cytosolic NA at present. In particular, the GRAB-NE method recommended by Reviewers #1 and #3 can detect NA only when α2A-AR is expressed in the cell membrane. Therefore, when increases in cytosolic NA levels are caused by internalization of α2A-ARs, NA measurement with GRAB-NE photometry is not applicable.

      (2) Pharmacology and NE concentration  

      The use of 100 µM noradrenaline saturates α and β adrenergic receptors alike. Please provide ramp measurements of GIRK current in dose-response at 1-10 µM NE (blocked by atipamezole) to confirm that the rundown really reflects α2A activity rather than mixed receptor effects. 

      Authors’ response: It is true that 100 µM noradrenaline activates both α and β adrenergic receptors alike. However, it was clearly shown that enhancement of GIRK-I by 100 µM noradrenaline was completely antagonized by 10 µM atipamezole and that the Ca<sup>2+</sup>-dependent rundown of NA-induced GIRK-I was prevented by 10 µM atipamezole. Considering the Ki values of atipamezole for α2A AR (=1~3 nM) (Vacher et al., 2010, J Med Chem) and β AR (>10 µM) (Virtanen et al., 1989, Arch Int Pharmacodyn Ther), these results reflect α2A AR activity but not β AR activity (Figure S5). Furthermore, because it is already well established that NA-induced GIRK-I is mediated by α2A AR activity in LC neurons (Arima et al., 1998, J Physiol; Williams et al., 1985, Neuroscience), it is not necessary to re-examine the effect of 1-10 µM NA on GIRK-I.

      While the milestone papers by Williams remain highly influential, they should be re-evaluated in light of more recent findings, given that they date back over 40 years. Advances in our understanding now allow for a more nuanced interpretation of some of their results. For example, see McKinney et al. (eLife, 2023). This study demonstrates that presynaptic β-adrenergic receptors-particularly β2-can enhance neuronal excitability via autocrine mechanisms. This suggests that your post-activation experiments using atipamezole may not fully exclude a contribution of β-adrenergic signaling. Such a role might become apparent when conducting more detailed titration experiments.

      Reviewer #3 may be misunderstanding the report by McKinney et al. (eLife, 2023). This paper did not demonstrate that presynaptic β-adrenergic receptors, particularly β2, can enhance neuronal excitability via autocrine mechanisms. It is impossible for LC neurons to increase their excitability by activating β-adrenergic receptors, as we have clearly shown that enhancement of GIRK-I by 100 µM noradrenaline was completely antagonized by 10 µM atipamezole. Considering the difference in Ki values of atipamezole for α2-AR (= 2~4 nM) (Vacher et al., 2010, J Med Chem) and β-AR (>10 µM) (Virtanen et al., 1989, Arch Int Pharmacodyn Ther), such a complete antagonization (of 100 µM NA-induced GIRK-I) by 10 µM atipamezole reflects α2A-AR activity but not β-AR activity (Figure S5). Furthermore, it is already well established that NA-induced GIRK-I is mediated by α2-AR activity in LC neurons (Arima et al., 1998, J Physiol). McKinney et al. (eLife, 2023) merely found that NA released in an autocrine manner by spike activity does not produce lateral inhibition of adjacent LC neurons. This has nothing to do with autoinhibition.
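
      The Ki argument above can be illustrated with a rough occupancy estimate. This is a minimal sketch, not an analysis from the study: it assumes simple one-site equilibrium binding with no competing agonist (so it ignores the 100 µM NA present in the experiment), and it uses 2 nM as an illustrative α2 value within the quoted range and 10 µM as the lower bound quoted for β receptors.

```python
def fractional_occupancy(conc_nM, ki_nM):
    """Fraction of receptors bound by an antagonist at equilibrium.

    Assumes one-site binding and no competing ligand: occupancy = C / (C + Ki).
    """
    return conc_nM / (conc_nM + ki_nM)


ATIPAMEZOLE_NM = 10_000.0  # 10 µM atipamezole, expressed in nM

# Illustrative Ki of 2 nM for the α2 receptor (within the quoted nM range).
alpha2_occupancy = fractional_occupancy(ATIPAMEZOLE_NM, 2.0)

# Ki for β receptors is quoted only as > 10 µM, so this value is an upper bound.
beta_occupancy_upper_bound = fractional_occupancy(ATIPAMEZOLE_NM, 10_000.0)
```

      Under these assumptions, 10 µM atipamezole occupies nearly all α2 receptors and at most half of β receptors; a full treatment would also account for competition with the agonist.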

      (4) Age mismatch and disease claims 

      All electrophysiology and biochemical data come from juvenile (< P30) mice, yet the conclusions stress Alzheimer-related degeneration. Key endpoints need to be replicated in adult or aged mice, or the manuscript should soften its neurodegenerative scope. 

      Authors response: As described in the Conclusion section, we did not intend to stress Alzheimer-related degeneration, but we may have given such an impression. To avoid such a misunderstanding, we have added the statement “However, the present mechanism must be proven to be valid in adult or old mice, to validate its involvement in the pathogenesis of AD.” (R1, page 14, lines 448-450).

      It would be great to see this experiment performed in aged mice; you are the ones who have everything in place to do it right now! 

      In our future separate studies, we would like to prove that the present mechanism is valid in aged mice, to validate its involvement in the pathogenesis of AD. This is partly because the patch-clamp study in aged mice is extremely difficult and takes much time.

      Authors response: In the abstract, you suggest that internalization of α2A-adrenergic receptors could represent a therapeutic target for Alzheimer's disease. "...Thus, it is likely that internalization of α2A-AR increased cytosolic NA, as reflected in AEP increases, by facilitating reuptake of autocrine-released NA. The suppression of α2A-AR internalization may have a translational potential for AD treatment."

      α2A-AR internalization was involved in the degeneration of LC neurons. Because we confirmed that spike-frequency adaptation reflecting α2A-AR-mediated autoinhibition can be induced in adult mice as prominently as in juvenile mice (Figure S10), it is reasonable to suggest that the suppression of α2A-AR internalization may have translational potential for anxiety/AD treatment (see Discussion; R2, page 14, lines 445-449).

      (6) Quantitative histology  

      Figure 5 presents attractive images, but no numerical analysis is provided. Please provide ROI-based fluorescence quantification (with n values) or move the images to the supplement and rely on the Western blots. 

      Author response: We have moved the immunohistochemical results in Fig. 5 to the supplement, as we believe the quantification of immunohistochemical staining is not necessarily correct.   

      What do you mean by "...immunohistochemical staining is not necessarily correct"?

      Western blot analysis is evidently a more accurate quantification method than immunohistochemical staining. In this sense, we contend that ROI-based fluorescence quantification of immunohistochemical staining is not necessarily as accurate or correct a procedure as quantification by Western blot analysis.

    1. Author response:

      We thank the two reviewers for their constructive criticisms which we will address in the coming weeks, and we are confident doing so will benefit the manuscript.

      We will aim to address all comments, but there are two main areas in particular that we highlight here:

      (1)  Both reviewers make important suggestions to improve the readers’ understanding of the anatomical complexities and raw files we provide. We will generate annotated confocal stacks and simplify the nomenclature to better guide the reader through the more complex details of the anatomy of the central complex, and the neuron types we characterized more closely.

      (2)  Both reviewers also pointed to several parts of our interpretations and discussion that should be clarified. We will do so by improving the language we use at certain sections to offer more precision, and by offering alternative explanations where possible.

    1. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      In this manuscript, the authors theoretically address the topic of interface resistance between a phase-separated condensate and the surrounding dilute phase. In a nutshell, "interface resistance" occurs if material in the dilute phase can only slowly pass through the interface region to enter the dense phase. There is some evidence from FRAP experiments that such a resistance may exist, and if it does, it could be biologically relevant insofar as the movement of material between dense and dilute phases can be rate-limiting for biological processes, including coarsening. The current study theoretically addresses interface resistance at two levels of description: first, the authors present a simple way of formulating interface resistance for a sharp interface model. Second, they derive a formula for interface resistance for a finite-width interface and present two scenarios where the interface resistance might be substantial. 

      Strengths: 

      The topic is of broad relevance to the important field of intracellular phase separation, and the work is overall credible. 

      Weaknesses: 

      There are a few problems with the study as presented - mainly that the key formula for the latter section has already been derived and presented in Reference 6 (notably also in this journal), and that the physical basis for the proposed scenarios leading to a large interface resistance is not clearly supported. 

      (1) As noted, Equation 32 of the current study is entirely equivalent to Equation 8 of Reference 6, with a very similar derivation presented in Appendix 1 of that paper. In fact, Equation 8 in Reference 6 takes one more step by combining Equations 32 and 35 to provide a general expression for the interface resistance in an integral form. These prior results should be properly cited in the current work - the existing citations to Reference 6 do not make this overlap apparent. 

      We agree and will make the overlap explicit, acknowledging priority and clarifying what is new here. The initial version of the preprint of Zhang et al. (2022) (https://www.biorxiv.org/content/10.1101/2022.03.16.484641v1) lacked the derivation (it referenced a Supplementary Note not yet available); it was added during the eLife submission. We worked from the preprint and missed this update, which we will now correct.

      (2) The authors of the current study go on to examine cases where this shared equation (here Equation 32) might imply a large interface resistance. The examples are mathematically correct, but physically unsupported. In order to produce a substantial interface resistance, the current authors have to suppose that in the interface region between the dense and dilute phases, either there is a local minimum of the diffusion coefficient or a local minimum of the density. I am not aware of any realistic model that would produce either of these minima. Indeed, the authors do not present sufficient examples or physical arguments that would support the existence of such minima. 

      We respectfully disagree with the reviewer on the physical plausibility of these scenarios: there is both concrete experimental and theoretical evidence for the scenarios we discussed.

      Experimental: Strom et al. (2017) (our reference 11) describes a substantially reduced protein diffusion coefficient at an in vivo phase boundary, while Hahn et al. (2011a) and Hahn et al. (2011b) (our references 27 and 28) describe transient accumulation of molecules at a phase boundary, which they attribute to the Donnan potential, but conceivably a lowered mobility could play a role.

      Theoretical: Recent work (e.g., Majee et al. (2024)) shows that charged layers could form at phase boundaries, which could either repel or attract incoming molecules, depending on their charge, thus altering the local volume fraction, resulting in a trough or peak. Arguably, the model put forth by Zhang et al. (2024) could be mapped to a potential wall, where particles are reflected, unless in a certain state. We will add sentences to the corresponding results section, as well as the discussion to make this plausibility more apparent.

      In my view, these two issues limit the general interest of the latter portion of the current manuscript. While point 1 can be remedied by proper citation, point 2 is not so simple to address. The two ways the authors present to produce a substantial interface resistance seem to me to be mathematical exercises without a physical basis. The manuscript will improve if the authors can provide examples or compelling arguments for a minimum of either diffusion coefficient or density between the dense and dilute phases that would address point 2. 

      We believe we will be able to address both issues.

      Reviewer #2 (Public review): 

      Summary: 

      This work provides a general theoretical framework for understanding molecular transport across liquid-liquid phase boundaries, focusing on interfacial resistance arising from deviations from local equilibrium. By bridging sharp and continuous interface descriptions, the authors demonstrate how distinct microscopic mechanisms can yield similar effective kinetics and propose practical experimental validation strategies. 

      Strengths: 

      (1) Conceptually rich and physically insightful interface resistance formulation in sharp and continuous limits. 

      (2) Strong integration of non-equilibrium thermodynamics with biologically motivated transport scenarios. 

      (3) Thorough numerical and analytical support, with thoughtful connection to current and emerging experimental techniques. 

      (4) Relevance to various systems, including biomolecular condensates and engineered aqueous two-phase systems. 

      Weaknesses: 

      (1) The work remains theoretical, mainly, with limited direct comparison to quantitative experimental data. 

      We agree with the reviewer; an experimental manuscript is in progress.

      (2) The biological implications are only briefly explored; further discussion of specific systems where interface resistance might play a functional role would enhance the impact.

      We thank the reviewer for this comment. We will add several such scenarios to the discussion, including the possibility of using interface resistance as a way of ordering biochemical reactions in time, as well as its potential to exclude molecules from condensates for long time periods, which, while not effective in the long-time limit, could help cells respond to transient events on timescales of minutes to hours.

      (3) Some model assumptions (e.g., symmetric labeling or idealized diffusivity profiles) could be further contextualized regarding biological variability. 

      The treatment of labelled and unlabelled molecules as physically identical is well supported by our experiments. Droplets under typical experimental conditions, i.e. when bleaching is not too strong, do not markedly change size or volume fraction of molecules, which would be expected if the physical properties like molecular volume or interaction strength were significantly changed. However, we do agree that in more extreme bleaching regimes the bleach step itself will change the droplet properties, but this can be avoided by tuning the FRAP laser power and dwell times accordingly.

      Our diffusivity profiles are chosen in the simplest possible way to handle typical experimental constraints (large D outside, lower D inside, potentially lowered D at the boundary) and allow for a mean-field treatment. To the best of our knowledge, the precise make-up and concentration profiles of phase boundaries in biomolecular condensates are not currently known, due to limitations in optical resolution.

      Reviewer #3 (Public review): 

      The manuscript investigated the kinetics of molecule transport across interfaces in phase-separated mixtures. Through the development of a theoretical approach for a binary mixture in a sharp interface limit, the authors found that interface resistance leads to a slowdown in interfacial movement. Subsequently, they extended this approach to multiple molecular species (incorporating both labeled and unlabeled molecules) and continuous transport models. Finally, they proposed experimental settings in vitro and commented on the necessary optical resolution to detect signatures of interfacial kinetics associated with resistance. 

      The investigation of transport kinetics across biomolecular condensate interfaces holds significant relevance for understanding cellular function and dysfunction mechanisms; thus, the topic is important and timely. However, the current manuscript presentation requires improvement. Firstly, the inclusion of numerous equations in the main text substantially compromises readability, and relocation of a part of the formulae and derivations to the Appendix would be more appropriate. Secondly, the manuscript would benefit from more comprehensive comparisons with existing theoretical studies on molecular transport kinetics. The text should also be written to be more approachable for a general readership. Modifications and sufficient responses to the specific points outlined below are recommended. 

      (1) The authors introduced a theoretical framework to study the kinetics of molecules across an interface between two coexisting liquid phases and found that interface resistance leads to a slowdown in interfacial movement in a binary mixture and a decelerated molecule exchange between labeled and unlabeled molecules across the phase boundary. However, these findings appear rather expected. The work would be strengthened by a more thorough discussion of the kinetics of molecule transport across interfaces (such as the physical origin of the interface resistance and its specific impact on transport kinetics). 

      We thank the reviewer for this comment and will discuss possible mechanisms and how they map to our mean-field model in more detail, both in the corresponding results section and in the discussion, as also outlined in our response to Reviewer #1.

      (2) The formulae in the manuscript should be checked and corrected. Notably, Equation 10 contains "\phi_2\ln\phi_2" while Eq. 11b shows "n^{-1}\ln\phi_2", suggesting a missing factor of "n^{-1}". Similarly, Equation 18 obtained from Equation 11: the logarithmic term in Eq. 11a is "n^{-1}\ln\phi_1-\ln(1-\phi)" but the pre-exponential factor in Equation 18a is just "\phi_1/(1-\phi*)", where is "n^{-1}"? Additionally, there is a unit inconsistency in Equation 36, where the unit of \rho (s/m) does not match that of the right-hand side expression (s/m^2). 

      We thank the reviewer. We identified that the error originates in the inline definition of the exchange chemical potential, which appears before Equation 11. We inadvertently dropped a prefactor of n, which then shows up in the following equation as an exponent to (1-\phi^*). Importantly, this means that the main result (Eq. 25) still holds, and in the revised manuscript we will correct the ensuing typographical mistakes.

      (3) The authors stated that the numerical solutions are obtained using a custom finite difference scheme implemented in MATLAB in the Appendix. The description of numerical methods is insufficiently detailed and needs to be expanded, including specific equations or models used to obtain specific figures, the introduction of initial and boundary conditions, the choices of parameters and their reasons in terms of the biology.

      We will substantially expand the Appendix for the numerical solutions and add an explanatory file to the repository to make clear how the code can be run, as well as its dependencies.

      (4) The authors claimed that their framework naturally extends to multiple molecular species, but only showed the situation of labeled and unlabeled molecules across a phase boundary. How about three or more molecular species? Does this framework still work? This should be added to strengthen the manuscript and confirm the framework's general applicability. 

      We have shown in Bo et al. (2021) that the labelling approach can be carried over to multi-component systems. Each species may, for example, encounter its own interface resistance. We will discuss this in more detail in the revised manuscript.

    1. Author response:

      Notes to Editors

      We previously received comments from three reviewers at Biological Psychiatry, which we have addressed in detail below. The following is a summary of the reviewers’ comments along with our responses.

      Reviewers 1 and 2 sought clearer justification for studying the cognition-mental health overlap (covariation) and its neuroimaging correlates. In the revised manuscripts, we expanded the Introduction and Discussion to explicitly outline the theoretical implications of investigating this overlap with machine learning. We also added nuance to the interpretation of the observed associations.

      Reviewer 1 raised concerns about the accessibility of the machine learning methodology for readers without expertise in this field. We revised the Methods section to provide a clearer, step-by-step explanation of our machine learning approach, particularly the two-level machine learning through stacking. We also enhanced the description of the overall machine learning design, including model training, validation, and testing.

      In response to Reviewer 2’s request for deeper interpretation of our findings and stronger theoretical grounding, we have expanded our discussion by incorporating a thorough interpretation of how mental health indices relate to cognition, material that was previously included only in supplementary materials due to word limit constraints. We have further strengthened the theoretical justification for our study design, with particular emphasis on the importance of examining shared variance between cognition and mental health through the derivation of neural markers of cognition. Additionally, to enhance the biological interpretation of our results, we included new analyses of feature importance across neuroimaging modalities, providing clearer insights into which neural features contribute most to the observed relationships.

      Notably, Reviewer 3 acknowledged the strength of our study, including multimodal design, robust analytical approach, and clear visualization and interpretation of results. Their comments were exclusively methodological, underscoring the manuscript’s quality.

      Reviewer 1:

      The authors try to bridge mental health characteristics, global cognition and various MRI-derived (structural, diffusion and resting state fMRI) measures using the large dataset of UK Biobank. Each MRI modality alone explained max 25% of the cognition-mental health covariance, and when combined together 48% of the variance could be explained. As a peer-reviewer not familiar with the used methods (machine learning, although familiar with imaging), the manuscript is hard to read and I wonder what the message for the field might be. At the end of the discussion the authors state '... we provide potential targets for behavioural and physiological interventions that may affect cognition', but the real relevance (and impact) of the findings is unclear to me.

      Thank you for your thorough review and practical recommendations. We appreciate your constructive comments and suggestions and hope our revisions adequately address your concerns.

      Major questions

      (1) The methods are hard to follow for people not in this specific subfield, and therefore, I expect that for readers it is hard to understand how valid and how useful the approach is.

      Thank you for your comment. To enhance accessibility for readers without a machine learning background, we revised the Methods section to clarify our analyses while retaining important technical details needed to understand our approach. Recognizing that some concepts may require prior knowledge, we provide detailed explanations of each analysis step, including the machine learning pipeline in the Supplementary Methods.

      Line 188: “We employed nested cross-validation to predict cognition from mental health indices and 72 neuroimaging phenotypes (Fig. 1). Nested cross-validation is a robust method for evaluating machine-learning models while tuning their hyperparameters, ensuring that performance estimates are both accurate and unbiased. Here, we used a nested cross-validation scheme with five outer folds and ten inner folds.

      We started by dividing the entire dataset into five outer folds. Each fold took a turn being held out as the outer-fold test set (20% of the data), while the remaining four folds (80% of the data) were used as an outer-fold training set. Within each outer-fold training set, we performed a second layer of cross-validation – this time splitting the data into ten inner folds. These inner folds were used exclusively for hyperparameter tuning: models were trained on nine of the inner folds and validated on the remaining one, cycling through all ten combinations.

      We then selected the hyperparameter configuration that performed best across the inner-fold validation sets, as determined by the minimal mean squared error (MSE). The model was then retrained on the full outer-fold training set using this hyperparameter configuration and evaluated on the outer-fold test set, using four performance metrics: Pearson r, the coefficient of determination (R<sup>2</sup>), the mean absolute error (MAE), and the MSE. This entire process was repeated for each of the five outer folds, ensuring that every data point is used for both training and testing, but never at the same time. We opted for five outer folds instead of ten to reduce computational demands, particularly memory and processing time, given the substantial volume of neuroimaging data involved in model training. Five outer folds led to an outer-fold test set of at least n = 4 000, which should be sufficient for model evaluation. In contrast, we retained ten inner folds to ensure robust and stable hyperparameter tuning, maximising the reliability of model selection.

      To model the relationship between mental health and cognition, we employed Partial Least Squares Regression (PLSR) to predict the g-factor from 133 mental health variables. To model the relationship between neuroimaging data and cognition, we used a two-step stacking approach [15–17,61] to integrate information from 72 neuroimaging phenotypes across three MRI modalities. In the first step, we trained 72 base (first-level) PLSR models, each predicting the g-factor from a single neuroimaging phenotype. In the second step, we used the predicted values from these base models as input features for stacked models, which again predicted the g-factor. We constructed four stacked models based on the source of the base predictions: one each for dwMRI, rsMRI, sMRI, and a combined model incorporating all modalities (“dwMRI Stacked”, “rsMRI Stacked”, “sMRI Stacked”, and “All MRI Stacked”, respectively). Each stacked model was trained using one of four machine learning algorithms – ElasticNet, Random Forest, XGBoost, or Support Vector Regression – selected individually for each model (see Supplementary Materials, S6).

      For rsMRI phenotypes, we treated the choice of functional connectivity quantification method – full correlation, partial correlation, or tangent space parametrization – as a hyperparameter. The method yielding the highest performance on the outer-fold training set was selected for predicting the g-factor (see Supplementary Materials, S5).

      To prevent data leakage, we standardized the data using the mean and standard deviation derived from the training set and applied these parameters to the corresponding test set within each outer fold. This standardization was performed at three key stages: before g-factor derivation, before regressing out modality-specific confounds from the MRI data, and before stacking. Similarly, to maintain strict separation between training and testing data, both base and stacked models were trained exclusively on participants from the outer-fold training set and subsequently applied to the corresponding outer-fold test set.

      To evaluate model performance and assess statistical significance, we aggregated the predicted and observed g-factor values from each outer-fold test set. We then computed a bootstrap distribution of Pearson’s correlation coefficient (r) by resampling with replacement 5 000 times, generating 95% confidence intervals (CIs) (Fig. 1). Model performance was considered statistically significant if the 95% CI did not include zero, indicating that the observed associations were unlikely to have occurred by chance.”
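
      The nested cross-validation and bootstrap steps quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: a one-predictor ridge regression stands in for PLSR and the stacked models, the penalty lam is the only hyperparameter, and all function names (k_folds, fit_ridge_1d, nested_cv, bootstrap_ci) are hypothetical. It uses only the Python standard library and assumes every inner fold contains at least one participant.

```python
import random
from statistics import mean, pstdev


def k_folds(indices, k):
    """Split a list of indices into k interleaved, roughly equal folds."""
    return [indices[i::k] for i in range(k)]


def take(values, idx):
    return [values[i] for i in idx]


def fit_ridge_1d(x, y, lam):
    """Fit a one-predictor ridge model; scaling uses the training data only."""
    mx, sx, my = mean(x), pstdev(x) or 1.0, mean(y)
    z = [(v - mx) / sx for v in x]
    slope = sum(zi * (yi - my) for zi, yi in zip(z, y)) / (sum(zi * zi for zi in z) + lam)
    # The returned predictor reuses the training mean and SD on unseen data,
    # which is what keeps test information out of the standardization step.
    return lambda v: my + slope * (v - mx) / sx


def mse(model, x, y):
    return mean((model(xi) - yi) ** 2 for xi, yi in zip(x, y))


def pearson_r(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    denom = (sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b)) ** 0.5
    return cov / denom if denom else 0.0


def nested_cv(x, y, lambdas, n_outer=5, n_inner=10, seed=0):
    """Return pooled out-of-sample predictions and the matching observations."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    outer = k_folds(idx, n_outer)
    preds, obs = [], []
    for test in outer:
        train = [i for fold in outer if fold is not test for i in fold]
        inner = k_folds(train, n_inner)

        def inner_mse(lam):
            # Mean validation MSE across the inner folds for one hyperparameter.
            scores = []
            for val in inner:
                fit = [i for fold in inner if fold is not val for i in fold]
                model = fit_ridge_1d(take(x, fit), take(y, fit), lam)
                scores.append(mse(model, take(x, val), take(y, val)))
            return mean(scores)

        best = min(lambdas, key=inner_mse)
        # Retrain on the full outer-fold training set, then score the held-out fold.
        model = fit_ridge_1d(take(x, train), take(y, train), best)
        preds += [model(x[i]) for i in test]
        obs += take(y, test)
    return preds, obs


def bootstrap_ci(preds, obs, n_boot=5000, seed=0):
    """Percentile 95% CI for Pearson r, resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(preds)
    rs = []
    for _ in range(n_boot):
        s = [rng.randrange(n) for _ in range(n)]
        rs.append(pearson_r(take(preds, s), take(obs, s)))
    rs.sort()
    return rs[int(0.025 * n_boot)], rs[int(0.975 * n_boot) - 1]
```

      Calling nested_cv(x, y, lambdas=[0.0, 1.0, 10.0]) on two equal-length lists of numbers returns one held-out prediction per participant. Passing those pooled predictions and observations to bootstrap_ci gives the interval that is checked against zero.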

      (2) If only 40% of the cognition-mental health covariation can be explained by the MRI variables, how to explain the other 60% of the variance? And related to this %: why do the author think that 'this provides us confidence in using MRI to derive quantitative neuromarkers of cognition'?

      Thank you for this insightful observation. Using the MRI modalities available in the UK Biobank, we were able to account for 48% of the covariation between cognition and mental health. The remaining 52% of unexplained variance may arise from several sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research from our group and others has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank.

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the Research Domain Criteria (RDoC) framework, brain circuits represent only one level of neurobiological analysis relevant to cognition. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. We have now incorporated these considerations into the Discussion section.

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Regarding our confidence in using MRI to derive neural markers for cognition, we base this on the predictive performance of MRI-based models. As we note in the Discussion (Line 554: “Consistent with previous studies, we show that MRI data predict individual differences in cognition with a medium-size performance (r ≈ 0.4) [15–17, 28, 61, 67, 68].”), the medium effect size we observed (r ≈ 0.4) agrees with existing literature on brain-cognition relationships, confirming that machine learning leads to replicable results. This effect size represents a moderate yet meaningful association in neuroimaging studies of aging, consistent with reports linking brain to behaviour in adults (Krämer et al., 2024; Tetereva et al., 2022). For example, a recent meta-analysis by Vieira and colleagues (2022) reported a similar effect size (r = 0.42, 95% CI [0.35;0.50]). Our study includes over 15000 participants, comparable to or more than typical meta-analyses, allowing us to characterise our work as a “mega-analysis”. And on top of this predictive performance, we found our neural markers for cognition to capture half of the cognition-mental health covariation, boosting our confidence in our approach.

      Krämer C, Stumme J, da Costa Campos L, Dellani P, Rubbert C, Caspers J, et al. Prediction of cognitive performance differences in older age from multimodal neuroimaging data. GeroScience. 2024;46:283–308.

      Tetereva A, Li J, Deng JD, Stringaris A, Pat N. Capturing brain cognition relationship: Integrating task‐based fMRI across tasks markedly boosts prediction and test‐retest reliability. NeuroImage. 2022;263:119588.

      (3) Imagine that we can increase the explained variance using multimodal MRI measures, why is it useful? What does it teach us? What might be the implications?

      We assume that by variance, Reviewer 1 referred to the cognition-mental health covariation mentioned in point (2) above.

      If we can increase the explained cognition-mental health covariation using multimodal MRI measures, it would mean that we have developed a reasonable neuromarker that is close to RDoC’s neurobiological unit of analysis for cognition. RDoC treats cognition as one of the main basic functional domains that transdiagnostically underlie mental health. According to RDoC, mental health should be studied in relation to cognition, alongside other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. RDoC further emphasizes that each domain, including cognition, should be investigated not only at the behavioural level but also through its neurobiological correlates. This means RDoC aims to discover neural markers of cognition that explain the covariation between cognition and mental health. We approach the development of such neural markers using multimodal neuroimaging. We have now explained the motivation of our study in the first paragraph of the Introduction.

      Line 43: “Cognition and mental health are closely intertwined [1]. Cognitive dysfunction is present in various mental illnesses, including anxiety [2, 3], depression [4–6], and psychotic disorders [7–12]. National Institute of Mental Health’s Research Domain Criteria (RDoC) [13,14] treats cognition as one of the main basic functional domains that transdiagnostically underlie mental health. According to RDoC, mental health should be studied in relation to cognition, alongside other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. RDoC further emphasizes that each domain, including cognition, should be investigated not only at the behavioural level but also through its neurobiological correlates. In this study, we aim to examine how the covariation between cognition and mental health is reflected in neural markers of cognition, as measured through multimodal neuroimaging.”

      More specific issues:

      Introduction

      (4) In the intro the sentence 'in some cases, altered cognitive functioning is directly related to psychiatric symptom severity' is in contrast to the next sentence '... are often stable and persist upon alleviation of psychiatric symptoms'.

      Thank you for pointing this out. The first sentence refers to cases where cognitive deficits fluctuate with symptom severity, while the second emphasizes that core cognitive impairments often remain stable even during symptom remission. To avoid this confusion, we have removed these sentences.

      (5) In the intro the text on the methods (various MRI modalities) is not needed for the Biol Psych readers audience.

      We appreciate your comment. While some members of our target audience may have backgrounds in neuroimaging, machine learning, or psychiatry, we recognize that not all readers will be familiar with all three areas. To ensure accessibility for those who are not familiar with neuroimaging, we included a brief overview of the MRI modalities and quantification methods used in our study to provide context for the specific neuroimaging phenotypes. Additionally, we provided background information on the machine learning techniques employed, so that readers without a strong background in machine learning can still follow our methodology.

      (6) Regarding age of the study sample: I understand that at recruitment the subjects' age ranges from 40 to 69 years. At MRI scanning the age ranges between about 46 to 82. How is that possible? And related to the age of the population: how did the authors deal with age in the analyses, since age is affecting both cognition as the brain measures?

      Thank you for noticing this. In the Methods section, we first outline the characteristics of the UK Biobank cohort, including the age at first recruitment (40-69 years). Table 1 then shows the characteristics of participant subsamples included in each analysis. Since our study used data from Instance 2 (the second in-person visit), participants were approximately 5-13 years older at scanning, resulting in the age range of 46 to 82 years. We clarified the Table 1 caption as follows:

      Line 113: “Table 1. Demographics for each subsample analysed: number, age, and sex of participants who completed all cognitive tests, mental health questionnaires, and MRI scanning”

      We acknowledge that age may influence cognitive and neuroimaging measures. In our analyses, we intentionally preserved age-related variance in brain-cognition relationships across mid and late adulthood, as regressing out age completely would artificially remove biologically meaningful associations. At the same time, we rigorously addressed the effects of age and sex through additional commonality analyses quantifying age and sex contributions to the relationship between cognition and mental health.

      As noted by Reviewer 1 and illustrated in Figure 8, age and sex shared substantial overlapping variance with both mental health and neuroimaging phenotypes in explaining cognitive outcomes. For example, in Figure 8i, age and sex together accounted for 43% of the variance in the cognition-mental health relationship:

      (2.76 + 1.03) / (2.76 + 1.03 + 3.52 + 1.45) ≈ 0.43

      Furthermore, neuromarkers from the all-MRI stacked model explained 72% of this age/sex-related variance:

      2.76 / (2.76 + 1.03) ≈ 0.72

      This indicates that our neuromarkers captured a substantial portion of the cognition-mental health covariation that varied with age and sex, highlighting their relevance in age/sex-sensitive cognitive modeling.
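
      The two ratios above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical labels for the four Figure 8i components quoted above and do not come from the manuscript. With the rounded components as printed, the second ratio evaluates to roughly 0.728, which truncates to the 0.72 quoted above.

```python
# Figure 8i commonality components (percent of variance), as quoted above.
# Variable names are hypothetical labels chosen for this sketch.
age_sex_shared_with_mri = 2.76   # age/sex-related part also captured by the neuromarkers
age_sex_not_in_mri = 1.03        # age/sex-related part not captured by the neuromarkers
other_components = [3.52, 1.45]  # remaining parts of the cognition-mental health relationship

age_sex_related = age_sex_shared_with_mri + age_sex_not_in_mri
total_relationship = age_sex_related + sum(other_components)

# Share of the cognition-mental health relationship that is age/sex-related.
share_age_sex = age_sex_related / total_relationship
# Share of that age/sex-related part that the neuromarkers also capture.
share_mri_of_age_sex = age_sex_shared_with_mri / age_sex_related

print(f"age/sex share of the relationship: {share_age_sex:.3f}")
print(f"neuromarker share of the age/sex part: {share_mri_of_age_sex:.3f}")
```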

      In the Methods, Results, and Discussion, we say:

      Methods

      Line 263: “To understand how demographic factors, including age and sex, contribute to this relationship, we also conducted a separate set of commonality analyses treating age, sex, age<sup>2</sup>, age×sex, and age<sup>2</sup>×sex as an additional set of explanatory variables (Fig. 1).”
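
      As a minimal sketch of how the five demographic terms named in this quote could be constructed for one participant: the function name, the 0/1 coding of sex, and the absence of any centring or scaling are assumptions made for illustration, not details taken from the manuscript.

```python
def demographic_terms(age, sex):
    """Return the five demographic regressors named in the Methods quote.

    `age` is in years and `sex` is a 0/1 indicator. The coding and the lack
    of centring or scaling are assumptions of this sketch.
    """
    age_sq = age ** 2
    return {
        "age": age,
        "sex": sex,
        "age^2": age_sq,
        "age*sex": age * sex,
        "age^2*sex": age_sq * sex,
    }


# Example: a 60-year-old participant with sex coded as 1.
print(demographic_terms(60, 1))
```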

      Results

      Line 445: “Age and sex shared substantial overlapping variance with both mental health and neuroimaging in explaining cognition, accounting for 43% of the variance in the cognition-mental health relationship. Multimodal neural marker of cognition based on three MRI modalities (“All MRI Stacked”) explained 72% of this age and sex-related variance (Fig. 8i–l and Table S21).”

      Discussion

      Line 660: “We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.”

      (7) Regarding the mental health variables: were characteristics with positive value (e.g. happiness and subjective wellbeing) reversely scored (compared to the negative items, such as anxiety, addiction, etc)?

      We appreciate you noting this. These composite scores primarily represent standard clinical measures such as the GAD-7 anxiety scale and N-12 neuroticism scale. We did not reverse-score them, so their directionality, and therefore their interpretation, remains consistent with the original studies from which the scores were derived (e.g., Davis et al., 2020; Dutt et al., 2022). Complete descriptive statistics for all mental health indices and detailed derivation procedures are provided in the Supplementary Materials (S2). On Page 6, Supplementary Methods, we say:

      Line 92: “Composite mental health scores included the Generalized Anxiety Disorder (GAD-7), the Posttraumatic Stress Disorder (PTSD) Checklist (PCL-6), the Alcohol Use Disorders Identification Test (AUDIT), the Patient Health Questionnaire (PHQ-9) [12], the Eysenck Neuroticism (N-12), Probable Depression Status (PDS), and the Recent Depressive Symptoms (RDS-4) scores [13, 14]. To calculate the GAD-7, PCL-6, AUDIT, and PHQ-9, we used questions introduced at the online follow-up [12]. To obtain the N-12, PDS, and RDS-4 scores [14], we used data collected during the baseline assessment [13, 14].

      We subcategorized depression and GAD based on frequency, current status (ever had depression or anxiety and current status of depression or anxiety), severity, and clinical diagnosis (depression or anxiety confirmed by a healthcare practitioner). Additionally, we differentiated between different depression statuses, such as recurrent depression, depression triggered by loss, etc. Variables related to self-harm were subdivided based on whether a person has ever self-harmed with the intent to die.

      To make response scales more intuitive, we recoded responses within the well-being domain such that the lower score corresponded to a lesser extent of satisfaction (“Extremely unhappy”) and the higher score indicated a higher level of happiness (“Extremely happy”). For all questions, we assigned the median values to “Prefer not to answer” (-818 for in-person assessment and -3 for online questionnaire) and “Do not know” (-121 for in-person assessment and -1 for online questionnaire) responses. We excluded the “Work/job satisfaction” question from the mental health derivatives list because it included a “Not employed” response option, which could not be reasonably coded.

      To calculate the risk of PTSD, we used questions from the PCL-6 questionnaire. Following Davis and colleagues [12], PCL-6 scores ranged from 6 to 29. A PCL-6 score of 12 or below corresponds to a low risk of meeting the Clinician-Administered PTSD Scale diagnostic criteria. PCL-6 scores between 13 and 16 and between 17 and 25 are indicative of an increased risk and high risk of PTSD, respectively. A score of above 26 is interpreted as a very high risk of PTSD [12, 15]. PTSD status was set to positive if the PCL-6 score exceeded or was equal to 14 and encompassed stressful events instead of catastrophic trauma alone [12].

      To assess alcohol consumption, alcohol dependence, and harm associated with drinking, we calculated the sum of the ten questions from the AUDIT questionnaire [16]. We additionally subdivided the AUDIT score into the alcohol consumption score (questions 1-3, AUDIT-C) and the score reflecting problems caused by alcohol (questions 4-10, AUDIT-P) [17]. In questions 2-10 that followed the first trigger question (“Frequency of drinking alcohol”), we replaced missing values with 0 as they would correspond to a “Never” response to the first question.

      An AUDIT score cut-off of 8 suggests moderate or low-risk alcohol consumption, and scores of 8 to 15 and above 15 indicate severe/harmful and hazardous (alcohol dependence or moderate-severe alcohol use disorder) drinking, respectively [16, 18]. Subsequently, hazardous alcohol use and alcohol dependence status correspond to AUDIT scores of ≥ 8 and ≥ 15, respectively. The “Alcohol dependence ever” status was set to positive if a participant had ever been physically dependent on alcohol. To reduce skewness, we log(x + 1)-transformed the AUDIT, AUDIT-C, and AUDIT-P scores [17].”
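
      The scoring rules quoted above can be sketched in code. This is an illustration under stated assumptions rather than the pipeline used in the study: the function names are hypothetical, "the median" is taken to mean the median of the valid answers to the same question, `None` marks a missing AUDIT answer, and the log(x + 1) transform is assumed to use the natural logarithm.

```python
import math
from statistics import median

# Special response codes quoted above ("Prefer not to answer" / "Do not know").
SPECIAL_CODES = {-818, -3, -121, -1}


def impute_special_codes(responses):
    """Replace special codes with the median of the valid responses.

    Assumption: the median is taken over the valid answers to the same question.
    """
    valid = [r for r in responses if r not in SPECIAL_CODES]
    fill = median(valid)
    return [fill if r in SPECIAL_CODES else r for r in responses]


def audit_scores(items):
    """Score one participant's ten AUDIT items (None marks a missing answer)."""
    if len(items) != 10:
        raise ValueError("AUDIT has ten items")
    if items[0] is None:
        raise ValueError("the first (trigger) question must be answered")
    # Missing answers to questions 2-10 are set to 0, as described above.
    filled = [items[0]] + [0 if x is None else x for x in items[1:]]
    total = sum(filled)
    audit_c = sum(filled[:3])  # questions 1-3: alcohol consumption
    audit_p = sum(filled[3:])  # questions 4-10: problems caused by alcohol
    return {
        "AUDIT": total,
        "AUDIT-C": audit_c,
        "AUDIT-P": audit_p,
        "hazardous_use": total >= 8,   # threshold stated in the quote
        "dependence": total >= 15,     # threshold stated in the quote
        # log(x + 1) transform; natural log is an assumption of this sketch.
        "log_AUDIT": math.log1p(total),
        "log_AUDIT-C": math.log1p(audit_c),
        "log_AUDIT-P": math.log1p(audit_p),
    }


print(impute_special_codes([1, 2, -818, 3, -1]))
print(audit_scores([4, 3, 2, None, None, 1, 0, 0, None, 0]))
```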

      Davis KAS, Coleman JRI, Adams M, Allen N, Breen G, Cullen B, et al. Mental health in UK Biobank – development, implementation and results from an online questionnaire completed by 157 366 participants: a reanalysis. BJPsych Open. 2020;6:e18.

      Dutt RK, Hannon K, Easley TO, Griffis JC, Zhang W, Bijsterbosch JD. Mental health in the UK Biobank: A roadmap to self-report measures and neuroimaging correlates. Hum Brain Mapp. 2022;43:816–832.

      (8) In the discussion section (page 23, line 416-421), the authors refer to specific findings that are not described in the results section > I would add these findings to the main manuscript (including the discussion / interpretation).

      We appreciate your careful reading. We agree that our original Results section did not explicitly describe the factor loadings for mental health in the PLSR model, despite discussing their implications later in the paper. We needed to include this part of the discussion in the Supplementary Materials to meet the word limit of the original submission. However, in response to your suggestion, we have now added the results regarding factor loadings to the Results section. We also moved the discussion of the association between mental health features and general cognition from the Supplementary Material to the manuscript’s Discussion.

      Results

      Line 298: “On average, information about mental health predicted the g-factor at R<sup>2</sup><sub>mean</sub> = 0.10 and r<sub>mean</sub> = 0.31 (95% CI [0.291, 0.315]; Fig. 2b and 2c and Supplementary Materials, S9, Table S12). The magnitude and direction of factor loadings for mental health in the PLSR model allowed us to quantify the contribution of individual mental health indices to cognition. Overall, the scores for mental distress, alcohol and cannabis use, and self-harm behaviours relate positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events relate negatively to cognition.”

      Discussion

      Line 492: “Factor loadings derived from the PLSR model showed that the scores for mental distress, alcohol and cannabis use, and self-harm behaviours related positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events related negatively to the g-factor. Positive PLSR loadings of features related to mental distress may indicate greater susceptibility to or exaggerated perception of stressful events, psychological overexcitability, and predisposition to rumination in people with higher cognition [72]. On the other hand, these findings may be specific to the UK Biobank cohort and the way the questions for this mental health category were constructed. In particular, to evaluate mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress. In this regard, the estimate for mental distress may be more indicative of whether an individual experiencing mental distress had an opportunity or aspiration to visit a doctor and seek professional help [73]. Thus, people with better cognitive abilities and also with a higher socioeconomic status may indeed be more likely to seek professional help.

      Limited evidence supports a positive association between self-harm behaviours and cognitive abilities, with some studies indicating higher cognitive performance as a risk factor for non-suicidal self-harm. Research shows an inverse relationship between cognitive control of emotion and suicidal behaviours that weakens over the life course [73,74]. Some studies have found a positive correlation between cognitive abilities and the risk of nonsuicidal self-harm, suicidal thoughts, and suicidal plans that may be independent of or, conversely, affected by socioeconomic status [75,76]. In our study, the magnitude of the association between self-harm behaviours and cognition was low (Fig. 2), indicating a weak relationship.

      Positive PLSR loadings of features related to alcohol and cannabis may also indicate the influence of other factors. Overall, this relationship is believed to be largely affected by age, income, education, social status, social equality, social norms, and quality of life [79–80]. For example, education level and income correlate with cognitive ability and alcohol consumption [79,81–83]. Research also links a higher probability of having tried alcohol or recreational drugs, including cannabis, to a tendency of more intelligent individuals to approach evolutionarily novel stimuli [84,85]. This hypothesis is supported by studies showing that cannabis users perform better on some cognitive tasks [86]. Alternatively, frequent drinking can indicate higher social engagement, which is positively associated with cognition [87]. Young adults often drink alcohol as a social ritual in university settings to build connections with peers [88]. In older adults, drinking may accompany visits from friends or family [89,90]. Mixed evidence on the link between alcohol and drug use and cognition makes it difficult to draw definite conclusions, leaving an open question about the nature of this relationship.

      Consistent with previous studies, we showed that anxiety and negative traumatic experiences were inversely associated with cognitive abilities [90–93]. Anxiety may be linked to poorer cognitive performance via reduced working memory capacity, increased focus on negative thoughts, and attentional bias to threatening stimuli that hinder the allocation of cognitive resources to a current task [94–96]. Individuals with PTSD consistently showed impaired verbal and working memory, visual attention, inhibitory function, task switching, cognitive flexibility, and cognitive control [97–100]. Exposure to traumatic events that did not reach the PTSD threshold was also linked to impaired cognition. For example, childhood trauma is associated with worse performance in processing speed, attention, and executive function tasks in adulthood, and age at a first traumatic event is predictive of the rate of executive function decline in midlife [101,102]. In the UK Biobank cohort, adverse life events have been linked to lower cognitive flexibility, partially via depression level [103].

      In agreement with our findings, cognitive deficits are often found in psychotic disorders [104,105]. We treated neurological and mental health symptoms as predictor variables and did not stratify or exclude people based on psychiatric status or symptom severity. Since no prior studies have examined isolated psychotic symptoms (e.g., recent unusual experiences, hearing unreal voices, or seeing unreal visions), we avoid speculating on how these symptoms relate to cognition in our sample.

      Finally, negative PLSR loadings of the features related to happiness and subjective well-being may be specific to the study cohort, as these findings do not agree with some previous research [107–109]. On the other hand, our results agree with the study linking excessive optimism or optimistic thinking to lower cognitive performance in memory, verbal fluency, fluid intelligence, and numerical reasoning tasks, and suggesting that pessimism or realism indicates better cognition [110]. The concept of realism/optimism as indicators of cognition is a plausible explanation for a negative association between the g-factor and friendship satisfaction, as well as a negative PLSR loading of feelings that life is meaningful, especially in older adults who tend to reflect more on the meaning of life [111]. The latter is supported by the study showing a negative association between cognitive function and the search for the meaning of life and a change in the pattern of this relationship after the age of 60 [112]. Finally, a UK Biobank study found a positive association of happiness with speed and visuospatial memory but a negative relationship with reasoning ability [113].”

      (9) In the discussion section (page 24, line 440-449), the authors give an explanation on why the diffusion measure have limited utility, but the arguments put forward also concern structural and rsfMRI measures.

      Thank you for this important observation. Indeed, the argument about voxel-averaged diffusion components (“… these metrics are less specific to the properties of individual white matter axons or bundles, and instead represent a composite of multiple diffusion components averaged within a voxel and across major fibre pathways”) could theoretically apply across other MRI modalities. We have therefore removed this point from the discussion to avoid overgeneralization. However, we maintain our central argument about the biological specificity of conventional tractography-derived diffusion metrics as their particular sensitivity to white matter microstructure (e.g., axonal integrity, myelin content) may make them better suited for detecting neuropathological changes than dynamic cognitive processes. This interpretation aligns with the mixed evidence linking these metrics to cognitive performance, despite their established utility in detecting white matter abnormalities in clinical populations (e.g., Bergamino et al., 2021; Silk et al., 2009). We clarify this distinction in the manuscript.

      Line 572: “The somewhat limited utility of diffusion metrics derived specifically from probabilistic tractography in serving as robust quantitative neuromarkers of cognition and its shared variance with mental health may stem from their greater sensitivity and specificity to neuronal integrity and white matter microstructure rather than to dynamic cognitive processes. Critically, probabilistic tractography may be less effective at capturing relationships between white matter microstructure and behavioural scores cross-sectionally, as this method is more sensitive to pathological changes or dynamic microstructural alterations like those occurring during maturation. While these indices can capture abnormal white matter microstructure in clinical populations such as Alzheimer’s disease, schizophrenia, or attention deficit hyperactivity disorder (ADHD) [117–119], the empirical evidence on their associations with cognitive performance is controversial [114, 120–126].”

      Bergamino M, Walsh RR, Stokes AM. Free-water diffusion tensor imaging improves the accuracy and sensitivity of white matter analysis in Alzheimer’s disease. Sci Rep. 2021;11:6990.

      Silk TJ, Vance A, Rinehart N, Bradshaw JL, Cunnington R. White-matter abnormalities in attention deficit hyperactivity disorder: a diffusion tensor imaging study. Hum Brain Mapp. 2009;30:2757–2765.

      Reviewer 2:

      This is an interesting study combining a lot of data to investigate the link between cognition and mental health. The description of the study is very clear, it's easy to read for someone like me who does not have a lot of expertise in machine learning.

      We thank you for your thorough review and constructive feedback. Your insightful comments have helped us identify conceptual and methodological aspects that required improvement in the manuscript. We have incorporated relevant changes throughout the paper, and below, we address each of your points in detail.

      Comment 1: My main concern with this manuscript is that it is not yet clear to me what it exactly means to look at the overlap between cognition and mental health. This relation is r=0.3 which is not that high, so why is it then necessary to explain this overlap with neuroimaging measures? And, could it be that the relation between cognition and mental health is explained by third variables (environment? opportunities?). In the introduction I miss an explanation of why it is important to study this and what it will tell us, and in the discussion I would like to read some kind of 'answer' to these questions.

      Thank you. It’s important to clarify why we investigated the relationship between cognition and mental health, and what we found using data from the UK Biobank.

      Conceptually, our work is grounded in the Research Domain Criteria (RDoC; Insel et al., 2010) framework. RDoC conceptualizes mental health not through traditional diagnostic categories, but through core functional domains that span the full spectrum from normal to abnormal functioning. These domains include cognition, negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. Within this framework, cognition is considered a fundamental domain that contributes to mental health across diagnostic boundaries. Meta-analytic evidence supports a link between cognitive functioning and mental health (Abramovitch et al., 2021; East-Richard et al., 2020). In the context of a large, population-based dataset like the UK Biobank, this implies that cognitive performance – as measured by various cognitive tasks – should be meaningfully associated with available mental health indicators.

      However, because cognition is only one of several functional domains implicated in mental health, we do not expect the covariation between cognition and mental health to be very high. Other domains, such as negative and positive valence systems, arousal and regulatory systems, or social processing, may also play significant roles. Theoretically, this places an upper bound on the strength of the cognition-mental health relationship, especially in normative, nonclinical samples.

      Our current findings from the UK Biobank reflect this. Most of the 133 mental health variables showed relatively weak individual correlations with cognition (mean r = 0.01, SD = 0.05, min r = –0.08, max r = 0.17; see Figure 2). However, using a PLS-based machine learning approach, we were able to integrate information across all mental-health variables to predict cognition, yielding an out-of-sample correlation of r = 0.31 [95% CI: 0.29, 0.32].

      We believe this estimate approximates the true strength of the cognition-mental health relationship in normative samples, consistent with both theoretical expectations and prior empirical findings. Theoretically, this aligns with the RDoC view that cognition is one of several contributing domains. Empirically, our results are consistent with findings from our previous mega-analysis in children (Wang et al., 2025). Moreover, in the field of gerontology, an effect size of r = 0.31 is not considered small. According to Brydges (2019), it falls around the 70th percentile of effect sizes reported in gerontological studies and approaches the threshold for a large effect (r = 0.32). Given that most studies report within-sample associations, our out-of-sample results are likely more robust and generalizable (Yarkoni & Westfall, 2017).

      To answer, “why is it then necessary to explain this overlap with neuroimaging measures”, we again draw on the conceptual foundation of the RDoC framework. RDoC emphasizes that each functional domain, such as cognition, should be studied not only at the behavioural level but also across multiple neurobiological units of analysis, including genes, molecules, cells, circuits, physiology, and behaviour.

      MRI-based neural markers represent one such level of analysis. While other biological systems (e.g., genetic, molecular, or physiological) also contribute to the cognition-mental health relationship, neuroimaging provides unique insights into the brain mechanisms underlying this association – insights that cannot be obtained from behavioural data alone.

      In response to the related question, “Could the relationship between cognition and mental health be explained by third variables (e.g., environment, opportunities)?”, we note that developing a neural marker of cognition capable of capturing its relationship with mental health is the central aim of this study. Using the MRI modalities available in the UK Biobank, we were able to account for 48% of the covariation between cognition and mental health.

      The remaining 52% of unexplained variance may stem from several sources. According to the RDoC framework, neuromarkers could be further refined by incorporating additional neuroimaging modalities (e.g., task-based fMRI, PET, ASL, MEG/EEG, fNIRS) and integrating other units of analysis such as genetic, molecular, cellular, and physiological data.
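
      The 48%/52% split discussed above is a ratio of commonality components. The sketch below shows the standard two-set commonality decomposition of R² and how such a share could be derived from it. It assumes the share is computed as common / (common + unique to mental health); the function name is hypothetical, and the R² values are illustrative placeholders, not results from the study.

```python
def commonality_two_sets(r2_a, r2_b, r2_ab):
    """Two-set commonality decomposition of R^2.

    r2_a  : R^2 of the model using predictor set A only
    r2_b  : R^2 of the model using predictor set B only
    r2_ab : R^2 of the model using both sets together
    Returns (unique to A, unique to B, common to A and B).
    """
    unique_a = r2_ab - r2_b
    unique_b = r2_ab - r2_a
    common = r2_a + r2_b - r2_ab
    return unique_a, unique_b, common


# Illustrative values only (A = mental health, B = neural marker).
unique_mh, unique_mri, common = commonality_two_sets(r2_a=0.10, r2_b=0.20, r2_ab=0.25)

# Share of the variance explained by mental health that the neural marker also captures.
share_captured = common / (common + unique_mh)
print(f"unique MH = {unique_mh:.3f}, unique MRI = {unique_mri:.3f}, common = {common:.3f}")
print(f"share of the relationship captured by the marker: {share_captured:.2f}")
```

      By construction, the unique and common parts for set A sum to the R² of the A-only model, so the denominator of the share equals the variance in cognition explained by mental health alone.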

      Once more comprehensive neuromarkers are developed that capture a greater proportion of the cognition-mental health covariation, they may also open a new research direction: investigating how environmental factors and life opportunities influence these markers. However, exploring those environmental contributions lies beyond the scope of the current study.

      We discuss these considerations and explain the motivation of our study in the revised Introduction and Discussion.

      Introduction

      Line 43: “Cognition and mental health are closely intertwined [1]. Cognitive dysfunction is present in various mental illnesses, including anxiety [2, 3], depression [4–6], and psychotic disorders [7–12]. National Institute of Mental Health’s Research Domain Criteria (RDoC) [13,14] treats cognition as one of the main basic functional domains that transdiagnostically underlie mental health. According to RDoC, mental health should be studied in relation to cognition, alongside other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. RDoC further emphasizes that each domain, including cognition, should be investigated not only at the behavioural level but also through its neurobiological correlates. In this study, we aim to examine how the covariation between cognition and mental health is reflected in neural markers of cognition, as measured through multimodal neuroimaging.”

      Discussion

      Line 481: “Our analysis confirmed the validity of the g-factor [31] as a quantitative measure of cognition, demonstrating that it captures 39% of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, et al. Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. AJP. 2010;167:748–751.

      Abramovitch A, Short T, Schweiger A. The C Factor: Cognitive dysfunction as a transdiagnostic dimension in psychopathology. Clinical Psychology Review. 2021;86:102007.

      East-Richard C, R.-Mercier A, Nadeau D, Cellard C. Transdiagnostic neurocognitive deficits in psychiatry: A review of meta-analyses. Canadian Psychology / Psychologie Canadienne. 2020;61(3):190–214.

      Wang Y, Anney R, Pat N. The relationship between cognitive abilities and mental health as represented by cognitive abilities at the neural and genetic levels of analysis. eLife. 2025;14:RP105537.

      Brydges CR. Effect Size Guidelines, Sample Size Calculations, and Statistical Power in Gerontology. Innovation in Aging. 2019;3(4):igz036.

      Yarkoni T, Westfall J. Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspect Psychol Sci. 2017;12(6):1100-1122.

      Comment 2 Title: - Shouldn't it be "MRI markers" (plural)?

      We used the singular form (“marker”) intentionally, as it refers to the composite neuroimaging marker derived from all three MRI modalities in our stacked model. This multimodal marker represents the combined predictive power of all modalities and captures the highest proportion of the mental health-cognition relationship in our analyses.

      Comment 3: Introduction - I miss an explanation of why it is useful to look at cognition-mental health covariation

      We believe we have sufficiently addressed this comment in our response to Reviewer 2, comment 1 above.

      Comment 4: - "Demonstrating that MRI-based neural indicators of cognition capture the covariation between cognition and mental health will thereby support the utility of such indicators for understanding the etiology of mental health" (page 4, line 56-58) - how/why?

      Previous research has largely focused on developing MRI-based neural indicators that accurately predict cognitive performance (Marek et al., 2022; Vieira et al., 2020). Building on this foundation, our findings further demonstrate that the predictive performance of a neural indicator for cognition is closely tied to its ability to explain the covariation between cognition and mental health. In other words, the robustness of a neural indicator – its capacity to capture individual differences in cognition – is strongly associated with how well it reflects the shared variance between cognition and mental health.

      This insight is particularly important within the context of the RDoC framework, which seeks to understand the etiology of mental health through functional domains (such as cognition) and their underlying neurobiological units of analysis (Insel et al., 2010). According to RDoC, for a neural indicator of cognition to be informative for mental health research, it must not only predict cognitive performance but also capture its relationship with mental health.

      Furthermore, RDoC emphasizes the integration of neurobiological measures to investigate the influence of environmental and developmental factors on mental health. In line with this, our neural indicators of cognition may serve as valuable tools in future research aimed at understanding how environmental exposures and developmental trajectories shape mental health outcomes. We discuss this in more detail in the revised Discussion.

      Line 481: “Our analysis confirmed the validity of the g-factor [31] as a quantitative measure of cognition [31], demonstrating that it captures nearly half (39%) of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Marek S, Tervo-Clemmens B, Calabro FJ, Montez DF, Kay BP, Hatoum AS, et al. Reproducible brain-wide association studies require thousands of individuals. Nature. 2022;603:654–660.

      Vieira S, Gong QY, Pinaya WHL, et al. Using Machine Learning and Structural Neuroimaging to Detect First Episode Psychosis: Reconsidering the Evidence. Schizophr Bull. 2020;46(1):17-26.

      Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, et al. Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. AJP. 2010;167:748–751.

      Comment 5: - The explanation about the stacking approach is not yet completely clear to me. I don't understand how the target variable can be the dependent variable in both step one and two. Or are those different variables? It would be helpful to also give an example of the target variable in line 88 on page 5

      Thank you for this excellent question. In our stacking approach, the same target variable, the g-factor, is indeed used across both modeling stages, but with a key distinction in how predictions are generated and integrated.

      In the first-level models, we trained separate Partial Least Squares Regression (PLSR) models for each of the 72 neuroimaging phenotypes, each predicting the g-factor independently. The predicted values from these 72 models were then used as input features for the second-level stacked model, which combined them to generate a final prediction of the g-factor. This two-stage framework enables us to integrate information across multiple imaging modalities while maintaining a consistent prediction target.

      To avoid data leakage, both modeling stages were conducted entirely within the training set for each cross-validation fold. Only after the second-level model was trained was it applied to the outer-fold test participants who were not involved in any part of the model training process.

      To improve accessibility, we have revised the Methods section (see Page 10) to clarify this approach, ensuring that the description remains technically accurate while being easier to follow.

      Line 188: “We employed nested cross-validation to predict cognition from mental health indices and 72 neuroimaging phenotypes (Fig. 1). Nested cross-validation is a robust method for evaluating machine-learning models while tuning their hyperparameters, ensuring that performance estimates are both accurate and unbiased. Here, we used a nested cross-validation scheme with five outer folds and ten inner folds.

      We started by dividing the entire dataset into five outer folds. Each fold took a turn being held out as the outer-fold test set (20% of the data), while the remaining four folds (80% of the data) were used as an outer-fold training set. Within each outer-fold training set, we performed a second layer of cross-validation – this time splitting the data into ten inner folds. These inner folds were used exclusively for hyperparameter tuning: models were trained on nine of the inner folds and validated on the remaining one, cycling through all ten combinations.

      We then selected the hyperparameter configuration that performed best across the inner-fold validation sets, as determined by the minimal mean squared error (MSE). The model was then retrained on the full outer-fold training set using this hyperparameter configuration and evaluated on the outer-fold test set, using four performance metrics: Pearson r, the coefficient of determination (R<sup>2</sup>), the mean absolute error (MAE), and the MSE. This entire process was repeated for each of the five outer folds, ensuring that every data point is used for both training and testing, but never at the same time. We opted for five outer folds instead of ten to reduce computational demands, particularly memory and processing time, given the substantial volume of neuroimaging data involved in model training. Five outer folds led to an outer-fold test set of at least n = 4 000, which should be sufficient for model evaluation. In contrast, we retained ten inner folds to ensure robust and stable hyperparameter tuning, maximising the reliability of model selection.

      To model the relationship between mental health and cognition, we employed Partial Least Squares Regression (PLSR) to predict the g-factor from 133 mental health variables. To model the relationship between neuroimaging data and cognition, we used a two-step stacking approach [15–17,61] to integrate information from 72 neuroimaging phenotypes across three MRI modalities. In the first step, we trained 72 base (first-level) PLSR models, each predicting the g-factor from a single neuroimaging phenotype. In the second step, we used the predicted values from these base models as input features for stacked models, which again predicted the g-factor. We constructed four stacked models based on the source of the base predictions: one each for dwMRI, rsMRI, sMRI, and a combined model incorporating all modalities (“dwMRI Stacked”, “rsMRI Stacked”, “sMRI Stacked”, and “All MRI Stacked”, respectively). Each stacked model was trained using one of four machine learning algorithms – ElasticNet, Random Forest, XGBoost, or Support Vector Regression – selected individually for each model (see Supplementary Materials, S6).

      For rsMRI phenotypes, we treated the choice of functional connectivity quantification method – full correlation, partial correlation, or tangent space parametrization – as a hyperparameter. The method yielding the highest performance on the outer-fold training set was selected for predicting the g-factor (see Supplementary Materials, S5).

      To prevent data leakage, we standardized the data using the mean and standard deviation derived from the training set and applied these parameters to the corresponding test set within each outer fold. This standardization was performed at three key stages: before g-factor derivation, before regressing out modality-specific confounds from the MRI data, and before stacking. Similarly, to maintain strict separation between training and testing data, both base and stacked models were trained exclusively on participants from the outer-fold training set and subsequently applied to the corresponding outer-fold test set.

      To evaluate model performance and assess statistical significance, we aggregated the predicted and observed g-factor values from each outer-fold test set. We then computed a bootstrap distribution of Pearson’s correlation coefficient (r) by resampling with replacement 5 000 times, generating 95% confidence intervals (CIs) (Fig. 1). Model performance was considered statistically significant if the 95% CI did not include zero, indicating that the observed associations were unlikely to have occurred by chance.”

      Comment 6: Methods - It's not clear from the text and Figure 1 which 12 scores from 11 tests are being used to derive the g-factor. Figure 1 shows only 8 bullet points with 10 scores in A and 13 tests under 'Cognitive tests' in B. Moreover, Supplement S1 describes 12 tests and 14 measures (Prospective Memory test is in the text but not in Supplementary Table 1).

      Thank you for identifying this discrepancy. In the original Figure 1b and in the Supplementary Methods (S1), the “Prospective Memory” test was accidentally duplicated, although it was already listed in Supplementary Table 1 (Line 53). We have now corrected both figures for consistency. To clarify: Figure 1a presents the global mental health and cognitive domains studied, while Figure 1b now accurately lists 1) the 12 cognitive scores from 11 tests used to derive the g-factor (with the Trail Making Test contributing two measures – numeric and alphabetic trails) and 2) the three main categories of mental health indices used as machine learning features.

      We also corrected the Supplementary Materials to remove the duplicate test from the first paragraph. In Supplementary Table 1, there were 11 tests listed, and for the Trail Making test, we specified in the “Core measures” column that this test had 2 derivative scores: duration to complete the numeric path (Trail 1) and duration to complete the alphabetic path (Trail 2).

      Supplementary Materials, Line 46: “We used twelve scores from the eleven cognitive tests that represented the following cognitive domains: reaction time and processing speed (Reaction Time test), working memory (Numeric Memory test), verbal and numerical reasoning (Fluid Intelligence test), executive function (Trail Making Test), non-verbal fluid reasoning (Matrix Pattern Completion test), processing speed (Symbol Digit Substitution test), vocabulary (Picture Vocabulary test), planning abilities (Tower Rearranging test), verbal declarative memory (Paired Associate Learning test), prospective memory (Prospective Memory test), and visual memory (Pairs Matching test) [1].”

      Comment 7: - For the mental health measures: If I understand correctly, the questionnaire items were used individually, but also to create composite scores. This seems counterintuitive, because I would assume that if the raw data is used, the composite scores would not add additional information to that. When reading the Supplement, it seems like I'm not correct… It would be helpful to clarify the text on page 7 in the main text.

      You raise an excellent observation regarding the use of both individual questionnaire items and composite scores. This dual approach was methodologically justified by the properties of Partial Least Squares Regression (PLSR), our chosen first-level machine learning algorithm, which benefits from rich feature sets and can handle multicollinearity through dimensionality reduction. PLSR transforms correlated features into latent variables, meaning both individual items and composite scores can contribute unique information to the model. We elaborate on PLSR's mathematical principles in Supplementary Materials (S5).

      To directly address this concern, we conducted comparative analyses showing that the PLSR model (a single 80/20% training/test split), incorporating all 133 mental health features (both items and composites), outperformed models using either type alone. The full model achieved superior performance (MSE = 0.458, MAE = 0.537, R<sup>2</sup> = 0.112, Pearson r = 0.336, p-value = 6.936e-112) compared to using only composite scores (93 features; MSE = 0.461, MAE = 0.538, R<sup>2</sup> = 0.107, Pearson r = 0.328, p-value = 5.8e-106) or only questionnaire items (40 features; MSE = 0.499, MAE = 0.561, R<sup>2</sup> = 0.033, Pearson r = 0.184, p-value = 2.53e-33). These results confirm that including both data types provides complementary predictive value. We expand on these considerations in the revised Methods section.

      Line 123: “Mental health measures encompassed 133 variables from twelve groups: mental distress, depression, clinical diagnoses related to the nervous system and mental health, mania (including bipolar disorder), neuroticism, anxiety, addictions, alcohol and cannabis use, unusual/psychotic experiences, traumatic events, self-harm behaviours, and happiness and subjective well-being (Fig. 1 and Tables S4 and S5). We included both self-report questionnaire items from all participants and composite diagnostic scores computed following Davis et al. and Dutt et al. [35,36] as features in our first-level (for explanation, see Data analysis section) Partial Least Squares Regression (PLSR) model. This approach leverages PLSR’s ability to handle multicollinearity through dimensionality reduction, enabling simultaneous use of granular symptom-level information and robust composite measures (for mental health scoring details, see Supplementary Materials, S2). We assess the contribution of each mental health index to general cognition by examining the direction and magnitude of its PLSR-derived loadings on the identified latent variables.”

      Comment 8: - Results - The colors in Figure 4 B are a bit hard to differentiate.

      We have updated Figure 4 to enhance colour differentiation by adjusting saturation and brightness levels, improving visual distinction. For further clarity, we split the original figure into two separate figures.

      Comment 9: - Discussion - "Overall, the scores for mental distress, alcohol and cannabis use, and self-harm behaviours relate positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events relate negatively to cognition," - this seems counterintuitive, that some symptoms relate to better cognition and others relate to worse cognition. Could you elaborate on this finding and what it could mean?

      We appreciate you highlighting this important observation. While some associations between mental health indices and cognition may appear counterintuitive at first glance, these patterns are robust (emerging consistently across both univariate correlations and PLSR loadings) and align with previous literature (e.g., Karpinski et al., 2018; Ogueji et al., 2022). For instance, the positive relationship between cognitive ability and certain mental health indicators like help-seeking behaviour has been documented in other population studies (Karpinski et al., 2018; Ogueji et al., 2022), potentially reflecting greater health literacy and access to care among cognitively advantaged individuals. Conversely, the negative associations with conditions like psychotic experiences mirror established neurocognitive deficits in these domains.

      As was initially detailed in Supplementary Materials (S12) and now expanded in our Discussion, these findings likely reflect complex multidimensional interactions. The positive loadings for mental distress indicators may capture: (1) greater help-seeking behaviour among those with higher cognition and socioeconomic resources, and/or (2) psychological overexcitability and rumination tendencies in high-functioning individuals. These interpretations are particularly relevant to the UK Biobank's assessment methods, where mental distress items focused on medical help-seeking rather than symptom severity per se (e.g., as a measure of mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress).

      Line 492: “Factor loadings derived from the PLSR model showed that the scores for mental distress, alcohol and cannabis use, and self-harm behaviours related positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events related negatively to the g-factor. Positive PLSR loadings of features related to mental distress may indicate greater susceptibility to or exaggerated perception of stressful events, psychological overexcitability, and predisposition to rumination in people with higher cognition [72]. On the other hand, these findings may be specific to the UK Biobank cohort and the way the questions for this mental health category were constructed. In particular, to evaluate mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress. In this regard, the estimate for mental distress may be more indicative of whether an individual experiencing mental distress had an opportunity or aspiration to visit a doctor and seek professional help [73]. Thus, people with better cognitive abilities and also with a higher socioeconomic status may indeed be more likely to seek professional help.

      Limited evidence supports a positive association between self-harm behaviours and cognitive abilities, with some studies indicating higher cognitive performance as a risk factor for non-suicidal self-harm. Research shows an inverse relationship between cognitive control of emotion and suicidal behaviours that weakens over the life course [73,74]. Some studies have found a positive correlation between cognitive abilities and the risk of non-suicidal self-harm, suicidal thoughts, and suicidal plans that may be independent of or, conversely, affected by socioeconomic status [75,76]. In our study, the magnitude of the association between self-harm behaviours and cognition was low (Fig. 2), indicating a weak relationship.

      Positive PLSR loadings of features related to alcohol and cannabis may also indicate the influence of other factors. Overall, this relationship is believed to be largely affected by age, income, education, social status, social equality, social norms, and quality of life [79,80]. For example, education level and income correlate with cognitive ability and alcohol consumption [79,81–83]. Research also links a higher probability of having tried alcohol or recreational drugs, including cannabis, to a tendency of more intelligent individuals to approach evolutionarily novel stimuli [84,85]. This hypothesis is supported by studies showing that cannabis users perform better on some cognitive tasks [86]. Alternatively, frequent drinking can indicate higher social engagement, which is positively associated with cognition [87]. Young adults often drink alcohol as a social ritual in university settings to build connections with peers [88]. In older adults, drinking may accompany friends or family visits [89,90]. Mixed evidence on the link between alcohol and drug use and cognition makes it difficult to draw definite conclusions, leaving an open question about the nature of this relationship.

      Consistent with previous studies, we showed that anxiety and negative traumatic experiences were inversely associated with cognitive abilities [90–93]. Anxiety may be linked to poorer cognitive performance via reduced working memory capacity, increased focus on negative thoughts, and attentional bias to threatening stimuli that hinder the allocation of cognitive resources to a current task [94–96]. Individuals with PTSD consistently showed impaired verbal and working memory, visual attention, inhibitory function, task switching, cognitive flexibility, and cognitive control [97–100]. Exposure to traumatic events that did not reach the PTSD threshold was also linked to impaired cognition. For example, childhood trauma is associated with worse performance in processing speed, attention, and executive function tasks in adulthood, and age at a first traumatic event is predictive of the rate of executive function decline in midlife [101,102]. In the UK Biobank cohort, adverse life events have been linked to lower cognitive flexibility, partially via depression level [103].

      In agreement with our findings, cognitive deficits are often found in psychotic disorders [104,105]. We treated neurological and mental health symptoms as predictor variables and did not stratify or exclude people based on psychiatric status or symptom severity. Since no prior studies have examined isolated psychotic symptoms (e.g., recent unusual experiences, hearing unreal voices, or seeing unreal visions), we avoid speculating on how these symptoms relate to cognition in our sample.

      Finally, negative PLSR loadings of the features related to happiness and subjective well-being may be specific to the study cohort, as these findings do not agree with some previous research [107–109]. On the other hand, our results agree with the study linking excessive optimism or optimistic thinking to lower cognitive performance in memory, verbal fluency, fluid intelligence, and numerical reasoning tasks, and suggesting that pessimism or realism indicates better cognition [110]. The concept of realism/optimism as indicators of cognition is a plausible explanation for a negative association between the g-factor and friendship satisfaction, as well as a negative PLSR loading of feelings that life is meaningful, especially in older adults who tend to reflect more on the meaning of life [111]. The latter is supported by the study showing a negative association between cognitive function and the search for the meaning of life and a change in the pattern of this relationship after the age of 60 [112]. Finally, a UK Biobank study found a positive association of happiness with speed and visuospatial memory but a negative relationship with reasoning ability [113].”

      Karpinski RI, Kinase Kolb AM, Tetreault NA, Borowski TB. High intelligence: A risk factor for psychological and physiological overexcitabilities. Intelligence. 2018;66:8–23.

      Ogueji IA, Okoloba MM. Seeking Professional Help for Mental Illness: A Mixed-Methods Study of Black Family Members in the UK and Nigeria. Psychol Stud. 2022;67:164–177.

      Comment 10: - All neuroimaging factors together explain 48% of the variance in the cognition-mental health relationship. However, this relationship is only r=0.3 - so then the effect of neuroimaging factors seems a lot smaller… What does it mean?

      Thank you for raising this critical point. We have addressed it in our responses to Reviewer 1, comments 2 and 3, and Reviewer 2, comment 1.

      Briefly, cognition is related to mental health at around r = 0.3 and to neuroimaging phenotypes at around r = 0.4. These levels of relationship strength are consistent with what has been shown in the literature (e.g., Wang et al., 2025 and Vieira et al., 2020). We discussed the relationship between cognition and mental health in our response to Reviewer 2, comment 1 above. In short, this relationship reflects just one functional domain – mental health may also be associated with other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. Moreover, in the context of gerontology research, this effect size is considered relatively large (Brydges et al., 2019).

      We conducted a commonality analysis to investigate the unique and shared variance of mental health and neuroimaging phenotypes in explaining cognition. As we discussed in our response to Reviewer 1, comment 2, we were able to account for 48% of the covariation between cognition and mental health using the MRI modalities available in the UK Biobank. The remaining 52% of unexplained variance may arise from several sources.

      One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research from our group and others has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank (Tetereva et al., 2025).

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      We have now incorporated these considerations into the Discussion section.
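      The commonality analysis referred to in this response can be illustrated with a minimal sketch. It assumes only two explanatory variables (for example, a mental-health-based prediction and a neuroimaging-based prediction of cognition), uses synthetic data, and relies on the closed-form R<sup>2</sup> for two predictors. All names and numbers are hypothetical and are not taken from the study.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def commonality_two_predictors(y, x1, x2):
    """Split the R^2 of y ~ x1 + x2 into unique and common parts."""
    r1, r2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    r2_x1 = r1 ** 2                      # R^2 of y ~ x1
    r2_x2 = r2 ** 2                      # R^2 of y ~ x2
    # Closed-form R^2 of the two-predictor linear regression.
    r2_both = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return {
        "r2_x1": r2_x1,
        "r2_x2": r2_x2,
        "r2_both": r2_both,
        "unique_x1": r2_both - r2_x2,    # explained by x1 only
        "unique_x2": r2_both - r2_x1,    # explained by x2 only
        "common": r2_x1 + r2_x2 - r2_both,
    }


# Synthetic data: all three variables share one latent factor (assumption).
rng = random.Random(0)
latent = [rng.gauss(0, 1) for _ in range(2000)]
cognition = [s + rng.gauss(0, 1) for s in latent]
mental_health = [s + rng.gauss(0, 1) for s in latent]
neuroimaging = [s + rng.gauss(0, 1) for s in latent]

res = commonality_two_predictors(cognition, mental_health, neuroimaging)
# Share of the cognition-mental health relationship also captured by imaging.
share = res["common"] / res["r2_x1"]
print(f"common = {res['common']:.3f}, share of mental-health R^2 = {share:.2f}")
```

      In this sketch, the ratio of the common component to the total variance explained by the first predictor plays the role of the "proportion of the cognition-mental health relationship captured by neuroimaging"; the published analysis may define and compute this quantity differently.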

      Line 481: “Our analysis confirmed the validity of the g-factor [31] as a quantitative measure of cognition [31], demonstrating that it captures nearly half (39%) of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Wang Y, Anney R, Pat N. The relationship between cognitive abilities and mental health as represented by cognitive abilities at the neural and genetic levels of analysis. eLife. 2025;14:RP105537.

      Vieira S, Gong QY, Pinaya WHL, et al. Using Machine Learning and Structural Neuroimaging to Detect First Episode Psychosis: Reconsidering the Evidence. Schizophr Bull. 2020;46(1):17-26.

      Brydges CR. Effect Size Guidelines, Sample Size Calculations, and Statistical Power in Gerontology. Innovation in Aging. 2019;3(4):igz036.

      Tetereva A, Knodt AR, Melzer TR, et al. Improving Predictability, Reliability and Generalisability of Brain-Wide Associations for Cognitive Abilities via Multimodal Stacking. Preprint. bioRxiv. 2025;2024.05.03.589404.

      Reviewer 3:

      Buianova et al. present a comprehensive analysis examining the predictive value of multimodal neuroimaging data for general cognitive ability, operationalized as a derived g-factor. The study demonstrates that functional MRI holds the strongest predictive power among the modalities, while integrating multiple MRI modalities through stacking further enhances prediction performance. The inclusion of a commonality analysis provides valuable insight into the extent to which shared and unique variance across mental health features and neuroimaging modalities contributes to the observed associations with cognition. The results are clearly presented and supported by high-quality visualizations. Limitations of the sample are stated clearly.

      Thank you once more for your constructive and encouraging feedback. We appreciate your careful reading and valuable methodological insights. Your expertise has helped us clarify key methodological concepts and improve the overall rigour of our study.

      Suggestions for improvement:

      (1) The manuscript would benefit from the inclusion of permutation testing to evaluate the statistical significance of the predictive models. This is particularly important given that some of the reported performance metrics are relatively modest, and permutation testing could help ensure that results are not driven by chance.

      Thank you, this is an excellent point. We agree that evaluating the statistical significance of our predictive models is essential.

      In our original analysis, we assessed model performance by generating a bootstrap distribution of Pearson’s r, resampling the data with replacement 5,000 times (see Figure 3b). In response to your feedback, we have made the following updates:

      (1) Improved Figure 3b to explicitly display the 95% confidence intervals.

      (2) Supplemented the results by reporting the exact confidence interval values.

      (3) Clarified our significance testing procedure in the Methods section.

      We considered model performance statistically significant when the 95% confidence interval did not include zero, indicating that the observed associations are unlikely to have occurred by chance.

      We chose bootstrapping over permutation testing because, while both can assess statistical significance, bootstrapping additionally provides uncertainty estimates in the form of confidence intervals. Given the large sample size in our study, significance testing can be less informative, as even small effects may reach statistical significance. Bootstrapping offers a more nuanced understanding of model uncertainty.

      Line 233: “To evaluate model performance and assess statistical significance, we aggregated the predicted and observed g-factor values from each outer-fold test set. We then computed a bootstrap distribution of Pearson’s correlation coefficient (r) by resampling with replacement 5 000 times, generating 95% confidence intervals (CIs) (Fig. 1). Model performance was considered statistically significant if the 95% CI did not include zero, indicating that the observed associations were unlikely to have occurred by chance.”
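      The bootstrap procedure described in the quoted Methods can be sketched as follows. This is a simplified percentile bootstrap on synthetic data, with fewer resamples than the 5 000 reported, and it is not the authors' code; the function and variable names are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_ci(observed, predicted, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Pearson's r, resampling pairs with replacement."""
    rng = random.Random(seed)
    n = len(observed)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # sample cases with replacement
        stats.append(pearson([observed[i] for i in idx],
                             [predicted[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic stand-in for pooled outer-fold test predictions (assumption).
rng = random.Random(1)
observed = [rng.gauss(0, 1) for _ in range(400)]
predicted = [0.5 * o + rng.gauss(0, 1) for o in observed]

r_hat = pearson(observed, predicted)
lo, hi = bootstrap_ci(observed, predicted)
significant = lo > 0 or hi < 0          # 95% CI excludes zero
print(f"r = {r_hat:.3f}, 95% CI [{lo:.3f}, {hi:.3f}], significant: {significant}")
```

      Unlike a permutation test, which only yields a p-value against the null of no association, the resampled distribution here gives an interval around the observed r, which is the property the response emphasises.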

      (2) Applying and testing the trained models on an external validation set would increase confidence in generalisability of the model.

      We appreciate this excellent suggestion. While we considered this approach, implementing it would require identifying an appropriate external dataset with comparable neuroimaging and behavioural measures, along with careful matching of acquisition protocols and variable definitions across sites. These challenges extend beyond the scope of the current study, though we fully agree that this represents an important direction for future research.

      Our findings, obtained from one of the largest neuroimaging datasets to date with training and test samples exceeding most previous studies, align closely with existing literature: the predictive accuracy of each neuroimaging phenotype and modality for cognition matches the effect size reported in meta-analyses (r ≈ 0.4; e.g., Vieira et al., 2020). The ability of dwMRI, rsMRI and sMRI to capture the cognition-mental health relationship is, in turn, consistent with our previous work in pediatric populations (Wang et al., 2025; Pat et al., 2022).

      Vieira S, Gong QY, Pinaya WHL, et al. Using Machine Learning and Structural Neuroimaging to Detect First Episode Psychosis: Reconsidering the Evidence. Schizophr Bull. 2020;46(1):17-26.

      Wang Y, Anney R, Pat N. The relationship between cognitive abilities and mental health as represented by cognitive abilities at the neural and genetic levels of analysis. eLife. 2025;14:RP105537.

      Pat N, Wang Y, Anney R, Riglin L, Thapar A, Stringaris A. Longitudinally stable, brain-based predictive models mediate the relationships between childhood cognition and socio-demographic, psychological and genetic factors. Hum Brain Mapp. 2022;43:5520–5542.

      (3) The rationale for selecting a 5-by-10-fold cross-validation scheme is not clearly explained. Clarifying why this structure was preferred over more commonly used alternatives, such as 10-by-10 or 5-by-5 cross-validation, would strengthen the methodological transparency.

      Thank you for this important methodological question. Our choice of a 5-by-10-fold cross-validation scheme was motivated by the need to balance robust hyperparameter tuning with computational efficiency, particularly memory and processing time. Retaining five outer folds allowed us to assess model performance across multiple data partitions, yielding outer-fold test sets of at least n = 4 000 while keeping the substantial volume of neuroimaging data involved in model training computationally manageable. Ten inner folds, in turn, ensured robust and stable hyperparameter tuning, maximising the reliability of model selection. We now provide additional rationale for this design decision on Page 10.

      Line 188: “We employed nested cross-validation to predict cognition from mental health indices and 72 neuroimaging phenotypes (Fig. 1). Nested cross-validation is a robust method for evaluating machine-learning models while tuning their hyperparameters, ensuring that performance estimates are both accurate and unbiased. Here, we used a nested cross-validation scheme with five outer folds and ten inner folds.

      We started by dividing the entire dataset into five outer folds. Each fold took a turn being held out as the outer-fold test set (20% of the data), while the remaining four folds (80% of the data) were used as an outer-fold training set. Within each outer-fold training set, we performed a second layer of cross-validation – this time splitting the data into ten inner folds. These inner folds were used exclusively for hyperparameter tuning: models were trained on nine of the inner folds and validated on the remaining one, cycling through all ten combinations.

      We then selected the hyperparameter configuration that performed best across the inner-fold validation sets, as determined by the minimal mean squared error (MSE). The model was then retrained on the full outer-fold training set using this hyperparameter configuration and evaluated on the outer-fold test set, using four performance metrics: Pearson r, the coefficient of determination (R<sup>2</sup>), the mean absolute error (MAE), and the MSE. This entire process was repeated for each of the five outer folds, ensuring that every data point is used for both training and testing, but never at the same time. We opted for five outer folds instead of ten to reduce computational demands, particularly memory and processing time, given the substantial volume of neuroimaging data involved in model training. Five outer folds led to an outer-fold test set of at least n = 4 000, which should be sufficient for model evaluation. In contrast, we retained ten inner folds to ensure robust and stable hyperparameter tuning, maximising the reliability of model selection.”
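      The nested scheme described above (five outer folds, ten inner folds, hyperparameter chosen by minimal inner-fold MSE, then refit and evaluated on the held-out outer fold) can be sketched as follows. A one-feature ridge regression stands in for the actual models, and the data, penalty grid, and function names are hypothetical.

```python
import random


def fit_ridge(x, y, lam):
    """Closed-form one-feature ridge regression; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + lam)
    return slope, my - slope * mx


def mse(model, x, y):
    """Mean squared error of a fitted (slope, intercept) model."""
    slope, intercept = model
    return sum((intercept + slope * a - b) ** 2 for a, b in zip(x, y)) / len(x)


def k_folds(indices, k):
    """Split a list of indices into k disjoint, nearly equal folds."""
    return [indices[i::k] for i in range(k)]


def nested_cv(x, y, lambdas, outer_k=5, inner_k=10, seed=0):
    """Return one (best_lambda, outer_test_mse) pair per outer fold."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    results = []
    for test_idx in k_folds(idx, outer_k):
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        inner = k_folds(train_idx, inner_k)

        def inner_mse(lam):
            # Average validation MSE over the ten inner folds.
            total = 0.0
            for val_idx in inner:
                val = set(val_idx)
                fit_idx = [i for i in train_idx if i not in val]
                model = fit_ridge([x[i] for i in fit_idx],
                                  [y[i] for i in fit_idx], lam)
                total += mse(model, [x[i] for i in val_idx],
                             [y[i] for i in val_idx])
            return total / len(inner)

        best_lam = min(lambdas, key=inner_mse)
        # Refit on the full outer-fold training set, test on the held-out fold.
        model = fit_ridge([x[i] for i in train_idx],
                          [y[i] for i in train_idx], best_lam)
        results.append((best_lam, mse(model, [x[i] for i in test_idx],
                                      [y[i] for i in test_idx])))
    return results


rng = random.Random(2)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [0.4 * a + rng.gauss(0, 1) for a in x]
lambdas = [0.0, 1.0, 10.0, 100.0]
results = nested_cv(x, y, lambdas)
for fold, (lam, err) in enumerate(results, start=1):
    print(f"outer fold {fold}: lambda = {lam}, test MSE = {err:.3f}")
```

      Because the outer-fold test data never enter the inner loop, the outer-fold scores are not contaminated by hyperparameter selection, which is the reason for nesting in the first place.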

      (4) A more detailed discussion of which specific brain regions or features within each neuroimaging modality contributed most strongly to the prediction of cognition would enhance neurobiological relevance of the findings.

      Thank you for this thoughtful suggestion. To address this point, we have included feature importance plots for the top-performing neuroimaging phenotypes within each modality (Figure 5 and Figures S2–S4), demonstrating the relative contributions of individual features to the predictive models. While we maintain our primary focus on cross-modality performance comparisons in the main text, as this aligns with our central aim of evaluating multimodal MRI markers at the integrated level, we outline the contribution of neuroimaging features with the highest predictive performance for cognition in the revised Results and Discussion.

      Methods

      Line 255: “To determine which neuroimaging features contribute most to the predictive performance of top-performing phenotypes within each modality, while accounting for the potential latent components derived from neuroimaging, we assessed feature importance using the Haufe transformation [62]. Specifically, we calculated Pearson correlations between the predicted g-factor and scaled and centred neuroimaging features across five outer-fold test sets. We also examined whether the performance of neuroimaging phenotypes in predicting cognition per se is related to their ability to explain the link between cognition and mental health. Here, we computed the correlation between the predictive performance of each neuroimaging phenotype and the proportion of the cognition-mental health relationship it captures. To understand how demographic factors, including age and sex, contribute to this relationship, we also conducted a separate set of commonality analyses treating age, sex, age<sup>2</sup>, age×sex, and age<sup>2</sup>×sex as an additional set of explanatory variables (Fig. 1).”
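      The feature-importance step described in the quoted Methods (correlating each neuroimaging feature with the model's predicted g-factor) can be sketched as follows. The data and model weights are synthetic and hypothetical, and the sketch covers only the correlation step, not the full pipeline.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def haufe_importance(features, predicted):
    """Correlate every feature column with the predicted target.

    features: list of rows, one row per participant.
    Returns one correlation per feature column.
    """
    n_features = len(features[0])
    return [pearson([row[j] for row in features], predicted)
            for j in range(n_features)]


# Synthetic test-set features and predictions from a hypothetical linear model.
rng = random.Random(3)
features = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(1000)]
weights = [0.8, -0.5, 0.0]               # assumed model weights
predicted = [sum(w * v for w, v in zip(weights, row)) for row in features]

importance = haufe_importance(features, predicted)
for j, r in enumerate(importance):
    print(f"feature {j}: r with predicted g-factor = {r:+.3f}")
```

      Pearson's r is unaffected by centring or rescaling a feature, so the sketch omits the standardisation step; the sign of each correlation indicates the direction in which the feature relates to the predicted score.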

      Results

      dwMRI

      Line 331: “Overall, models based on structural connectivity metrics performed better than TBSS and probabilistic tractography (Fig. 3). TBSS, in turn, performed better than probabilistic tractography (Fig. 3 and Table S13). The number of streamlines connecting brain areas parcellated with aparc MSA-I had the best predictive performance among all dwMRI neuroimaging phenotypes (R<sup>2</sup><sub>mean</sub> = 0.052, r<sub>mean</sub> = 0.227, 95% CI [0.212, 0.235]). To identify features driving predictions, we correlated streamline counts in aparc MSA-I parcellation with the predicted g-factor values from the PLSR model. Positive associations with the predicted g-factor were strongest for left superior parietal-left caudal anterior cingulate, left caudate-right amygdala, and left putamen-left hippocampus connections. The most marked negative correlations involved left putamen-right posterior thalamus and right pars opercularis-right caudal anterior cingulate pathways (Fig. 5 and Supplementary Fig. S2).”

      rsMRI

      Line 353: “Among RSFC metrics for 55 and 21 ICs, tangent parameterization matrices yielded the highest performance in the training set compared to full and partial correlation, as indicated by the cross-validation score. Functional connections between the limbic (IC10) and dorsal attention (IC18) networks, as well as between the ventral attention (IC15) and default mode (IC11) networks, displayed the highest positive association with cognition. In contrast, functional connectivity between the limbic (IC43, the highest activation within network) and default mode (IC11) and limbic (IC45) and frontoparietal (IC40) networks, between the dorsal attention (IC18) and frontoparietal (IC25) networks, and between the ventral attention (IC15) and frontoparietal (IC40) networks, showed the highest negative association with cognition (Fig. 5 and Supplementary Fig. S3 and S4).”

      sMRI

      Line 373: “FreeSurfer subcortical volumetric subsegmentation and ASEG had the highest performance among all sMRI neuroimaging phenotypes (R<sup>2</sup><sub>mean</sub> = 0.068, r<sub>mean</sub> = 0.244, 95% CI [0.237, 0.259] and R<sup>2</sup><sub>mean</sub> = 0.059, r<sub>mean</sub> = 0.235, 95% CI [0.221, 0.243], respectively). In FreeSurfer subcortical volumetric subsegmentation, volumes of all subcortical structures, except for left and right hippocampal fissures, showed positive associations with cognition. The strongest relations were observed for the volumes of bilateral whole hippocampal head and whole hippocampus (Fig. 5 and Supplementary Fig. S5 for feature importance maps). Grey matter morphological characteristics from ex vivo Brodmann Area Maps showed the lowest predictive performance (R<sup>2</sup><sub>mean</sub> = 0.008, r<sub>mean</sub> = 0.089, 95% CI [0.075, 0.098]; Fig. 3 and Table S15).”

      Discussion

      dwMRI

      Line 562: “Among dwMRI-derived neuroimaging phenotypes, models based on structural connectivity between brain areas parcellated with aparc MSA-I (streamline count), particularly connections with bilateral caudal anterior cingulate (left superior parietal-left caudal anterior cingulate, right pars opercularis-right caudal anterior cingulate), left putamen (left putamen-left hippocampus, left putamen-right posterior thalamus), and amygdala (left caudate-right amygdala), result in a neural indicator that best reflects microstructural resources associated with cognition, as indicated by predictive modeling, and more importantly, shares the highest proportion of the variance with mental health-g, as indicated by commonality analysis.”

      rsMRI

      Line 583: “We extend findings on the superior performance of rsMRI in predicting cognition, which aligns with the literature [15, 28], by showing that it also explains almost a third of the variance in cognition that mental health captures. At the rsMRI neuroimaging phenotype level, this performance is mostly driven by RSFC patterns among 55 ICA-derived networks quantified using tangent space parameterization. At a feature level, these associations are best captured by the strength of functional connections among limbic, dorsal attention and ventral attention, frontoparietal and default mode networks. These functional networks have been consistently linked to cognitive processes in prior research [127–130].”

      sMRI

      Line 608: “Integrating information about brain anatomy by stacking sMRI neuroimaging phenotypes allowed us to explain a third of the link between cognition and mental health. Among all sMRI neuroimaging phenotypes, those that quantified the morphology of subcortical structures, particularly volumes of bilateral hippocampus and hippocampal head, explain the highest portion of the variance in cognition captured by mental health. Our findings show that, at least in older adults, volumetric properties of subcortical structures are not only more predictive of individual variations in cognition but also explain a greater portion of cognitive variance shared with mental health than structural characteristics of more distributed cortical grey and white matter. This aligns with the Scaffolding Theory that proposes stronger compensatory engagement of subcortical structures in cognitive processing in older adults [138–140].”

      (5) The formatting of some figure legends could be improved for clarity - for example, some subheadings were not formatted in bold (e.g., Figure 2 c)

      Thank you for noticing this. We have updated the figures to enhance clarity, keeping subheadings plain while bolding figure numbers and MRI modality names.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Evidence, reproducibility and clarity

      The manuscript by Egawa and colleagues investigates differences in nodal spacing in an avian auditory brain stem circuit. The results are clearly presented and data are of very high quality. The authors make two main conclusions:

      (1) Node spacing, i.e. internodal length, is intrinsically specified by the oligodendrocytes in the region they are found in, rather than axonal properties (branching or diameter).

      (2) Activity is necessary (we don't know what kind of signaling) for normal numbers of oligodendrocytes and therefore the extent of myelination.

      These are interesting observations, albeit phenomenon. I have only a few criticisms that should be addressed:

      (1) The use of the term 'distribution' when describing the location of nodes is confusing. I think the authors mean rather than the patterns of nodal distribution, the pattern of nodal spacing. They have investigated spacing along the axon. I encourage the authors to substitute node spacing or internodal length for node distribution.

      Thanks for your suggestion to avoid confusion. We used the phrase "nodal spacing" instead of "nodal distribution" throughout the revised manuscript.

      (2) In Seidl et al. (J Neurosci 2010) it was reported that axon diameter and internodal length (nodal spacing) were different for regions of the circuit. Can the authors help me better understand the difference between the Seidl results and those presented here?

      As a key distinction, our study focuses specifically on the main trunk of the contralateral projection of NM axons. This projection features a sequential branching structure known as the delay line, where collateral branches form terminal arbors and connect to the ventral dendritic layer of NL neurons. This structural organization plays a critical role in influencing the dynamic range of ITD detection by regulating conduction delays along the NM axon trunk.

      The study by Seidl et al. (2010) is a pioneering work that measured the diameter of NM axons using electron microscopy, providing highly reliable data. However, due to the technical limitations of electron microscopy, which does not allow for the continuous tracing of individual axons, it is not entirely clear whether the axons measured in the ventral NL region correspond to terminal arbors of collateral branches or the main trunk of NM axons (see Figure 9E, F in their paper). Instead, they categorized axon diameters based on their distance from the NL cell layer, showing that axon diameter increases distally (see Figure 9G in their paper). Notably, the diameters of ventral axons located more than 120 μm away from the NL cell layer are almost identical to those in the midline.

      As illustrated in our Figure 4D and Supplementary Video 2, the main trunk of the contralateral NM projection is predominantly located in these distal regions. Therefore, our findings complement those of Seidl et al. (2010) rather than contradicting them. We made this point as clear as possible in the text (page 7, line 3).

      (3) The authors looked only in very young animals - are the results reported here applicable only to development, or does additional refinement take place with aging?

      In this study, we examined chick embryos from E9 to just before hatching (E21) and post-hatch chicks up to P9. Chickens begin to perceive sound around E12 and possess sound localization abilities at the time of hatching (Grier et al., 1967) (added to page 4, line 9). Therefore, by E21, the sound localization circuit is largely established.

      On the other hand, additional refinement of the circuit with aging is certainly possible. A key cue for sound localization, interaural time difference (ITD), depends on the distance between the two ears, which increases as the animal grows. As shown in Figure 2G, internodal length increased by approximately 20% between E18 and P9 while maintaining regional differences. Given that NM axons are nearly fully myelinated by E21 (Figure 4D, 6C), this suggests that myelin extends in proportion to the overall growth of the head and brain volume. We described this possibility in the text (page 5, line 21).

      Thus, our study covers not only the early stages of myelination but also the post-functional maturation in the sound localization circuit.

      (4) The fact that internodal length is specified by the oligodendrocyte suggests that activity may not modify the location of nodes of Ranvier - although again, the authors have only looked during early development. This is quite different than this reviewer's original thoughts - that activity altered internodal length and axon diameter. Thus, the results here argue against node plasticity. The authors may choose to highlight this point or argue for or against it based on results in adult birds?

      In this study, we demonstrated that although vesicular release did not affect internodal length, it selectively promoted oligodendrogenesis, thereby supporting the full myelination and hence the pattern of nodal spacing along the NM axons. We believe that this finding falls within the broader scope of 'activity-dependent plasticity' involving oligodendrocytes and nodes.

      As summarized in the excellent review by Bonetto et al. (2021), activity-dependent plasticity in oligodendrocytes encompasses a wide range of phenomena, not limited to changes in internodal length but also including oligodendrogenesis. Moreover, the effects of neuronal activity are not uniform but likely depend on the diversity of both neurons and oligodendrocytes. For example, in the mouse visual cortex, activity-dependent myelination occurs in interneurons but not in excitatory neurons (Yang et al., 2020). Additionally, expression of TeNT in axons affected myelination heterogeneously in zebrafish; some axons were impaired in myelination and the others were not affected at all (Koudelka et al., 2016). In the mouse corpus callosum, neuronal activity influences oligodendrogenesis, which in turn facilitates adaptive myelination (Gibson et al., 2014).

      Thus, rather than refuting the role of activity-dependent plasticity in nodal spacing, our findings emphasize the diversity of underlying regulatory mechanisms. We described these explicitly in text (page 10, line 18).

      Significance

      This paper may argue against node plasticity as a mechanism for tuning of neural circuits. Myelin plasticity is a very hot topic right now and node plasticity reflects myelin plasticity. This seems to be a circuit where perhaps plasticity is NOT occurring. That would be interesting to test directly. One limitation is that this is limited to development.

      This paper does not argue against node plasticity, but rather demonstrates that oligodendrocytes in the NL region exhibit a form of plasticity; they proliferate in response to vesicular release from NM axons, yet do not undergo morphological changes, ensuring adequate oligodendrocyte density for the full myelination of the auditory circuit. Thus, activity-dependent plasticity involving oligodendrocytes contributes in various ways to each neural circuit, presumably because myelination is driven by complex multicellular interactions between diverse axons and oligodendrocytes. Oligodendrocytes are known to exhibit heterogeneity in morphology, function, responsiveness, and gene profiles (Foerster et al., 2019; Sherafat et al., 2021; Osanai et al., 2022; Valihrach et al., 2022), but the functional significance of this heterogeneity remains largely unclear. This paper also provides insight into how oligodendrocyte heterogeneity may contribute to the fine-tuning of neural circuit function, adding further value to our findings. Importantly, our study covers a wide range of development in the sound localization circuit, from pre-myelination (E9) to post-functional maturation (P9), revealing how the nodal spacing pattern along the axon in this circuit emerges and matures.

      Reviewer #2:

      Evidence, reproducibility and clarity

      Egawa et al describe the developmental timeline of the assembly of nodes of Ranvier in the chick brainstem auditory circuit. In this unique system, the spacing between nodes varies significantly in different regions of the same axon from early stages, which the authors suggest is critical for accurate sound localization. Egawa et al set out to determine which factors regulate this differential node spacing. They do this by using immunohistological analyses to test the correlation of node spacing with morphological properties of the axons, and properties of oligodendrocytes, glial cells that wrap axons with the myelin sheaths that flank the nodes of Ranvier. They find that axonal structure does not vary significantly, but that oligodendrocyte density and morphology varies in the different regions traversed by these axons, which suggests this is a key determinant of the region-specific differences in node density and myelin sheath length. They also find that differential oligodendrocyte density is partly determined by secreted neuronal signals, as (presumed) blockage of vesicle fusion with tetanus toxin reduced oligodendrocyte density in the region where it is normally higher. Based on these findings, the authors propose that oligodendrocyte morphology, myelin sheath length, and consequently nodal distribution are primarily determined by intrinsic oligodendrocyte properties rather than neuronal factors such as activity.

      Major points, detailed below, need to be addressed to overcome some limitations of the study.

      Major comments:

      (1) It is essential that the authors validate the efficiency of TeNT to prove that vesicular release is indeed inhibited, to be able to make any claims about the effect of vesicular release on oligodendrogenesis/myelination.

      eTeNT is a widely used genetically encoded silencing tool and constructs similar to the one used in this study have been successfully applied in primates and rodents to suppress target behaviors via genetic dissection of specific pathways (Kinoshita et al., 2012; Sooksawate et al., 2013). However, precisely quantifying the extent of vesicular release inhibition from NM axons in the brainstem auditory circuit is technically problematic.

      One major limitation is that while A3V efficiently infects NM neurons, its transduction efficiency does not reach 100%. In electrophysiological evaluations, NL neurons receive inputs from multiple NM axons, meaning that responses may still include input from uninfected axons. Additionally, failure to evoke synaptic responses could either indicate successful silencing or failure to stimulate NM axons, making a clear distinction difficult. Furthermore, unlike in motor circuits, we cannot assess the effect of silencing by observing behavioral outputs.

      Thus, we instead opted to quantify the precise expression efficiency of GFP-tagged eTeNT in the cell bodies of NM neurons. The proportion of NM neurons expressing GFP-tagged eTeNT was 89.7 ± 1.6% (N = 6 chicks), which is consistent with previous reports evaluating A3V transduction efficiency in the brainstem auditory circuit (Matsui et al., 2012). These results strongly suggest that synaptic transmission from NM axons was globally silenced by eTeNT at the NL region. We described these explicitly in text (page 8, line 2).

      (2) Related to 1, can the authors clarify if their TeNT expression system results in the whole tract being silenced? It appears from Fig. 6 that their approach leads to sparse expression of TeNT in individual neurons, which enables them to measure myelination parameters. Can the authors discuss how silencing a single axon can lead to a regional effect in oligodendrocyte number?

      Figure 6D depicts a representative axon selected from a dense population of GFP-positive axons in a 200-μm-thick slice after A3V-eTeNT infection of bilateral NM. As shown in Supplementary Videos 1 and 2, densely labeled GFP-positive axons can be traced along the main trunk. To prevent any misinterpretation, we have revised the description of Figure 6 in the main text and Figure legend (page 31, line 9), and stated that the A3V-eTeNT infection efficiency was 89.7 ± 1.6% in NM neurons, as mentioned above. Based on this efficiency, we interpret that the global occlusion of vesicular release from most of the NM axons altered the pericellular microenvironment of the NL region, which led to the regional effect on oligodendrocyte density.

      On the other hand, your question regarding whether sparse expression of eTeNT still has an effect is highly relevant. As we also discussed in our reply to comment 4 by Reviewer #1, the relationship between neuronal activity and oligodendrocytes is highly diverse. In some types of axons, vesicular release is essential for normal myelination, and this process was disrupted by TeNT (Koudelka et al., 2016), suggesting that direct interaction with oligodendrocytes via vesicle release may actively promote myelination in these types of axons.

      To clarify whether the phenotype observed in Figure 6 arises from changes in the pericellular microenvironment at the NL region or from the direct suppression of axon-oligodendrocyte interactions, we included a new Supplementary Figure (Figure 6—figure supplement 1). In this figure, we evaluated node formation on axons sparsely expressing eTeNT after electroporation into the unilateral NM. The results showed that sparse eTeNT expression did not increase the percentages of heminodes or unmyelinated segments. This finding supports our conclusion that the increase in unmyelinated segments caused by A3V-eTeNT resulted from impaired synaptic transmission at NM terminals and subsequent alterations of the pericellular microenvironment at the NL region.

      (3) The authors need to fully revise their statistical analyses throughout and supply additional information that is needed to assess if their analyses are adequate:

      Thank you for your valuable suggestions to improve the rigor of our statistical analyses. We have reanalyzed all statistical tests using R software. In the revised Methods section and Figure Legends, we have clarified the rationale for selecting each statistical test, specified which test was used for each figure, and explicitly defined both n and N. After reevaluation with the Shapiro-Wilk test, we adjusted some analyses to non-parametric tests where appropriate. However, these adjustments did not alter the statistical significance of our results compared to the original analyses.

      (3.1) the authors use a variety of statistical tests and it is not always obvious why they chose a particular test. For example, in Fig. 2G they chose a Kruskal-Wallis test instead of a two-way ANOVA or Mann-Whitney U test, which are much more common in the field. What is the rationale for the test choice?

      We have revised the explanation of our statistical test choices to provide greater clarity and precision. For example, in Figure 2G, we first assessed the normality of the data in each of the four groups using the Shapiro-Wilk test, which revealed that some datasets did not follow a normal distribution. Given this, we selected the Kruskal-Wallis test, a commonly used non-parametric test for comparisons across three or more groups. Since the Kruskal-Wallis test indicated a significant difference, we conducted a post hoc Steel-Dwass test to determine which specific group comparisons were statistically significant.
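
      As an illustration of the omnibus step in this workflow, the following is a minimal Python sketch of the Kruskal-Wallis H statistic (with tie correction) using only the standard library. It is not the authors' R code, the function names are hypothetical, and the example values are invented. The Shapiro-Wilk normality check and the Steel-Dwass post hoc comparisons are not shown. Under the null hypothesis, H is compared against a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of groups (lists of numbers)."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over groups.
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N) over tied value counts t.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(c ** 3 - c for c in counts.values())
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    return h / correction


# Invented example: four groups of measurements (arbitrary units).
example_groups = [[31, 35, 40], [52, 58, 61], [70, 75, 79], [33, 55, 72]]
print(kruskal_wallis_h(example_groups))
```

      A rank-based statistic of this kind does not assume normality, which is why it is a natural choice once a normality test fails for at least one group.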

      (3.2) in some cases, the choice of test appears wholly inappropriate. For example, in Fig. 3H-K, an unpaired t-test is inappropriate if the two regions were analysed in the same samples. In Fig. 5, was a t-test used for comparisons between multiple groups in the same dataset? If so, an ANOVA may be more appropriate.

      In the case of Figures 3H-K, we compared oligodendrocyte morphology between regions. However, since the number of sparsely labeled oligodendrocytes differs both between regions and across individual samples, there is no strict correspondence between paired measurements. On the other hand, in Figures 5B, C, and E, we compared the density of labeled cells between regions within the same slice, establishing a direct correspondence between paired data points. For these comparisons, we appropriately used a paired t-test.
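
      The difference between the two designs can be sketched in a few lines of Python. This is an illustrative example only: the numbers are invented, the function names are hypothetical, and the unpaired version shown is Welch's form, which may differ from the test actually used. A paired test works on within-slice differences, whereas an unpaired test treats the two regions as independent samples.

```python
import math
import statistics


def paired_t(x, y):
    """t statistic for paired samples, based on within-pair differences."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


def welch_t(x, y):
    """t statistic for two independent samples (Welch's unequal-variance form)."""
    standard_error = math.sqrt(
        statistics.variance(x) / len(x) + statistics.variance(y) / len(y)
    )
    return (statistics.mean(x) - statistics.mean(y)) / standard_error


# Invented cell densities for two regions measured in the same four slices.
region_a = [12.0, 15.0, 9.0, 14.0]
region_b = [16.0, 18.5, 13.0, 17.0]
print(paired_t(region_a, region_b), welch_t(region_a, region_b))
```

      When slice-to-slice variability is large but the within-slice difference is consistent, the paired statistic is much larger in magnitude than the unpaired one, which is the reason pairing matters when a true one-to-one correspondence exists.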

      (3.3) in some cases, the authors do not mention which test was used (Fig 3: E-G no test indicated, despite asterisks; G/L/M - which regression test was used? What does r indicate?)

      We have specified the statistical tests used for each figure in the Methods section and Figure Legends for better clarity. Additionally, we have revised the descriptions for Figure 4G, L, and M and their corresponding Figure Legends to explicitly indicate that Spearman’s rank correlation coefficient (rₛ) was used for evaluation.
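
      For readers unfamiliar with the coefficient, Spearman's rₛ is the Pearson correlation computed on the ranks of the two variables. The sketch below is a minimal standard-library Python illustration with invented data and hypothetical function names; it is not the analysis code used for the figures.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Invented example: a monotonic but non-linear relationship.
diameters = [0.8, 1.0, 1.3, 1.7, 2.4]
lengths = [20, 24, 41, 90, 210]
print(spearman_rs(diameters, lengths))
```

      Because only ranks enter the calculation, rₛ captures any monotonic association and is insensitive to the non-normal distributions noted in the normality checks.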

      (3.4) more concerningly, throughout the results, data may have been pseudo-replicated. t-tests and ANOVAs assume that each observation in a dataset is independent of the other observations. In figures 1-4 and 6 there is a very large "n" number, but the authors do not indicate what this corresponds to. This leaves it open to interpretation, and the large values suggest that the number of nodes, internodal segments, or cells may have been used. These are not independent experimental units, and should be averaged per independent biological replicate - i.e. per animal (N).

      We have now clarified what “n” represents in each figure, as well as the number of animals (N) used in each experiment, in the Figure Legends.

      In this study, developmental stages of chick embryos were defined by HH stage (Hamburger and Hamilton, 1951), minimizing individual variability. Additionally, since our study focuses on the distribution of morphological characteristics of individual cells, averaging measurements per animal would obscure important cellular-level variability and potentially mislead interpretation of data. Furthermore, we employed a strategy of sparse genetic labeling in many experiments, which naturally results in variability in the number of measurable cells per animal. Given the clear distinctions in our data distributions, we believe that averaging per biological replicate is not essential in this case.
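
      To make the distinction between n (cells or segments) and N (animals) concrete, the following Python sketch shows how cell-level measurements could be summarized per animal alongside the pooled cell-level values. The animal labels and values are invented, and the function name is hypothetical; this is an illustration of the two levels of summary, not the procedure applied in the manuscript.

```python
from collections import defaultdict
from statistics import mean


def per_animal_means(records):
    """Collapse (animal_id, value) records to one mean value per animal."""
    by_animal = defaultdict(list)
    for animal_id, value in records:
        by_animal[animal_id].append(value)
    return {animal: mean(values) for animal, values in by_animal.items()}


# Invented internodal lengths (µm); sparse labeling gives unequal cells per animal.
records = [
    ("chick1", 58.0), ("chick1", 63.0), ("chick1", 61.0),
    ("chick2", 66.0),
    ("chick3", 55.0), ("chick3", 59.0),
]

animal_means = per_animal_means(records)
print("n (measurements):", len(records))
print("N (animals):", len(animal_means))
print("pooled mean:", mean(value for _, value in records))
print("mean of animal means:", mean(animal_means.values()))
```

      The pooled mean weights animals by how many cells happened to be labeled, whereas the mean of animal means weights each animal equally; reporting both makes the choice of experimental unit explicit.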

      To further ensure the robustness of our statistical analysis, data presented as boxplots were preliminarily assessed using PlotsOfDifferences, a web-based application that calculates and visualizes effect sizes and 95% confidence intervals based on bootstrapping (https://huygens.science.uva.nl/PlotsOfDifferences/; https://doi.org/10.1101/578575). Effect sizes can serve as a valuable alternative to p-values (Ho, 2018; https://www.nature.com/articles/s41592-019-0470-3). The significant differences reported in our study are also supported by clear differences in effect sizes, ensuring that our conclusions remain robust regardless of the statistical approach used.
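
      The general idea of a bootstrapped effect size can be sketched as follows. This is a minimal Python illustration using a percentile bootstrap of the difference in medians; it is not the PlotsOfDifferences implementation, whose exact procedure may differ, and the function name, resampling count, seed, and data are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_median_difference(control, treated, n_boot=2000, seed=0):
    """Return (observed difference in medians, lower, upper) for a 95% percentile CI."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistics.median(treated) - statistics.median(control)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping the original group sizes.
        resampled_control = rng.choices(control, k=len(control))
        resampled_treated = rng.choices(treated, k=len(treated))
        diffs.append(
            statistics.median(resampled_treated) - statistics.median(resampled_control)
        )
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return observed, lower, upper


# Invented internodal lengths (µm) for two regions.
short_region = [20, 22, 25, 27, 30]
long_region = [60, 65, 70, 75, 80]
print(bootstrap_median_difference(short_region, long_region))
```

      A confidence interval that excludes zero indicates a difference that is robust to resampling, and the width of the interval conveys the precision of the estimate in the original measurement units.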

      If requested, we would be happy to provide PlotsOfDifferences outputs as supplementary source data files, similar to those used in eLife publications, for each figure.

      (3.5) related to the pseudo-replication issue, can the authors include individual datapoints in graphs for full transparency, per biological replicates, in addition or in alternative to bar-graphs (e.g. Fig. 5 and 6).

      We have now incorporated individual data points into the bar graphs in Figures 5 and 6.

      (4) The main finding of the study is that the density of nodes differs between two regions of the chicken auditory circuit, probably due to morphological differences in the respective oligodendrocytes. Can the authors discuss if this finding is likely to be specific to the bird auditory circuit?

      The morphological differences of oligodendrocytes between white and gray matter are well established (i.e., shorter myelin in gray matter), but their correspondence with the nodal spacing pattern along the long axonal projections of cortical neurons is not well understood. Future research may find similarities with our findings. Additionally, as mentioned in the final section of the Discussion, the mammalian brainstem auditory circuit is functionally analogous to the avian ITD circuit. Regional differences in nodal spacing along axons have also been observed in the mammalian system, raising the important question of whether these differences are supported by regional heterogeneity in oligodendrocytes. Investigating this possibility will facilitate our understanding of the underlying logic and mechanisms for determining node spacing patterns along axons, as well as provide valuable insights into evolutionary convergence in auditory processing mechanisms. We have described these points explicitly in the text (page 11, line 34).

      (5) Provided the authors amend their statistical analyses, and assuming significant differences remain as shown, the study shows a correlation (but not causation) between node spacing and oligodendrocyte density, but the authors did not manipulate oligodendrocyte density per se (i.e. cell-autonomously). Therefore, the authors should either include such experiments, or revise some of their phrasing to soften their claims and conclusions. For example, the word "determine" in the title could be replaced by "correlate with" for a more accurate representation of the work. Similar sentences throughout the main text should be amended.

      As you summarized in your comment, our results demonstrated that A3V-eTeNT suppressed oligodendrogenesis in the NL region, leading to a reduction in oligodendrocyte density (Figures 6L, M), which caused the emergence of unmyelinated segments. While this is an indirect manipulation of oligodendrocyte density, it nonetheless provides evidence supporting a causal relationship between oligodendrocyte density and nodal spacing.

      The emergence of unmyelinated segments at the NL region further suggests that the myelin extension capacity of oligodendrocytes differs between regions, highlighting regional differences in the intrinsic properties of oligodendrocytes as the most prominent determinant of nodal spacing variation. However, as you correctly pointed out, our findings do not establish direct causation.

      In the future, developing methods to artificially manipulate myelin length could provide a more definitive demonstration of causality. Given these considerations, we have modified the title to replace "determine" with "underlie", ensuring that our conclusions are presented with appropriate nuance.

      (6) The authors fail to introduce, or discuss, very pertinent prior studies, in particular to contextualize their findings with:

      (6.1) known neuron-autonomous modes of node formation prior to myelination, e.g. Zonta et al (PMID 18573915); Vagionitis et al (PMID 35172135); Freeman et al (PMID 25561543)

      (6.2) known effects of vesicular fusion directly on myelinating capacity and oligodendrogenesis, e.g. Mensch et al (PMID 25849985)

      (6.3) known correlation of myelin length and thickness with axonal diameter, e.g. Murray & Blakemore (PMID 7012280); Ibrahim et al (PMID 8583214); Hildebrand et al (PMID 8441812).

      (6.4) regional heterogeneity in the oligodendrocyte transcriptome (page 9, studies summarized in PMID 36313617)

      Thank you for your insightful suggestions. We have incorporated the relevant references you provided and revised the manuscript accordingly to contextualize our findings within the existing literature.

      Minor comments:

      (7) Can the authors amend Fig. 1G with the correct units of measurement, not millimetres.

      Response: 

      Thank you for your suggestion. We have corrected the units in Figure 1G to µm.

      (8) The Olig2 staining in Fig 2C does not appear to be nuclear, as would be expected of a transcription factor and as is well established for Olig2, but rather appears to be excluded from the nucleus, as it is in a ring or donut shape. Can the authors comment on this?

      Oligodendrocytes and OPCs have small cell bodies, often comparable in size to their nuclei. The central void in the ring-like Olig2 staining pattern appears too small to represent the nucleus. Additionally, a similar ring-like appearance is observed in BrdU labeling (Figure 5G), suggesting that this staining pattern may reflect nuclear morphology or other structural features.

      Significance

      In our view the study tackles a fundamental question likely to be of interest to a specialized audience of cellular neuroscientists. This descriptive study is suggestive that in the studied system, oligodendrocyte density determines the spacing between nodes of Ranvier, but further manipulations of oligodendrocyte density per se are needed to test this convincingly.

      The main finding of our study is that the primary determinant of the biased nodal spacing pattern in the sound localization circuit is the regional heterogeneity in the morphology of oligodendrocytes due to their intrinsic properties (e.g., their ability to produce and extend myelin sheaths) rather than the density of the cells. This was based on our observations that a reduction of oligodendrocyte density by A3V-eTeNT expression caused unmyelinated segments but did not increase internodal length (Figure 6), further revealing the importance of oligodendrocyte density in ensuring full myelination for the axons with short internodes. Thus, we think that our study, which used experimental manipulation of oligodendrocyte density, highlights the significance of oligodendrocyte heterogeneity for circuit function as well as for nodal spacing.

      Reviewer #3:

      Evidence, reproducibility and clarity

      The authors have investigated the myelination pattern along the axons of chick avian cochlear nucleus. It has already been shown that there are regional differences in the internodal length of axons in the nucleus magnocellularis. In the tract region across the midline, internodes are longer than in the nucleus laminaris region. Here the authors suggest that the difference in internodal length is attributed to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only length of myelin internodes differs, but also along the same axon unmyelinated areas between two internodes may vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons. However, the demonstration falls rather short of being convincing. I have some major concerns:

      (1) The authors neglect the possibility that nodal cluster may be formed prior to myelin deposition. They have investigated stages E12 (no nodal clusters) and E15 (nodal cluster plus MAG+ myelin). Fig. 1D is of dubious quality. It would be important to investigate stages between E12 and E15 to observe the formation of pre-nodes, i.e., clustering of nodal components prior to myelin deposition.

      Thank you for your insightful comment regarding the potential role of pre-nodal clusters in determining internodal length. Indeed, studies in zebrafish have suggested that pre-nodal clustering of node components prior to myelination may prefigure internodal length (Vagionitis et al., 2022). We have incorporated a discussion on whether such pre-nodal clusters could contribute to regional differences in nodal spacing in our manuscript (page 9, line 35).

      Whether pre-nodal clusters are detectable before myelination appears to depend on neuronal subpopulation (Freeman et al., 2015). To investigate the presence of pre-nodal clusters along NM axons in the brainstem auditory circuit, we previously attempted to visualize AnkG signals at E13 and E14. However, we did not observe clear structures indicative of pre-nodal clusters; instead, we only detected sparse fibrous AnkG signals with weak Nav clustering at their ends, consistent with heminode features. This result does not exclude the possibility of pre-nodal clusters on NM axons, as the detection limit of immunostaining cannot be ruled out. In brainstem slices, where axons are densely packed, nodal molecules are expressed at low levels across a wide area, leading to a high background signal in immunostaining, which may mask weak pre-nodal cluster signals prior to myelination. Regarding the comment on Figure 1D, we assume you are referring to Figure 2D based on the context. The lack of clarity in the high-magnification images in Figure 2D results from both the high background signal and the limited penetration of the MAG antibody. Furthermore, we are unable to verify Neurofascin accumulation at pre-nodal clusters, as there is currently no commercially available antibody suitable for use in chickens, despite more than 20 years of effort on our part to identify one for AIS research. Therefore, current methodologies pose significant challenges in visualizing pre-nodal clusters in our model. Future advancements, such as exogenous expression of fluorescently tagged Neurofascin at appropriate densities or knock-in tagging of endogenous molecules, may help overcome these limitations.

      However, a key issue to be discussed in this study is not merely the presence or absence of pre-nodal clusters, but rather whether pre-nodal clusters, if present, would determine regional differences in internodal length. To address this possibility, we have added new data in Figure 6I, measuring the length of unmyelinated segments that emerged following A3V-eTeNT expression.

      If pre-nodal clusters were fixed before myelination and predetermined internodal length, then the length of unmyelinated segments should be equal to or a multiple of the typical internodal length. However, our data showed that unmyelinated segments in the NL region were less than half the length of the typical NL internodal length, contradicting the hypothesis that fixed pre-nodal clusters determine internodal length along NM axons in this region.

      (2) The claim that axonal diameter is constant along the axonal length needs to be demonstrated at the EM level. This would also allow to measure possible regional differences in the thickness of the myelin sheath and number of myelin wraps.

      As mentioned in our reply to comment 2 by Reviewer #1, the diameter of NM axons was already evaluated using electron microscopy (EM) in the pioneering study by Seidl et al. (2010). Additionally, EM-based analysis makes it difficult to clearly distinguish between the main trunk of NM axons and thin collateral branches at the NL region. Accordingly, we did not perform EM analysis in this revision.

      In Figure 4, we used palGFP, which is targeted to the cell membrane, allowing us to measure axon diameter by evaluating the distance between two membrane signal peaks. This approach minimizes the influence of the blurring of fluorescence signals on diameter measurements. Thus, we believe that our method is sufficient to evaluate the relative difference in axon diameters between regions and hence to show that axon diameter is not the primary determinant of the 3-fold difference in internodal length between regions. 

      (3) The claim that the difference in internodal length is explained by heterogeneity in the sources of oligodendrocytes is not convincing. Oligodendrocytes a priori from the same origin form shorter internodes when remyelinating after a demyelination event.

      The heterogeneity in oligodendrocyte morphology would reflect differences in gene profiles, which, in turn, may arise from differences in their developmental origin and/or pericellular microenvironment of OPCs. We made this point as clear as possible in the Discussion (page 9, line 21).

      Significance

      The authors suggest that the difference in internodal length is attributed to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only length of myelin internodes differs, but also along the same axon unmyelinated areas between two internodes may vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons.

    1. Author response:

      We appreciate constructive feedback from both reviewers. Reviewer 1 provided a very positive assessment and helpful suggestions for clarity, which we will incorporate.

      We also thank Reviewer 2 for their detailed comments. In some instances, their public review raised concerns about specific data or interpretations that are, in fact, already presented and justified in the original manuscript. This feedback has highlighted a need to improve the clarity of our presentation. 

      In our revised manuscript, we will make key information more prominent to prevent further misunderstandings. We will also provide additional statistical validation for our conclusions, additional data from the optogenetic experiments and high throughput imaging, and further elaborate on the behaviors of specific proteins (FADD, MyD88, and RIPK1). We are confident that these revisions will make our findings more transparent and accessible to readers, and we look forward to submitting our revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This work shows that a specific adenosine deaminase protein in Dictyostelium generates the ammonia that is required for tip formation during Dictyostelium development. Cells with an insertion in the ADGF gene aggregate but do not form tips. A remarkable result, shown in several different ways, is that the ADGF mutant can be rescued by exposing the mutant to ammonia gas. The authors also describe other phenotypes of the ADGF mutant such as increased mound size, altered cAMP signalling, and abnormal cell type differentiation. It appears that the ADGF mutant has defects in the expression of a large number of genes, resulting in not only the tip defect but also the mound size, cAMP signalling, and differentiation phenotypes.

      Strengths:

      The data and statistics are excellent.

      Weaknesses:

      (1) The key weakness is understanding why the cells bother to use a diffusible gas like ammonia as a signal to form a tip and continue development.

      Ammonia can come from a variety of sources both within and outside the cells, including dead cells. By increasing cAMP levels, ammonia triggers collective cell movement, thereby establishing a tip in Dictyostelium. A gaseous signal can act over long distances in a short time; for instance, ammonia promotes synchronous development in a colony of yeast cells (Palkova et al., 1997; Palkova and Forstova, 2000). The slug tip is known to release ammonia, probably favouring synchronized development of the entire colony of Dictyostelium. However, after the tips are established, ammonia exerts negative chemotaxis, probably helping the slugs to move away from each other and ensuring equal spacing of the fruiting bodies (Feit and Sollitto, 1987).

      It is well known that ammonia serves as a signalling molecule influencing both multicellular organization and differentiation in Dictyostelium (Francis, 1964; Bonner et al., 1989; Bradbury and Gross, 1989). By raising the pH of the intracellular acidic vesicles of prestalk cells (Poole and Ohkuma, 1981; Gross et al, 1983) and of the cytoplasm, ammonia is known to increase the speed of chemotaxing amoebae (Siegert and Weijer, 1989; Van Duijn and Inouye, 1991), inducing collective cell movement (Bonner et al., 1988, 1989) and favoring tipped mound development.

      Ammonia produced in millimolar concentrations during tip formation (Schindler and Sussman, 1977) could ward off other predators in soil. For instance, ammonia released by Streptomyces symbionts of leaf-cutting ants is known to inhibit fungal pathogens (Dhodary and Spiteller, 2021). Additionally, ammonia may be recycled back into amino acids, as observed during breast cancer proliferation (Spinelli et al., 2017). Such a process may also occur in starving Dictyostelium cells, supporting survival and differentiation. These findings suggest that ammonia acts as both a local and long-range regulatory signal, integrating environmental and cellular cues to coordinate multicellular development.

      (2) The rescue of the mutant by adding ammonia gas to the entire culture indicates that ammonia conveys no positional information within the mound.

      Ammonia reinforces or maintains the positional information by elevating cAMP levels, favoring prespore differentiation (Bradbury and Gross, 1989; Riley and Barclay, 1990; Hopper et al., 1993). Ammonia is known to influence rapid patterning of Dictyostelium cells confined in a restricted environment (Sawai et al., 2002). In adgf mutants that have low ammonia levels, both neutral red staining (a marker for prestalk and ALCs) (Figure. S3) and the prestalk marker ecmA/ecmB expression (Figure. 7D) are higher than in the WT, and the mound arrest phenotype can be reversed by exposing the adgf mutant mounds to ammonia.

      Prestalk cells are enriched in acidic vesicles, and ammonia, by raising the pH of these vesicles and the cytoplasm (Davies et al 1993; Van Duijn and Inouye 1991), plays an active role in collective cell movement during tip formation (Bonner et al., 1989).

      (3) By the time the cells have formed a mound, the cells have been starving for several hours, and desperately need to form a fruiting body to disperse some of themselves as spores, and thus need to form a tip no matter what.

      Exposure of adgf mounds to ammonia led to tip development within 4 h (Figure. 5). In contrast, adgf controls remained at the mound stage for at least 30 h. This demonstrates that starvation alone is not the trigger for tip development and that ammonia promotes the transition from mound to tipped mound formation.

      Many mound arrest mutants are blocked in development and do not proceed to form fruiting bodies (Carrin et al., 1994). Further, not all the mound arrest mutants tested in this study were rescued by ADA enzyme (Figure. S4A), and they continue to stay as mounds.

      (4) One can envision that the local ammonia concentration is possibly informing the mound that some minimal number of cells are present (assuming that the ammonia concentration is proportional to the number of cells), but probably even a minuscule fruiting body would be preferable to the cells compared to a mound. This latter idea could be easily explored by examining the fate of the ADGF cells in the mound - do they all form spores? Do some form spores?

      Or perhaps the ADGF is secreted by only one cell type, and the resulting ammonia tells the mound that for some reason that cell type is not present in the mound, allowing some of the cells to transdifferentiate into the needed cell type. Thus, elucidating if all or some cells produce ADGF would greatly strengthen this puzzling story.

      A fraction of adgf mounds form bulkier spore heads by the end of 36 h as shown in Figure. 2H. This late recovery may be due to the expression of other ADA isoforms. Mixing WT and adgf mutant cell lines results in a chimeric slug with mutants occupying the prestalk region (Figure. 8) and suggests that WT ADGF favours prespore differentiation. However, it is not clear if ADGF is secreted by a particular cell type, as adenosine can be produced by both cell types, and the activity of three other intracellular ADAs may vary between the cell types. To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence activated cell sorter (FACS), and thereafter, adgf expression will be examined in each population.

      Reviewer #1 (Recommendations for the authors):

      (1) Lines: 47,48 - "The gradient of these morphogens along the slug axis determines the cell fate, either as prestalk (pst) or as prespore (psp) cells." - many workers have shown that this is not true - intrinsic factors such as cell cycle phase drive cell fate.

      Thank you for pointing this out. We have removed the line and rephrased as “Based on cell cycle phases, there exists a dichotomy of cell types, that biases cell fate as prestalk or prespore (Weeks and Weijer, 1994; Jang and Gomer, 2011).”

      (2) Line 48 - PKA - please explain acronyms at first use.

      Corrected

      (3) Line 56 - The relationship between adenosine deaminase and ADGF is a bit unclear, please clarify this more.

      Adenosine deaminase (ADA) is intracellular, whereas adenosine deaminase related growth factor (ADGF) is an extracellular ADA and has a growth factor activity (Li and Aksoy, 2000; Iijima et al., 2008).

      (4) Figure 1 - where are these primers, and the bsr cassette, located with respect to the coding region start and stop sites?

      The primer sequences are mentioned in the supplementary table S2. The figure legend is updated to provide a detailed description.

      (5) Line 104 - 37.47% may be too many significant figures.

      Corrected

      (6) Line 123 - 1.003 Å may be too many significant figures.

      Corrected

      (7) Line 128 - Since the data are in the figure, you don't need to give the numbers, also too many significant figures.

      Corrected

      (8) Figure 3G - did the DCF also increase mound size? It sort of looks like it did.

      Yes, the addition of DCF increases the mound size (now Figure. 2G).

      (9) Figure 3I - the spore mass shown here for ADGF - looks like there are 3 stalks protruding from it; this can happen if a plate is handled roughly and the spore masses bang into each other and then merge

      Thank you for pointing this out. The figure 3I (now Figure. 2I) is replaced.

      (10) Lines 160-162 - since the data are in the figure, you don't need to give the numbers, also too many significant figures.

      Corrected.

      (11) Line 165 - ' ... that are involved in adenosine formation' needs a reference.

      Reference is included.

      (12) Line 205 - 'Addition of ADA to the CM of the mutant in one compartment.' - might clarify that the mutant is the ADGF mutant

      Yes, revised to 'Addition of ADA to the CM of the adgf mutant in one compartment.'

      (13) Lines 222-223 need a reference for caffeine acting as an adenosine antagonist.

      Reference is included.

      (14) Figure 8B - left - use a 0-4 or so scale so the bars are more visible.

      Thank you for the suggestion. The scale of the y-axis is adjusted to 0-4 in Figure. 7B to enhance the visibility of the bars.

      Reviewer #2 (Public review):

      Summary:

      The paper describes new insights into the role of adenosine deaminase-related growth factor (ADGF), an enzyme that catalyses the breakdown of adenosine into ammonia and inosine, in tip formation during Dictyostelium development. The ADGF null mutant has a pre-tip mound arrest phenotype, which can be rescued by the external addition of ammonia. Analysis suggests that the phenotype involves changes in cAMP signalling possibly involving a histidine kinase dhkD, but details remain to be resolved.

      Strengths:

      The generation of an ADGF mutant showed a strong mound arrest phenotype and successful rescue by external ammonia. Characterization of significant changes in cAMP signalling components, suggesting low cAMP signalling in the mutant and identification of the histidine kinase dhkD as a possible component of the transduction pathway. Identification of a change in cell type differentiation towards prestalk fate

      Weaknesses:

      (1) Lack of details on the developmental time course of ADGF activity and cell type-specific differences in ADGF expression.

      adgf expression was examined at 0, 8, 12, and 16 h (Figure. 1), and the total ADA activity was assayed at 12 and 16 h (Figure. 3). Previously, the 12 h data were not included; they have now been added (Figure. 3A). The adgf expression was found to be highest at 16 h and hence, the ADA assay was carried out at that time point. Since the ADA assay will also report the activity of the other three isoforms, it will not exclusively reflect ADGF activity.

      Mixing WT and adgf mutant cell lines results in a chimeric slug with mutants occupying the prestalk region (Figure. 8) suggesting that WT adgf favours prespore differentiation. To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence activated cell sorter (FACS), and thereafter, adgf expression will be examined in each population.

      (2) The absence of measurements to show that ammonia addition to the null mutant can rescue the proposed defects in cAMP signalling.

      The adgf mutant in comparison to WT has diminished acaA expression (Fig. 6B) and reduced cAMP levels (Fig. 6A) both at 12 and 16 h of development. The cAMP levels were measured at 8 h and 12 h in the mutant.

      We would like to add that ammonia is known to increase cAMP levels (Riley and Barclay, 1990; Feit et al., 2001) in Dictyostelium. Exposure to ammonia increases acaA expression in WT (Figure. 7B) and is likely to increase acaA expression and cAMP levels in the mutant as well (Riley and Barclay, 1990; Feit et al., 2001), thereby rescuing the defects in cAMP signalling. Based on the comments, cAMP levels will also be measured in the mutant after the rescue with ammonia.

      (3) No direct measurements in the dhkD mutant to show that it acts upstream of adgf in the control of changes in cAMP signalling and tip formation.

      cAMP levels will be quantified in the dhkD mutant after treatment with ammonia. The histidine kinases dhkD and dhkC are reported to modulate phosphodiesterase RegA activity, thereby maintaining cAMP levels (Singleton et al., 1998; Singleton and Xiong, 2013). By activating RegA, dhkD ensures proper cAMP distribution within the mound, which is essential for the patterning of prestalk and prespore cells, as well as for tip formation (Singleton and Xiong, 2013). Therefore, ammonia exposure to dhkD mutants is likely to regulate cAMP signalling and thereby tip formation.

      Reviewer #2 (Recommendations for the authors):

      The paper describes new insights into the role of ADGF, an enzyme that catalyses the breakdown of adenosine into ammonia and inosine, in tip formation in Dictyostelium development.

      A knockout of the gene results in a tipless mound stage arrest and the mounds formed are somewhat larger in size. Synergy experiments show that the effect of the mutation is non-cell autonomous and further experiments show that the mound arrest phenotype can be rescued by the provision of ammonia vapour. These observations are well documented. Furthermore, the paper contains a wide variety of experiments attempting to place the observed effects in known signalling pathways. It is suggested that ADGF may function downstream of DhkD, a histidine kinase previously implicated in ammonia signalling. Ammonia has long been described to affect different aspects, including differentiation of slug and culmination stages of Dictyostelium development, possibly through modulating cAMP signalling, but the exact mechanisms of action have not yet been resolved. The experiments reported here to resolve the mechanistic basis of the mutant phenotype need focusing and further work.

      (1) The paper needs streamlining and editing to concentrate on the main findings and implications.

      The manuscript will be revised extensively.

      Below is a list of some more specific comments and suggestions.

      (2) Introduction: Focus on what is relevant to understanding tip formation and the role of nucleotide metabolism and ammonia (see https://doi.org/10.1016/j.gde.2016.05.014). This could lead to the rationale for investigating ADGF.

      The manuscript will be revised extensively.

      (3) Lines 36-38 are not relevant. Lines 55-63 need shortening and to focus on ADGF, cellular localization, and substrate specificity.

      The manuscript will be revised accordingly. Lines 36-38 will be removed, and lines 55-63 will be shortened.

      In humans, two isoforms of ADA are known, ADA1 and ADA2, and the Dictyostelium homolog of ADA2 is adenosine deaminase-related growth factor (ADGF). Unlike ADA, which is intracellular, ADGF is extracellular and also has growth factor activity (Li and Aksoy, 2000; Iijima et al., 2008). Loss-of-function mutations in ada2 are linked to lymphopenia, severe combined immunodeficiency (SCID) (Gaspar, 2010), and vascular inflammation due to accumulation of toxic metabolites like dATP (Notarangelo, 2016; Zhou et al., 2014).

      (4) Results: This section would benefit from better streamlining by a separation of results that provide more mechanistic insight from more peripheral observations.

      The manuscript will be revised and the peripheral observations (Figure. 2) will be shifted to the supplementary information.

      (5) Line 84 needs to start with a description of the goal, to produce a knockout.

      Details on the knockout will be elaborated in the revised manuscript. Line 84 (now 75): Dictyostelium cell lines carrying mutations in the adgf gene were obtained from the Genome Wide Dictyostelium Insertion (GWDI) bank and were analysed further to determine the role of adgf during Dictyostelium development.

      (6) Knockout data (Figure 1) can be simplified and combined with a description of the expression profile and phenotype Figure 3 F, G, and Figure 5. Higher magnification and better resolution photographs of the mutants would be desirable.

      Thank you. As suggested, the data will be simplified (section E will be removed) and combined with a description of the expression profile, and the phenotype images of Figure 3 F, G, and Figure 5 (now Figure. 2 F, G, and Figure. 4) will be replaced with higher-resolution images.

      (7) It would also be relevant to know which cells actually express ADGF during development, using in-situ hybridisation or promoter-reporter constructs.

      To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence activated cell sorter (FACS), and thereafter, adgf expression will be examined in each population.

      (8) Figure 2 - Information is less directly relevant to the topic of the paper and can be omitted (or possibly in Supplementary Materials).

      Figure. 2 will be moved to supplementary materials.

      (9) Figures 4A, B - It is shown that as could be expected ada activity is somewhat reduced and adenosine levels are slightly elevated. However, the fact that ada levels are low at 16 hrs could just imply that differentiation of the ADGF- cells is blocked/delayed at an earlier time point. To interpret these data, it would be necessary to see an ada activity and adenosine time course comparison of wt and mutant, or to see that expression is regulated in a cell type-specific manner that could explain this (see above). It would be good to combine this with the observation that ammonia levels are lower in the ADGF- mutant than wildtype and that the mutant phenotype, mound arrest can be rescued by an external supply of ammonia (Figure 6).

      In Dictyostelium, four isoforms of ADA, including ADGF, are present, and thus the time course of total ADA activity will also report the function of the other isoforms. Further, a number of pathways generate adenosine (Dunwiddie et al., 1997; Boison and Yegutkin, 2019). ADGF expression was examined at 0, 8, 12 and 16 h (Fig 1) and the ADA activity was assayed at 12 h, when expression is rising, and at 16 h, when it peaks. We did not previously show the 12 h activity data; these will be included in the revised version. ADGF expression was found to be highly elevated at 16 h and adenosine/ammonia levels were measured at the two time points indicated in the mutant.

      (10) Panel 4C could be combined with other measurements trying to arrive at more insight in the mechanisms by which ammonia controls tip formation.

      Panel 4C (now 3C) illustrates the genes involved in the conversion of cAMP to adenosine. Since Figure. 3 focuses on adenosine levels and ADA activity in both WT and adgf mutants, we have retained Panel 3C in Figure. 3, for its relevance to the experiment.

      (11) There is a large variety of experiments attempting to link the mutant phenotype and its rescue by ammonia to cAMP signalling, however, the data do not yet provide a clear answer.

      It is well known that ammonia increases cAMP levels (Riley and Barclay, 1990; Feit et al., 2001) and adenylate cyclase activity (Cotter et al., 1999) in D. discoideum, and exposure to ammonia increases acaA expression (Fig 7B) suggesting that ammonia regulates cAMP signaling. To address the concerns, cAMP levels will be quantified in the mutant after ammonia treatment.

      (12) The mutant is shown to have lower cAMP levels at the mound stage which ties in with low levels of acaA expression (Figures 7A and B), also various phosphodiesterases, the extracellular phosphodiesterase pdsa and the intracellular phosphodiesterase regA show increased expression. Suggesting a functional role for cAMP signalling is that the addition of di cGMP, a known activator of acaA, can also rescue the mound phenotype (Figure 7E). There appears to be a partial rescue of the mound arrest phenotype level by the addition of 8Br-cAMP (fig 7C), suggesting that intracellular cAMP levels rather than extracellular cAMP signalling can rescue some of the defects in the ADGF- mutant. Better images and a time course would be helpful.

      The relevant images will be replaced and a developmental time course after 8-Br-cAMP treatment will be included in the revised manuscript (Figure. 6D).

      (13) There is also the somewhat surprising observation that low levels of caffeine, an inhibitor of acaA activation also rescues the phenotype (Figure 7F).

      With respect to caffeine action on cAMP levels, the reports are contradictory. Caffeine has been reported to increase adenylate cyclase expression, thereby increasing cAMP levels (Hagmann, 1986), whereas Alvarez-Curto et al. (2007) found that caffeine reduced intracellular cAMP levels in Dictyostelium. Although caffeine is a known inhibitor of ACA, it is also known to inhibit PDEs (Nehlig et al., 1992; Rosenfeld et al., 2014). Therefore, if caffeine differentially affects ADA and PDE activity, it may potentially counterbalance the effects and rescue the phenotype.

      (14) The data attempting to assess cAMP wave propagation in mounds (Fig 7H) are of low quality and inconclusive in the absence of further analysis. It remains unresolved how this links to the rescue of the ADGF- phenotype by ammonia. There are no experiments that measure any of the effects in the mutant stimulated with ammonia or di-cGMP.

      The relevant images will be replaced (now Figure. 6H). Ammonia, by increasing acaA expression (Figure. 7B) and cAMP levels (Figure. 7C), may restore spiral wave propagation, thereby rescuing the mutant.

      (15) A possible way forward could also come from the observation that ammonia can rescue the wobbling mound arrest phenotype from the histidine kinase mutant dhkD null mutant, which has regA as its direct target, linking ammonia and cAMP signalling. This is in line with other work that had suggested that another histidine kinase, dhkC transduces an ammonia signal sensor to regA activation. A dhkC null mutant was reported to have a rapid development phenotype and skip slug migration (Dev. Biol. (1998) 203, 345). There is no direct evidence to show that dhkD acts upstream of ADGF and changes in cAMP signalling, for instance, measurements of changes in ADA activity in the mutant.

      cAMP levels will be quantified in the dhkD mutant after ammonia treatment and accordingly, the results will be revised.

      (16) The paper makes several further observations on the mutant. After 16 hrs of development the adgf- mutant shows increased expression of the prestalk cell markers ecmA and ecmB and reduced expression of the prespore marker pspA. In synergy experiments with a majority of wildtype, these cells will sort to the tip of the forming slug, showing that the differentiation defect is cell autonomous (Fig 9). This is interesting but needs further work to obtain more mechanistic insight into why a mutant with a strong tip/stalk differentiation tendency fails to make a tip. Here again, knowing which cells express ADGF would be helpful.

      The adgf mutant shows increased prestalk marker expression in the mound but does not form a tip. It is well known that several mound arrest mutants form differentiated cells but are blocked in development with no tips (Carrin et al., 1994). This is addressed in the Discussion (539). To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence activated cell sorter (FACS), and thereafter, adgf expression will be examined in each population.

      (17) The observed large mound phenotype could as suggested possibly be explained by the low ctn, smlA, and high cadA and csA expression observed in the mutant (Figure 3). The expression of some of these genes (csA) is known to require extracellular cAMP signalling. The reported low level of acaA expression and high level of pdsA expression could suggest low levels of cAMP signalling, but there are no actual measurements of the dynamics of cAMP signalling in this mutant to confirm this.

      The acaA expression was examined at 8 and 12 h (Figure. 6B) and cAMP levels were measured at 12 and 16 h in the adgf mutants (Figure. 6A). Both acaA expression and cAMP levels were reduced, suggesting that cells expressing adgf regulate acaA expression and cAMP levels. This regulation, in turn, is likely to influence cAMP signaling and collective cell movement within mounds, ultimately driving tip development. Exposure to ammonia led to increased acaA expression (Figure. 7B) in WT. Based on the comments above, cAMP levels will be measured in the mutant before and after rescue with ammonia.

      (18) Furthermore, it would be useful to quantify whether ammonia addition to the mutant reverses mound size and restores any of the gene expression defects observed.

      Ammonia treatment, whether soon after plating or six hours after plating, had no effect on the mound size (Figure. 5G).

      (19) There are many experimental data in the supplementary data that appear less relevant and could be omitted Figure S1, S3, S4, S7, S8, S9, S10.

      Figures S8, S9, and S10 have been omitted. We would like to retain the other figures.

      Figure S1 (now Figure. S2): It is widely believed that ammonia comes from protein (White and Sussman, 1961; Hames and Ashworth, 1974; Schindler and Sussman, 1977) and RNA (Walsh and Wright, 1978) catabolism. Figure. S2 shows no significant difference in protein and RNA levels between WT and adgf mutant strains, suggesting that adenosine deaminase-related growth factor (ADGF) activity serves as a major source of ammonia and plays a crucial role in tip organizer development in Dictyostelium. Thus, it is important to retain this figure.

      Figure S3 (now Figure. S4): The figure shows the treatment of various mound arrest mutants and multiple tip mutants with ADA enzyme and DCF, respectively, to investigate the pathway through which adgf functions. Additionally, it includes the rescue of the histidine kinase mutant dhkD with ammonia, indicating that dhkD acts upstream of adgf via ammonia signalling. Therefore, it is important to retain this figure.

      Figure S4 (now Figure. S5): This figure represents the developmental phenotype of other deaminase mutants. Unlike adgf mutants, mutations in other deaminases do not result in complete mound arrest, despite some of these genes exhibiting strong expression during development. This underscores the critical role of adenosine deamination in tip formation. Therefore, we would like to retain this figure.

      Figure S7 (now Figure. S8): Figure S8 presents the transcriptomic profile of ADGF during gastrulation and pre-gastrulation stages across different organisms, indicating that ADA/ADGF is consistently expressed during gastrulation in several vertebrates (Pijuan-Sala et al., 2019; Tyser et al., 2021). Notably, the process of gastrulation in higher organisms shares remarkable similarities with collective cell movement within the Dictyostelium mound (Weijer, 2009), suggesting a previously overlooked role of ammonia in organizer development. This implies that ADA may play a fundamental role in regulating morphogenesis across species, including Dictyostelium and vertebrates. Therefore, we would like to retain this figure.

      (20) Given the current state of knowledge, speculation about the possible role of ADGF in organiser function in amniotes seems far-fetched. It is worth noting that the streak is not equivalent to the organiser. The discussion would benefit from limiting itself to the key results and implications.

      The discussion is revised accordingly by removing the speculative role of ADGF in organizer function in amniotes. The lines “It is likely that ADA plays a conserved, fundamental role in regulating morphogenesis in Dictyostelium and other organisms including vertebrates” have been removed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The paper sets out to examine the social recognition abilities of a 'solitary' jumping spider species. It demonstrates that based on vision alone spiders can habituate and dishabituate to the presence of conspecifics. The data support the interpretation that these spiders can distinguish between conspecifics on the basis of their appearance.

      We appreciate the reviewer’s summary. We indeed aimed at investigating the social recognition abilities of the solitary jumping spider (Phidippus regius), using visual cues alone. By employing a habituation-dishabituation paradigm, well-established in developmental psychology, we found support for the interpretation that these spiders can distinguish between conspecifics based on their appearance, as the reviewer noted.

      Strengths:

      The study presents two experiments. The second set of data recapitulates the findings of the first experiment with an independent set of spiders, highlighting the strength of the results. The study also uses a highly quantitative approach to measuring relative interest between pairs of spiders based on their distance.

      We appreciate the reviewer's acknowledgement of the strengths of our study. The second set of data underscores the robustness and reliability of the results. In addition, the second experiment served to disentangle whether the habituation effect observed over sessions was caused by ‘physical’ or ‘cognitive’ fatigue by employing ‘long-term’ dishabituation trials at the end of Session 3. These trials are critical in our study as they help to differentiate between recognition of individual identities versus recognition of familiar individuals (as opposed to unfamiliar ones) and to determine if the observed effects are due to ‘general habituation’ or ‘specific recognition’. We will elaborate on this further below in this revision.

      As stated by the reviewer, we employed a highly quantitative approach to measure relative interest between pairs of spiders based on their distance, providing precise and objective data to support our conclusions.

      Weaknesses:

      The study design is overly complicated, missing key controls, and the data presented in the figures are not clearly connected to the study. The discussion is challenging to understand and appears to make unsupported conclusions.

      While we acknowledge that the study design is indeed complex, this complexity is essential for conducting a well-controlled and balanced experiment regarding the experimental conditions.  

      The habituation-dishabituation paradigm is a well-established paradigm in developmental psychology with non-verbal infants. It is understood that during the habituation phase, an individual's attention to a repeated stimulus decreases as they engage in information processing and form a mental representation of it. As the stimulus becomes familiar, it loses its novelty and interest. When a new stimulus is introduced, a recovery of attention suggests that the individual has compared this new stimulus to the stored memory of the habituation stimulus and detected a difference. This process suggests that the individual not only remembered the original stimulus but also recognized the new one as distinct (for a review, see Kavšek & Bornstein, 2010).

      This paradigm has also been extensively applied in animal research, where, like infants, nonverbal subjects rely on recognition and discrimination processes to demonstrate their cognitive abilities. The use of this paradigm dates back to seminal studies such as Humphrey (1974), which explored the perceptual world of monkeys, illustrating how species and individuals are perceived and recognized. In another previous study (Dahl, Logothetis, and Hoffman, 2007), we utilized an even more complex experimental design that incorporated dedicated baseline trials for both habituation and dishabituation phases, which was well-received despite its complexity. In the current study, we contrast dishabituation and habituation trials directly, creating a sequential cascade where each trial is evaluated against the preceding one as its baseline.

      On the basis of these arguments, we respectfully disagree with the claim that this paradigm is inappropriate or lacks key controls. Our study design, though complex, is rigorously grounded in established methodologies and offers a robust framework for exploring individual recognition in Phidippus regius.

      However, we take the reviewer’s comments seriously and are committed to identifying and addressing the aspects in our manuscript that may have led to misunderstandings. We clarify these areas in our revision of the manuscript. Modifications were made in the Introduction, Methods, and Discussion sections.

      Dahl, C. D., Logothetis, N. K., & Hoffman, K. L. (2007). Individuation and holistic processing of faces in rhesus monkeys. Proceedings of the Royal Society B: Biological Sciences, 274(1622), 2069-2076.

      Humphrey, N. K. (1974). Species and individuals in the perceptual world of monkeys. Perception, 3(1), 105-114.

      Kavšek, M., & Bornstein, M. H. (2010). Visual habituation and dishabituation in preterm infants: A review and meta-analysis. Research in developmental disabilities, 31(5), 951-975.

      (1) Study design: The study design is rather complicated and as a result, it is difficult to interpret the results. The spiders are presented with the same individual twice in a row, called a habituation trial. Then a new individual is presented twice in a row. The first of these is a dishabituation trial and the second is another habituation trial (but now habituating to a second individual). This is done with three pairings and then this entire structure is repeated over three sessions. 

      While we acknowledge that the design is complex, this complexity is essential for conducting a well-controlled experiment, as described earlier. As the reviewer noted, our design involves presenting the same individual to the focal spider twice in a row (habituation trial), followed by a new individual (dishabituation trial), and then repeating this structure. This approach is fundamental to the habituation-dishabituation paradigm, which allows us to systematically compare the responses to a familiar individual with those elicited by a novel one. If the spiders exhibit different behaviours in terms of the distance they maintain when encountering the same individual versus a new one, it indicates that they are processing the stimuli differently, consistent with recognition memory. This differential response is a key indicator that the spiders can distinguish between familiar and unfamiliar individuals, demonstrating not only a decrease in interest or engagement due to repeated exposure but also a cognitive process where the lack of a matching memory template triggers a distinct behavioural response when confronted with novel stimuli.
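      The trial structure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the partner labels "A", "B" and "C" are placeholders, the function name `build_session` is hypothetical, and labelling the very first trial of a session as a "first exposure" is our own convention, not terminology from the manuscript. Each trial is labelled by comparing its partner with the partner of the preceding trial, following the description that each trial is evaluated against the preceding one.

```python
from typing import List, Optional, Tuple


def build_session(partners: List[str]) -> List[Tuple[int, str, str]]:
    """Return (trial_number, partner, trial_type) for one session.

    Each partner is presented twice in a row. A trial is labelled by
    comparing its partner with the partner of the preceding trial.
    """
    schedule: List[Tuple[int, str, str]] = []
    previous: Optional[str] = None
    trial = 0
    for partner in partners:
        for _ in range(2):  # same individual shown twice in a row
            trial += 1
            if previous is None:
                kind = "first exposure"  # assumption: no preceding trial to compare with
            elif partner == previous:
                kind = "habituation"  # same individual as the preceding trial
            else:
                kind = "dishabituation"  # different individual from the preceding trial
            schedule.append((trial, partner, kind))
            previous = partner
    return schedule


# Hypothetical partner labels for the three pairings in one session.
session = build_session(["A", "B", "C"])
for row in session:
    print(row)
```

      With three pairings this yields six trials per session, alternating between the first encounter with an individual and its immediate repetition.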

      By repeating this sequence two more times (Session 2 and 3), we aim to assess the consistency of this recognition process over time. If the focal spider does not remember the individuals from the previous session (one hour ago), we expect consistent behavioural responses across sessions. Conversely, if there is a decrease in response magnitude but the overall response patterns are maintained, we can infer that the focal spider recognizes the previously presented individuals and exhibits habituation, reflected in reduced response intensity. In other words, over sessions and repeated exposure to the same individuals, the memory traces become more firmly established, leading to a situation where a dishabituation trial introduces less novelty, as the spider's recognition of previously encountered individuals becomes more robust and consistent to the point where “habituation” and “dishabituation” trials become indistinguishable, as observed in Session 3. This method allows us to assess the duration of identity recognition in these spiders, indicating how long the memory of specific individuals persists. 

      All of these outcomes were anticipated before we began Experiment 1. Given that the results aligned with our predictions, we then sought to determine whether the observed reduction in the magnitude of the effect (i.e., the difference between habituation and dishabituation trials) was due to a physical fatigue effect, where the spiders might simply be getting tired, or a cognitive fatigue effect, where the spiders recognized the individuals and as a result did not exhibit any novelty response. To address this, we replicated the experiment with a new group of spiders and introduced special (long-term dishabituation) trials at the end, where the focal spider was presented with a novel spider. 

      These extra trials allowed us to disentangle the nature of the diminishing response across repeated sessions: a lack of dishabituation (remaining distant) would suggest general physical fatigue, whereas a strong dishabituation response (approaching closely) to the novel spider would indicate cognitive fatigue, thereby confirming that the spiders were indeed recognizing the familiar individuals throughout the experiment. 
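      The logic of the long-term dishabituation trials can be written as a simple decision rule. The sketch below is illustrative only: the function name, the `margin` parameter and the example distances are hypothetical and are not taken from the manuscript. It assumes, as described above, that a smaller distance to the other spider reflects a stronger response.

```python
def interpret_novel_trial(distance_novel: float,
                          distance_familiar: float,
                          margin: float) -> str:
    """Classify the outcome of a long-term dishabituation trial.

    distance_novel:    distance kept from the novel spider
    distance_familiar: distance kept from familiar spiders late in testing
    margin:            hypothetical minimum difference counted as a rebound
    """
    if margin < 0:
        raise ValueError("margin must be non-negative")
    # Approaching the novel spider clearly more closely than familiar ones
    # is the rebound expected if familiar individuals were recognized.
    if distance_novel < distance_familiar - margin:
        return "cognitive fatigue"
    # Remaining distant from a novel spider as well points to tiredness.
    return "physical fatigue"


# Hypothetical distances in arbitrary units.
print(interpret_novel_trial(4.0, 10.0, 2.0))
print(interpret_novel_trial(9.5, 10.0, 2.0))
```

      A statistical comparison across spiders would replace the fixed margin in a real analysis; the rule only makes the two predicted outcomes explicit.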

      In light of these considerations, we believe that the complexity of our design is not only justified but absolutely necessary to rigorously test the cognitive capabilities of the spiders. Nonetheless, we understand the need for clarity in presenting our findings and are committed to refining our manuscript to better communicate the rationale and results of our study.

      The data appear to show the strong effects of differences between habituation and dishabituation trials in the first session. The decrease in differential behavior between the so-called habituation and dishabituation trials in sessions 2 and 3 is explained as a consequence of the spiders beginning to habituate in general to all of the individuals. 

      The key question, as mentioned above, is to determine the underlying cause of this general habituation across sessions. Specifically, we aim to differentiate between two potential causes: physical fatigue, where the spiders may simply become less responsive due to the demands of the three-hour testing period, or cognitive fatigue, where the repeated exposure to the same individuals leads to a decreased response because the spiders have started to recognize these individuals over multiple repetitions.

      To address this, we replicated the experiment and introduced each focal spider to a new individual in what we termed "long-term dishabituation" trials. By comparing the spiders' responses to these novel individuals with their responses in earlier trials, we sought to better understand the underlying mechanisms of habituation and the duration of individual recognition. The strong dishabituation response observed in these trials is indicative of cognitive fatigue, supporting the presence of recognition memory rather than a general physical fatigue effect.

      The claim that the spiders remember specific individuals is somewhat undercut because all of the 'dishabituation' trials in session 2 are toward spiders they already met for 14 minutes previously but seemingly do not remember in session 2. 

      We appreciate the reviewer’s comment regarding the claim that spiders do not remember specific individuals. This assessment does not align with the rationale of our experiment. The reviewer noted that the dishabituation trials in session 2 involved spiders previously encountered and suggested that the lack of a clear memory response might undercut the claim of specific individual recognition. 

      However, as we explained earlier, we expect habituation in Session 2 relative to Session 1 precisely because spiders recognize each other in Session 2. If there were no such habituation in Sessions 2 or 3, it would suggest that the spiders’ recognition memory does not persist beyond one hour. 

      Additionally, it is important to correct the timing noted by the reviewer: each individual spider re-encounters the same spider exactly one hour later, not 14 minutes. This is detailed in Table 2 of the manuscript, which outlines that each trial lasts 7 minutes, with a 3-minute visual separation between trials. With six trials per session, this totals 1 hour per session. Thus, every pair of spiders re-encounters each other exactly 1 hour after their last interaction.

      Again, it is important to clarify that the observed decrease in differential behaviour is not indicative of a failure to remember specific individuals. Rather, it reflects a systematic pattern of habituation, which is a common and expected outcome in such paradigms. This systematic decrease in response strength suggests that the spiders recognize the previously encountered individuals and become less responsive over repeated exposures, consistent with the process of habituation. In different terms, the repeated exposure to the same individuals leads to more firmly established memory traces, leading to a situation where a dishabituation trial introduces less novelty, as the spider's recognition of previously encountered individuals becomes more robust and consistent.

      Based on the explanations provided above, we respectfully reject the claim that “the spiders remember specific individuals is somewhat undercut […]”. On the contrary, the very strength of our study lies in demonstrating that spiders possess robust recognition memory, as evidenced by a clear dissociation of habituation and dishabituation trials in Session 1, followed by a gradually diminishing effect over Sessions 2 and 3 as the spiders are increasingly exposed to the same individuals, and by the strong rebound from habituation observed in long-term dishabituation trials, where the spiders were exposed to novel individuals. 
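      The timing can be checked with a short calculation. The sketch below uses only the values stated above (7-minute trials, 3-minute separations, six trials per session); the function names are hypothetical, and it assumes that sessions follow one another without an additional break, which the text implies but does not state explicitly.

```python
# Values stated in the response (Table 2 of the manuscript).
TRIAL_MIN = 7            # duration of one trial in minutes
SEPARATION_MIN = 3       # visual separation after each trial in minutes
TRIALS_PER_SESSION = 6   # trials in one session


def session_minutes() -> int:
    """Length of one session: each trial plus its separation period."""
    return TRIALS_PER_SESSION * (TRIAL_MIN + SEPARATION_MIN)


def trial_onset(session: int, trial: int) -> int:
    """Onset in minutes of a 1-indexed trial within a 1-indexed session.

    Assumption: sessions run back to back with no extra break.
    """
    slot = TRIAL_MIN + SEPARATION_MIN
    return (session - 1) * session_minutes() + (trial - 1) * slot


# Interval between the same trial slot in consecutive sessions.
gap = trial_onset(2, 1) - trial_onset(1, 1)
print(session_minutes(), gap)  # prints "60 60"
```

      Under these assumptions the same trial slot recurs one full session, that is 60 minutes, later.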

      This misunderstanding suggests that we should take additional care in the revised manuscript to clarify our explanations and provide more detail, ensuring that the rationale behind our experimental design and findings are communicated effectively.

      In session 3 it is ambiguous what is happening because the spiders no longer differentiate between the trial types. This could be due to fatigue or familiarity. 

      The reviewer proposes that the absence of differentiation between 'habituation' and 'dishabituation' trials in Session 3 might be attributed to either fatigue or familiarity. We interpret "fatigue" as what we have termed the “physical fatigue effect” and "familiarity" as “cognitive fatigue effect.” In this context, we concur with the reviewer’s observation, and this very line of reasoning prompted us to conduct a further experiment following the outcome of Experiment 1.

      A second experiment is done to show that introducing a totally novel individual, recovers a large dishabituation response, suggesting that the lack of differences between 'habituation' and 'dishabituation' trials in session 3 is the result of general habituation to all of the spiders in the session rather than fatigue. As mentioned before, these data do support the claim that spiders differentiate among individuals.

      As the reviewer rightly noted, we addressed these possibilities in our second experiment by introducing a completely novel individual to the spiders, which resulted in a strong dishabituation response. This outcome suggests that the lack of differentiation in Session 3 is more likely due to cognitive habituation rather than physical fatigue. The robust response to novel individuals demonstrates that the spiders are capable of distinguishing between familiar and unfamiliar individuals, suggesting that the reduced differentiation is a consequence of habituation from repeated encounters with the same individuals. 

      We appreciate the reviewer's recognition that these findings support the conclusion that spiders are capable of differentiating between individual conspecifics.

      Additionally, it is important to clarify the structure of our sessions. Each of the 6 trials lasts 7 minutes with a 3-minute visual separation, resulting in a total of 1 hour per session. This ensures that each pair of spiders is encountered exactly one hour later, which controls for the timing and allows us to evaluate the spiders' recognition memory over repeated sessions.

      In summary, while the data show a decrease in differential behaviour between habituation and dishabituation trials in Sessions 2 and 3, the results from our second experiment support the interpretation that this is due to ‘cognitive habituation’ (familiarization) rather than ‘physical fatigue’ (general habituation). This habituation effect underscores the spiders' ability to recognize and become familiar with specific individuals over time, reinforcing our conclusion that they can differentiate among individuals.

      The data from session 1 are easy to interpret. The data from sessions 2 and 3 are harder to understand, but these are the trials in which they meet an individual again after a substantial period of separation. 

      The data from Session 1 are straightforward to interpret, showing clear differences between habituation and dishabituation trials. However, the data from Sessions 2 and 3 are more complex, as these sessions involve the spiders re-encountering individuals after a 1-hour period of separation. Importantly, the outcome is not an artefact of our experiment, but the consequence of a deliberate choice in the experimental design to assess whether spiders can recognise each other after this duration. We believe that this complexity aligns with our expectations, based on the assumption that spiders can recognise each other after one hour. The observed pattern of habituation in Sessions 2 and 3 suggests that the spiders retain memory of the individuals, leading to decreased responsiveness upon repeated encounters. This interpretation is further supported by Experiment 2, which introduced a novel individual and elicited a strong dishabituation response. This finding confirms that the reduced differentiation in later sessions is due to cognitive habituation rather than physical fatigue, supporting the conclusion that recognition memory lasts at least one hour.

      We hope this explanation clarifies our findings and the rationale behind our relatively complex experimental design choice. 

      Other studies looking at recognition in ants and wasps (cited by the authors) have done a 4 trial design in which focal animal A meets B in the first trial, then meets C in the second trial, meets B again in the third trial, and then meets D in the last trial. In that scenario trials 1, 2, and 4 are between unfamiliar individuals and trial 3 is between potentially familiar individuals. In both the ants and wasps, high aggression is seen in species with and without recognition on trial 1, with low aggression specifically for trials with familiar individuals in species with recognition. Across different tests, species or populations that lack recognition have shown a general reduction in aggression towards all individuals that become progressively less aggressive over time (reminiscent of the session 2 and 3 data) while others have maintained modest levels of aggression across all individuals. The 4 session design used in those other studies provides an unambiguous interpretation of the data while controlling for 'fatigue'. 

      We acknowledge that there are multiple ways to design experiments to test recognition memory. In fact, we considered using a paradigm similar to the one proposed by the reviewer and used in studies like Dreier et al., which involves a series of trials with unfamiliar and familiar individuals over extended intervals. We ultimately opted, however, for a more complex design to rigorously assess how habituation and recognition memory develop over repeated sessions with shorter intervals.

      In the following, we would like to describe the advantages and disadvantages of both paradigms and outline how we ended up using the more complex version:

      Advantages of our paradigm: 

      As pointed out, by repeating the sequence in exactly the same manner (every pair of spiders reoccurs after exactly 1 and 2 hours), we can comprehensively evaluate the effect of habituation over multiple exposures. This allows us to assess the extent of the spiders’ memory, for example when a spider shows stronger habituation to individuals that were novel in Session 1 but “familiar” by the time they encounter them again in Session 2. To achieve this, each trial and visual separation must be precisely timed, ensuring consistent intervals between encounters. As a consequence, each individual spider undergoes the exact same experimental protocol. Most critical, however, are the novel individuals presented after Session 3 (long-term dishabituation trials), which help differentiate between cognitive habituation and physical fatigue.

      Disadvantages of our paradigm:

      The sequences of habituation and dishabituation trials may make the design more complex, as pointed out by the reviewer. As a consequence, the interpretation will become more difficult. However, the data perfectly align with our predictions, and the outcomes were as anticipated in two independently run experiments with two groups of spiders. This highlights the reliability of our experimental design and robustness of our findings.

      Advantages of the 4-trial paradigm proposed by the reviewer:

      Clearly, the structure of the proposed design is simpler, making interpretation easier. The paradigm also accommodates longer intervals between trials (e.g., 24 hours). Longer intervals could theoretically have been applied in our study. (However, we chose not to leave the spiders in the experimental box longer than necessary, opting instead to return them to their home containers for the night to ensure their well-being. In addition, a 24-hour interval targets a different phase in the process of long-term memory; more on this topic further below.)

      Disadvantages of the 4-trial paradigm proposed by the reviewer:

      Strictly replicating the 4-trial design would result in one familiar encounter versus three unfamiliar ones. This imbalance might introduce bias and limit the robustness of the measurements. Additionally, the design provides less data overall, as the focal individual will be confronted with three other individuals, who will then be excluded from further testing as focal subjects themselves. In contrast, our design ensures a balanced number of familiar (habituation) and novel encounters (dishabituation) for each focal individual, allowing for more efficient and comprehensive data collection without excluding individuals from further testing.
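      The one-versus-three imbalance in the 4-trial design can be made explicit with a small sketch. The partner sequence B, C, B, D is the one described by the reviewer; the function name and labels are hypothetical and serve only as an illustration.

```python
def classify_encounters(partners):
    """Label each encounter as 'familiar' if the focal animal has already met
    that partner earlier in the sequence, otherwise 'unfamiliar'."""
    seen = set()
    labels = []
    for partner in partners:
        labels.append("familiar" if partner in seen else "unfamiliar")
        seen.add(partner)
    return labels


# Focal animal A meets B, then C, then B again, then D (4-trial design).
four_trial = classify_encounters(["B", "C", "B", "D"])
print(four_trial)
# ['unfamiliar', 'unfamiliar', 'familiar', 'unfamiliar']
print(four_trial.count("familiar"), four_trial.count("unfamiliar"))
# 1 3
```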

      Given the aforementioned considerations, we determined that the advantages of our experimental design, in particular the assessment of a cognitive fatigue effect when encountering the same individuals again, outweigh those of the proposed 4-trial design. The mentioned limitations of the 4-trial design, such as the potential for bias and less comprehensive data collection, do not justify re-running the study, especially when the best case scenario is fewer insights than our already existing findings. Our current paradigm yielded results that align perfectly with our predictions, offering a thorough and reliable understanding of recognition memory and habituation in spiders. Therefore, we believe our approach provides a more complete and robust answer to our research questions.

      However, we acknowledge that there might be insufficient information in the manuscript addressing the rationale behind our design choices, and we will revise the manuscript to provide a clearer explanation of why our approach is well suited to answering the research questions at hand.

      That all trials in sessions 2 and 3 are always with familiar individuals makes it challenging to understand how much the spiders are habituating to each other versus having some kind of associative learning of individual identity and behavior.

      We understand the reviewer's concern that having all trials in Sessions 2 and 3 involve familiar individuals could make it challenging to distinguish between general habituation and associative learning of individual identities. In our study, we contrast habituation and dishabituation trials: If general habituation were occurring, we would expect uniformly reduced responses (around the zero line) to all individuals over time, indicating that the spiders are getting used to any individual regardless of their specific identity. However, this is not the case. Our data show that while the responses in Session 2 are reduced in effect size compared to Session 1, they are not flat (around the zero line). This indicates that the spiders still differentiate between a repetition of a spider identity (habituation trials) and two different spider identities (dishabituation trials), albeit with a reduced response strength. The systematicity in the data suggests that the spiders are not merely habituating to any individual, but are instead retaining some level of recognition between specific individuals.

      Only by Session 3 do the spiders fully habituate to the point where the responses to habituation and dishabituation trials converge, indicating a complete habituation effect. The introduction of novel individuals in our long-term dishabituation trials further supports the idea that the spiders are recognizing specific individuals rather than exhibiting general habituation. If the spiders were experiencing general habituation, we would not expect the strong dishabituation response observed in our study.

      The data presentation is also very complicated. How is it the case that a negative proportion of time is spent? The methods reveal that this metric is derived by comparing the time individuals spent in each region relative to the previous time they saw that individual. 

      We understand the reviewer's concern regarding the complexity of the data presentation and the calculation of the negative proportion of time. Regarding the complexity of the design, we have already justified our choice of a more intricate experimental setup. This complexity is necessary for accurately assessing recognition memory and habituation over repeated sessions. 

      The metric is derived by comparing the time individuals spent in each region (relative to the transparent front panel) in the current trial (n) relative to the previous trial (n-1). With multiple trials, this results in a cascade of trials and conditions. This method was established in Humphrey’s and our previous study (Humphrey, 1974; Dahl, Logothetis, & Hoffman, 2007), where we demonstrated its effectiveness in assessing individuation of faces in macaque monkeys.

      Also in our current experimental design, each current trial is contrasted with the preceding one, allowing us to compare distributions of distances taken in two trials. In this context, every preceding trial serves as baseline for every current trial. 

      Figure 1 of the manuscript illustrates the structure and analysis of the trials:

      Panel a depicts the baseline, habituation, and dishabituation trials, where spiders are exposed to different conspecifics.

      Baseline (left panel, red): When two spiders are visually exposed to each other for the first time, it is expected that they will explore each other closely, exhibiting high levels of proximity (initial exploratory behaviour).

      Habituation (centre panel, green): When the same spiders are reintroduced in a subsequent round of exposure, it is anticipated that they will exhibit reduced exploratory behaviour and maintain a greater distance compared to the baseline trial, if they recognize each other from the previous encounter (indicative of habituation).

      Panel b (upper and middle panels; red and green): Demonstrates the theoretical assumptions and expected changes in behaviour:

      By subtracting the distribution of distances in the baseline trial from the habituation trial, we generate a delta distribution. This delta distribution reveals negative values near the transparent panel (indicating reduced proximity in the habituation trial) and positive values at mid- to far-distances (indicating increased distancing behaviour). This delta distribution is also what is reported in Figure 2.

      Dishabituation: In this trial, a new spider (different from the one in the habituation trial) is introduced. The dishabituation trial will be considered in contrast to the habituation trial described above. If the spider recognizes the new individual as different, it is expected to show increased exploratory behaviour and reduced distance, similar to the initial baseline trial.

      By subtracting the distribution of distances in the habituation trial from the dishabituation trial, we obtain another delta distribution. This delta distribution should reveal positive values near the transparent panel (indicating increased proximity in the dishabituation trial) and negative values at mid- to far-distances (indicating decreased proximity compared to the habituation trial).
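      The subtraction of distance distributions described here can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the bin edges, the distance values, and the function names are all made up, and distances are assumed to be measured from the transparent panel in arbitrary units.

```python
def proportion_per_bin(distances, bin_edges):
    """Proportion of samples falling in each bin [edge_i, edge_{i+1});
    the last bin also includes its upper edge."""
    n_bins = len(bin_edges) - 1
    counts = [0] * n_bins
    for d in distances:
        for i in range(n_bins):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            if lo <= d < hi or (i == n_bins - 1 and d == hi):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside the bin range")
    return [c / total for c in counts]


def delta_distribution(current, previous, bin_edges):
    """Current-trial proportions minus previous-trial proportions, per bin."""
    cur = proportion_per_bin(current, bin_edges)
    prev = proportion_per_bin(previous, bin_edges)
    return [c - p for c, p in zip(cur, prev)]


# Hypothetical distances from the panel: bins are near, mid, far.
edges = [0, 5, 10, 15]
baseline = [1, 2, 3, 4, 6, 12]          # mostly close to the panel
habituation = [2, 7, 8, 11, 12, 13]     # shifted away from the panel
dishabituation = [1, 1, 3, 4, 7, 12]    # close to the panel again

hab_delta = delta_distribution(habituation, baseline, edges)
dis_delta = delta_distribution(dishabituation, habituation, edges)
print([round(x, 2) for x in hab_delta])  # negative near, positive mid and far
print([round(x, 2) for x in dis_delta])  # positive near, negative mid and far
```

      Because each trial is normalised to proportions before subtraction, every delta distribution sums to zero, which is how a "negative proportion of time" arises: it is a difference between two trials, not a duration.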

      We hope this clarifies the rationale behind our data presentation and the methodological approach we employed. We have revised the figure to enhance its clarity and make it more intuitive for the reader.

      Dahl, C. D., Logothetis, N. K., & Hoffman, K. L. (2007). Individuation and holistic processing of faces in rhesus monkeys. Proceedings of the Royal Society B: Biological Sciences, 274(1622), 2069-2076.

      Humphrey, N. K. (1974). Species and individuals in the perceptual world of monkeys. Perception, 3(1), 105-114.

      At the very least, data showing the distribution of distances from the wall would be much easier to interpret for the reader.

      We understand the reviewer's concern that data showing the distribution of distances from the wall would be much easier to interpret for the reader. We initially considered this but came to the conclusion that the approach is not straightforward. For instance, if both spiders are positioned at the very front but in different corners, the distance to the panel would be very small, but the distance between the spiders would be large. Thus, using distances from the wall could misrepresent the actual spatial distribution between the spiders.

      (2) "Long-term social memory": It is not entirely clear what is meant by the authors when they say 'long-term social memory', though typically long-term memory refers to a form of a memory that requires protein synthesis.  

      To address this conceptually, we used the term "long-term social memory" to describe the spiders' ability to recognize and remember individual conspecifics over multiple experimental sessions. While social memory refers to the ability of an individual to recognize other individuals within a social context, long-term memory typically involves the retention of information over extended periods. Recognizing that the term “long-term social memory” is not commonly used, we have revised the manuscript to use the more standard term “long-term memory.”

      While the precise timing of memory formation varies across species and contexts, a general rule is that long-term memory should last for > 24 hours (e.g., Dreier et al 2007 Biol Letters). The longest time that spiders are apart in this trial setup is something like an hour. There is no basis to claim that spiders have long-term social memory as they are never asked to remember anyone after a long time apart.

      We appreciate the reviewer’s feedback regarding the term "long-term social memory." The statement "long-term memory should last for > 24 hours" is a generalisation in discussions about memory that oversimplifies a more complex topic. Long-term memory is typically distinguished from short-term memory by its persistence over time, often lasting from hours to a lifetime. However, the exact duration that qualifies memory as "long-term" varies depending on the context, model species, and type of memory. In studies of synaptic plasticity (LTP), the objective might indeed be to look at memory that persists for at least 24 hours as a criterion for long-term memory. In studies of cellular and/or molecular mechanisms, where the stabilization and consolidation of memory traces over time are key areas of interest, this 24-hour interval is very common. However, defining long-term memory strictly by a 24-hour duration is by no means universally accepted, nor does it apply across all fields of study.

      To clarify, long-term memory is a process involving consolidation starting within minutes to hours after learning. Clearly, full consolidation can take longer, and memory persisting for 24 hours is considered fully consolidated. But this does not mean that memories lasting less than 24 hours are not part of long-term memory.

      In fact, Shiffrin and Atkinson (1969) proposed that information entering short-term memory remains there for about 20 to 30 seconds before being displaced due to space limitations. During this brief interval, initial encoding processes begin transferring information to long-term memory, establishing an initial memory trace. This transfer is not indicative of full consolidation but represents the initial "laying down" of the memory trace (encoding). In our study, the focal spider’s brain forms initial memory traces of the individuals it encounters. This process continues during the period of visual separation. Upon re-encountering the same individual a few minutes later, the spider accesses the initial memory trace stored in long-term memory. This trace is fragile and not fully consolidated. The re-encounter acts as a rehearsal, reactivating specific memory traces and potentially strengthening them through additional encoding processes, allowing the spider to recognize the individual even an hour later.

      According to Markowitsch (2013), initial encoding in long-term memory begins within seconds to minutes. It is also important to note that we argue for identity recognition rather than identity recall. Recognition involves correctly identifying a stimulus when it is presented again, while recall requires the volitional generation of information without an external stimulus. Thus, recall may rely on deeper forms of memory consolidation than recognition.

      Is protein synthesis required for long-term memory? 

      The role of protein synthesis in long-term memory has been extensively studied. According to Castellucci et al. (1978), explicit memory comprises a short-term phase that does not require protein synthesis and a long-term phase that does. Hebbian learning in its initial phase (early LTP) does not necessarily require protein synthesis. This phase involves the rapid strengthening of synapses through existing proteins and signaling pathways, such as the activation of NMDA receptors and the influx of Ca2+ ions. For the changes to persist (late LTP), protein synthesis is important. This phase involves the production of new proteins that contribute to long-term structural changes at the synapse, such as the growth of new synaptic connections or the stabilization of existing ones.

      This differentiation between the early and late phases of LTP highlights that long-term memory can begin forming without immediate protein synthesis. Our study focuses on this early phase of memory encoding, which involves the initial formation of memory traces that do not yet depend on protein synthesis. 

      It is however worth noting that recent research suggests that there is an early phase of protein synthesis (within minutes to hours) through the activation of immediate early genes (IEGs) and transcription factors. In this context, protein synthesis supports initial synaptic modifications. What the reviewer refers to is the consolidation phase (late phase), where continued synthesis of proteins induces structural changes at synapses, leading to the formation of new synaptic connections. In our study, it is plausible to assume that an early form of protein synthesis may contribute to stabilizing the initial memory traces during the encoding phase. However, whether or not protein synthesis occurred in our spiders is beyond the scope of this investigation and was not specifically addressed.

      The critical aspect of our study is that the information transitioned from short-term memory to long-term memory during an early encoding phase, allowing recognition after an hour. Due to the inherent limitations and transient nature of short-term memory, it is implausible for spiders to retain these memory representations solely within short-term memory for such durations. Our findings suggest that the initial encoding processes were robust enough to transfer these experiences into long-term memory, where they were stabilized and could be accessed later.

      In sum, it is important to note that long-term memory is a dynamic process, and while testing after 24 hours is a convention in some studies, this timing is arbitrary and not universally applicable to all contexts or species. The more critical consideration here is that we are dealing with a species where no prior evidence of long-term memory exists. Debating a 24-hour delay or the specifics of protein synthesis, while potentially interesting for future studies, detracts from the true significance of our findings. Our study is the first to show something akin to long-term memory representations in this species and this should remain in our focus.

      Shiffrin, R. M., & Atkinson, R. C. (1969). Storage and retrieval processes in long-term memory. Psychological Review, 76(2), 179.

      Markowitsch, H. J. (2013). Memory and self–Neuroscientific landscapes. International Scholarly Research Notices, 2013(1), 176027.

      Castellucci, V. F., Carew, T. J., & Kandel, E. R. (1978). Cellular analysis of long-term habituation of the gill-withdrawal reflex of Aplysia californica. Science, 202(4374), 1306-1308.

      The odd phrasing of the 'long-term dishabituation' trial makes it seem that it is testing a long-term memory, but it is not. The spiders have never met. The fact that they are very habituated to one set of stimuli and then respond to a new stimulus is not evidence of long-term memory. To clearly test memory (which is the part really lacking from the design), the authors would need to show that spiders - upon the first instance of re-encountering a previously encountered individual are already 'habituated' to them but not to some other individuals. The current data suggest this may be the case, but it is just very hard to interpret given the design does not directly test the memory of individuals in a clear and unambiguous manner.

      While we appreciate the reviewer's feedback, we believe there may have been some misunderstanding regarding the term “long-term dishabituation.” The introduction of novel individuals at the end of Session 3 was not intended to test long-term memory by having spiders recognize these novel individuals. Instead, it aimed to investigate the nature of the habituation observed over the three sessions.

      The novel individuals introduced at the end of Session 3 serve to differentiate between general habituation (a decline in response due to repeated exposure to any stimulus) and specific habituation (recognition of, and reduced response to, previously encountered individuals). The novel spiders have never been encountered before, so the focal spiders cannot have prior representations of them. Thus, the strong dishabituation response to these novel individuals indicates that the habituation observed earlier is not due to a general fatigue effect or loss of interest but rather to a specific habituation effect to the familiar individuals. By showing such a strong and increased response to novel individuals, the study demonstrates that the spiders' increasingly reduced responses in Sessions 2 and 3 are not merely due to a general decrease in responsiveness but suggest cognitive habituation. This cognitive habituation implies that the spiders remember the familiar individuals (as each of them occurred three times across the three sessions), a process that relies on long-term memory. Therefore, while the novel spiders themselves are not a direct test of long-term memory, the use of these novel spiders helps us infer that the habituation observed over the three sessions is indeed due to the formation of long-term memory traces.

      In other words, the organism detects and processes the novel stimulus as different from the habituated one. In our study, if a spider showed a strong dishabituation response to a novel individual introduced at the end of Session 3, it would indicate that the spider had formed specific representations of the individuals they encountered during the three sessions. These representations allow the spiders to recognise the novel individuals as different, leading to renewed interest and a stronger behavioural response. It is the absence of a prior representation for the novel spiders that triggers this dishabituation response. Since the novel spider does not match any stored representations of the previously encountered spiders, the focal spider responds more strongly.

      The introduction of novel individuals at the end of Session 3 helps clarify that the increasing habituation observed in Sessions 2 and 3 is specific to familiar individuals, indicating cognitive habituation. This supports the presence of long-term memory processes in the spiders, as they can distinguish between previously encountered individuals and new ones. The habituation-dishabituation paradigm thus effectively demonstrates the spiders' ability to form and reactivate encoded memory traces, providing clear evidence of recognition memory.

      For these reasons, we are convinced that our interpretation is accurate and hope this clarification renders the additional request for an entirely new experiment unnecessary.

      (3) Lack of a functional explanation and the emphasis on 'asociality': It is entirely plausible that recognition is a pleitropic byproduct of the overall visual cognition abilities in the spiders. 

      We agree with the reviewer that it is essential to consider the broader context of individual recognition and its potential adaptive significance. The possibility that recognition in jumping spiders could be a pleiotropic byproduct of their advanced visual cognition abilities is indeed a plausible explanation and has been discussed in our manuscript.

      However, the discussion that discounts territoriality as a potential explanation is not well laid out. First, many species that are 'asocial' nevertheless defend territories. It is perhaps best to say such species are not group living, but they have social lives because they encounter conspecifics and need to interact with them.

      The reviewer also correctly points out that many 'asocial' species still defend territories and have social interactions. Our use of the term 'asocial' was meant to indicate that jumping spiders do not live in cohesive social groups, but we acknowledge that they do have social lives in terms of interactions with conspecifics. It is more accurate to describe these spiders as a non-group-living, yet socially interactive species. A better term is “non-social”, referring to the jumping spider as a species that does not live in stable social groups and does not exhibit associated behaviours, such as cooperation. This also implies that individuals still interact with conspecifics, especially in contexts like mating, territorial disputes or aggression. We have thus changed the term from “asocial” to “non-social” in the manuscript.

      Indeed, there are many examples of solitary living species that show the dear enemy effect, a form of individual recognition, towards familiar territorial neighbors. The authors in this case note that territorial competition is mediated by the size or color of the chelicerae (seemingly a trait that could be used to distinguish among individuals). Apparently, because previous work has suggested that territorial disputes can be mediated by a trait in the absence of familiarity has led them to discount the possibility that keeping track of the local neighbors in a potentially cannibalistic species could be a sufficient functional reason. In any event, the current evidence presented certainly does not warrant discounting that hypothesis.

      The “dear enemy effect”, where solitary living species recognize and show reduced aggression towards familiar territorial neighbors, is a relevant consideration. This effect demonstrates that individual recognition can have significant functional implications even in species that are not group-living. We will elaborate on this effect in the revised manuscript to provide a more comprehensive discussion.

      The reviewer mentioned that territorial disputes can be mediated by the size or color of the chelicerae, potentially serving as a feature for individual recognition. Our intention was not to discount the role of such traits but to highlight that the level of identity recognition we observed represents subordinate classification. This is different from the basic-level classification, such as distinguishing between male and female based on chelicerae colour. While we acknowledge that colour can be an important feature for identity discrimination, our findings suggest that individual recognition in jumping spiders goes beyond simple colour differentiation. 

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors investigated whether a salticid spider, Phidippus regius, recognizes other individuals of the same species. The authors placed each spider inside a container from which it could see another spider for 7 minutes, before having its view of the other spider occluded by an opaque barrier for 3 minutes. The spider was then either presented with the same individual again (habituation trial) or a different individual (dishabituation trial). The authors recorded the distance between the two spiders during each trial. In habituation trials, the spiders were predicted to spend more time further away from each other and, in dishabituation trials, the spiders were predicted to spend more time closer to each other. The results followed these predictions, and the authors then considered whether the spiders in habituation trials were generally fatigued instead of being habituated to the appearance of the other spider, which may have explained why they spent less time near the other individual. The authors presented the spiders with a different (novel) individual after a longer period of time (which they considered to be a long-term dishabituation trial), and found that the spiders switched to spending more time closer to the other individual again during this trial. This suggested that the spiders had recognized and had habituated to the individual that they had seen before and that they became dishabituated when they encountered a different individual.

      We appreciate the reviewer's detailed summary of our study. The reviewer's summary accurately captures the essence of our experimental design, predictions, and findings.

      Strengths:

      It is interesting to consider individual recognition by Phidippus regius. Other work on individual recognition by an invertebrate has been, for instance, known for a species of social wasp, but Phidippus regius is a different animal. Importantly and more specifically, P. regius is a salticid spider, and these spiders are known to have exceptional eyesight for animals of their size, potentially making them especially suitable for studies on individual recognition. In the current study, the results from experiments were consistent with the authors' predictions, suggesting that the spiders were recognizing each other by being habituated to individuals they had encountered before and by being dishabituated to individuals they had not encountered before. This is a good start in considering individual recognition by this species.

      We appreciate the reviewer's positive summary and acknowledgment of the strengths of our study. We would like to point out some more details: 

      While the exceptional eyesight of salticid spiders is indeed a significant factor, our study reaches deeper in terms of processing. We argue not at the level of sensation but at the level of perception. Moreover, identity recognition is a higher-level perceptual process. This distinction is crucial: we are not merely examining the spiders' sensory capabilities (such as good eyesight), but rather how their brains interpret and represent what they “see”. This involves a cognitive process in which the sensory input (sensation) is processed and integrated into meaningful constructs (perception) and memorised in the form of representations.

      Our study also suggests that P. regius engages in “higher-level” perceptual processes. This most likely involves complex representations of individual conspecifics, which in mammalian brains are associated with regions such as the central inferior temporal (cIT) and anterior inferior temporal (aIT) areas. We provide evidence that these spiders do not just sense visual stimuli but interpret and recognize individual identities, indicating sophisticated perceptual and cognitive abilities. In other words, the spiders do not merely respond to visual stimuli in a reflexive manner, but rather engage in sophisticated perceptual and cognitive processes that allow them to recognize and distinguish between individual identities. This indicates that the spiders are not simple Braitenberg vehicles reacting to stimuli, but are thinking organisms capable of complex mental representations. This resonates with current trends in animal cognition research, which increasingly recognizes some level of consciousness and advanced cognitive abilities across a wide range of animal species. Moreover, this aligns with the growing interest in spider cognition, where research is beginning to provide evidence for the cognitive complexity and perceptual capabilities of these often underestimated creatures (Jackson and Cross, 2011).

      Jackson, R. R., & Cross, F. R. (2011). Spider cognition. Advances in Insect Physiology, 41, 115-174.

      Weaknesses:

      The experiments in this manuscript (habituation/dishabituation trials) are a good start for considering whether individuals of a salticid species recognize each other. I am left wondering, however, what features the spiders were specifically paying attention to when recognizing each other. The authors cited Sheehan and Tibbetts (2010) who stated that "Individual recognition requires individuals to uniquely identify their social partners based on phenotypic variation." Also, recognition was considered in a paper on another salticid by Tedore and Johnsen (2013).

      Tedore, C., & Johnsen, S. (2013). Pheromones exert top-down effects on visual recognition in the jumping spider Lyssomanes viridis. The Journal of Experimental Biology, 216, 1744-1756. doi: 10.1242/jeb.071118 

      In this elegant study, the authors presented spiders with manipulated images to find out what features matter to these spiders when recognizing individuals.

      The reviewer raises an important point regarding the specific features that Phidippus regius might be paying attention to when recognizing individual conspecifics. Our study indeed cited Sheehan and Tibbetts (2010) to highlight the importance of phenotypic variation in individual recognition. Additionally, we referenced the work by Tedore and Johnsen (2013) on visual recognition in another salticid species, which suggests that multiple sensory modalities, including visual and pheromonal cues, may be involved in the recognition process. While our current study focused on demonstrating that Phidippus regius can recognize individual conspecifics, we acknowledge that it does not specifically identify the phenotypic features involved in this recognition. 

      Part of the problem with using two living individuals in experiments is that the behavior of one individual can influence the behavior of the other, and this can bias the results.  

      We appreciate the reviewer's observation regarding the potential bias introduced by using two living individuals in experiments, as the behaviour of one individual can indeed influence the behaviour of the other. We shared this concern initially; however, the consistency of the data with our hypotheses suggests that this potential bias did not adversely affect the validity of our findings, rendering the concern largely illusory at least in the context of our study.

      We opted for the living-individual paradigm for the following reasons:

      There is a growing trend in ethological as well as animal cognition research towards more ecologically valid and biologically relevant settings, while simultaneously advancing the precision and quantification of the data collected. This is referred to as computational ethology.

      This approach advocates for assessing behaviour in environments that more closely resemble natural conditions, rather than relying solely on sterile and artificial experimental setups. The rationale is that such naturalistic arenas allow animals to exhibit a broader range of behaviours and interactions, providing a more accurate reflection of their cognitive and social abilities. The challenge, however, lies in navigating the inherent trade-off between the strict control offered by standardized procedures and the ecological validity of more naturalistic interactions.

      By allowing two spiders to confront each other, we aimed to capture authentic behavioural responses while maintaining a degree of experimental standardization through the use of a controlled setup. Our approach ensures that the behaviours observed are not merely artifacts of an artificial environment but are representative of genuine social interactions. Also, to minimize potential biases arising from mutual behavioural influences, we employed a controlled and repeatable experimental environment. 

      We believe that the chosen approach provides a meaningful balance (in the above-mentioned trade-off) between ecological validity and experimental rigour. By combining a standardized environment with the naturalistic interaction of real spiders, we ensured that our findings are both scientifically robust and biologically relevant.

      However, this issue can be readily avoided because salticids are well known, for example, to be highly responsive to lures (e.g. dead prey glued in lifelike posture onto cork disks) and to computer animation. 

      While it is true that salticid spiders are responsive to lures and computer animations, we carefully considered the most appropriate and ecologically valid approach for our study. Our aim was to capture genuine behavioural patterns in a context that closely mimics the natural encounters these spiders experience.

      Additionally, creating comparable video stimuli of spiders presents its own set of challenges: Video recordings or computer animations may not fully capture the nuanced behaviours and subtle variations that occur during real-life interactions. There is also a risk that such stimuli could be perceived differently by the spiders, potentially introducing new biases or confounding factors.

      Scientific progress is not made by merely relying on previously established paradigms, especially when they may not be suitable for the specific context of a study. While alternative methods like lures or computer animations can be valuable in certain situations, our approach was deliberately chosen to best capture the naturalistic and interactive aspects of spider behaviour.

      These methods have already been successful and helpful for standardizing the different stimuli presented during many different experiments for many different salticid spiders, and they would be helpful for better understanding how Phidippus regius might recognize another individual on the basis of phenotypic variation. There are all sorts of ways in which a salticid might recognize another individual. Differences in face or body structure, or body size, or all of these, might have an important role in recognition, but we won't know what these are using the current methods alone. Also, I didn't see any details about whether body size was standardized in the current manuscript.

      As mentioned previously, the goal of our study was to demonstrate that identity recognition occurs in spiders. This alone is of significant importance, as it challenges existing assumptions about the cognitive capabilities of small-brained animals. We did not aim at providing a proximate explanation (mechanism) for identity recognition in spiders.

      The problem with what the reviewer suggested is this: As long as we do not have conclusive evidence that spiders recognize individual conspecifics, any attempt to design and manipulate stimuli would lack a solid foundation. Without understanding whether spiders have this capability, we cannot make informed decisions about which features or characteristics to manipulate in stimuli. In other words, this uncertainty means we lack a starting point for our assumptions, making it nearly impossible to create stimuli that would be useful or relevant in testing identity recognition.

      Additionally, it is nearly impossible to artificially generate a stimulus set that encompasses the natural variance in features that spiders use for visual individuation. There is no guarantee that artificial stimuli, such as lures or computer animations, would capture the relevant features that spiders use in natural interactions.

      In other words, the question of how Phidippus regius recognizes another individual will be the subject of further investigation. In this study, we focus on whether or not they individuate others.

      For another perspective, my thoughts turn to a paper by Cross et al.

      Cross, F. R., Jackson, R. R., & Taylor, L. A. (2020). Influence of seeing a red face during the male-male encounters of mosquito-specialist spiders. Learning & Behavior, 48, 104-112. doi: 10.3758/s13420-020-00411-y

      These authors found that males of Evarcha culicivora, another salticid species that is known to have a red face, become less responsive to their own mirror images after having their faces painted with black eyeliner than if their faces remained red. In all instances, the spiders only saw their own mirror images and never another spider, and these results cannot be interpreted on the basis of habituation/dishabituation because the spiders were not responding differently when they simply saw their mirror image again. Instead, it was specifically the change to the spider's face which resulted in a change of behavior. The findings from this paper and from Tedore and Johnsen can help give us additional perspectives that the authors might like to consider. On the whole, I would like the authors to further consider the features that P. regius might use to discern and recognize another individual.

      We acknowledge that identifying the specific features used by P. regius for identity recognition is a valuable direction for future research. However, we must emphasise that without first establishing whether spiders are capable of individuating each other, it would be premature and challenging to determine the specific features they rely on for this process. A lack of response to certain features could either suggest that those features are not relevant or, more critically, that the spider does not recognize individual identities at all. Thus, our initial focus on demonstrating identity recognition is essential before delving into the specific cues or characteristics involved.

      While the call for addressing the proximate causation of identity recognition in jumping spiders is valid, we need to also reiterate the significance of our findings and why they stand on their own merit:

      Our study demonstrates for the first time that Phidippus regius can systematically individuate conspecifics, showing habituation within short intervals (10 minutes) and over longer intervals (1 hour). This behaviour is not due to general habituation or physical fatigue but is a result of cognitive habituation, as illustrated by the spiders' response to novel individuals introduced after repeated encounters with familiarized ones. 

      What are the implications of this? Our findings indicate that these spiders possess long-term memory and form representations that can be reactivated after an hour. While this is most likely not fully consolidated memory formation (see our reply to Reviewer 1), it represents an encoded long-term memory. This implies that small-brained animals can remember, represent, and potentially build internal mental images, which are crucial for sophisticated cognitive processing.

      Reviewer #3 (Public Review):

      Summary:

      Jumping spiders (family Salticidae) have extraordinarily good eyesight, but little is known about how sensitive these small animals might be to the identity of other individuals that they see. Here, experiments were carried out using Phidippus regius, a salticid spider from North America. There were three steps in the experiments; first, a spider could see another spider; then its view of the other spider was blocked; and then either the same or a different individual spider came into view. Whether it was the same or a different individual that came into view in the third step had a significant effect on how close together or far apart the spiders positioned themselves. It has been demonstrated before that salticids can discriminate between familiar and unfamiliar individuals while relying on chemical cues, but this new research on P. regius provides the first experimental evidence that a spider can discriminate by sight between familiar and unfamiliar individuals.

      Clark RJ, Jackson RR (1995) Araneophagic jumping spiders discriminate between the draglines of familiar and unfamiliar conspecifics. Ethology, Ecology and Evolution 7:185-190

      We appreciate the reviewer's comprehensive summary and acknowledgment of the significance of our findings.

      Strengths:

      This work is a useful step toward a fuller understanding of the perceptual and cognitive capacities of spiders and other animals with small nervous systems. By providing experimental evidence for a conclusion that a spider can, by sight, discriminate between familiar and unfamiliar individuals, this research will be an important milestone. We can anticipate a substantial influence on future research.

      We appreciate the reviewer’s recognition of the strengths and significance of our study. We are pleased that the reviewer considers our research an important milestone. Our findings indeed suggest that even animals with relatively simple nervous systems can perform complex cognitive tasks, which has substantial implications for the broader study of animal cognition.

      As pointed out by the reviewer, we also hope that our study will have a substantial influence on future research. By establishing a methodology and providing clear evidence of visual discrimination, we aim to encourage further investigations into the cognitive abilities of jumping spiders and other arthropods. Future research can build on our findings to explore the specific visual cues and mechanisms involved in individual recognition (as Reviewer 2 pointed out), as well as the ecological and evolutionary implications of these abilities.

      Weaknesses:

      (1) The conclusions should be stated more carefully.

      We agree that clarity in our conclusions is paramount. We will revise the manuscript to ensure that our conclusions are presented with precision and appropriately reflect the data. Specifically, we will emphasize the evidence supporting our findings of visual individual recognition and clarify the limitations and scope of our conclusions to avoid any potential overstatements.

      (2) It is not clearly the case that the experimental methods are based on 'habituation' (learning to ignore; learning not to respond). Saying 'habituation' seems to imply that certain distances are instances of responding and other distances are instances of not responding but, as a reasonable alternative, we might call distance in all instances a response. However, whether all distances are responses or not is a distracting issue because being based on habituation is not a necessity.

      We appreciate the reviewer's feedback and understand the concern regarding the use of the term 'habituation.' We agree that all distances maintained by the spiders are responses. We interpret them as the spiders’ “active decisions”, based on perception and modulated by their recognition of the same or a different individual.

      The terms 'habituation' and 'dishabituation' are used to label trial types for ease of discussion and to describe the expected behavioural modulation.

      (3) Besides data related to distances, other data might have been useful. For example, salticids are especially well known for the way they communicate using distinctive visual displays and, unlike distance, displaying is a discrete, unambiguous response.

      We appreciate the reviewer’s suggestion to incorporate data on visual displays, which are indeed well-known communication methods among salticids. We agree that visual displays are discrete and unambiguous responses that could provide additional insights into the spiders' recognition abilities.

      Our primary focus on distance measurements was driven by the need to quantify behaviour in a continuous and scalable manner, that is, how spiders modulate their proximity based on familiarity with other individuals.

      We acknowledge the potential value of including visual display measurements; however, in our study, we aimed to establish a foundational understanding of recognition behaviour through proximity measures first. Also, capturing displays requires a different experimental paradigm, in which the displays are clearly visible and analyzable.

      (4) Methods more aligned with salticids having extraordinarily good eyesight would be useful. For example, with salticids, standardising and manipulating stimuli in experiments can be achieved by using mounts, video playback, and computer-generated animation.

      There is no doubt that salticids have excellent eyesight. However, our study focuses on higher-level perceptual processes that require complex brain analysis, not just visual acuity. The goal was to investigate whether spiders can individuate and recognize conspecifics, which involves interpreting visual information and forming long-term representations.

      Clearly, methods like video playback and computer animations are useful in controlled settings, where the spider is mounted, but they pose challenges for our specific research question. At this stage of research, we lack precise knowledge of which visual features are critical for individual recognition in spiders, making it difficult to design effective artificial stimuli. 

      Our primary objective was to determine if spiders can individuate others. Before exploring the proximate mechanisms of how they individuate others, it was essential to establish that they have this capability. This foundational question needed to be addressed before delving into more detailed mechanistic studies.

      (5) An asocial-versus-social distinction is too imprecise, and it may have been emphasised too much. With P. regius, irrespective of whether we use the label asocial or social, the important question pertains to the frequency of encounters between the same individuals and the consequences of these encounters.

      Our intent was to convey that P. regius does not live in cohesive social groups but does engage in individual interactions that can have significant behavioral consequences. We will revise the manuscript to reduce the emphasis on the asocial-versus-social distinction. As discussed above, we also will change the term “asocial” to “non-social” in the manuscript.

      (6) Hypotheses related to not-so-strictly adaptive factors are discussed and these hypotheses are interesting, but these considerations are not necessarily incompatible with more strictly adaptive influences being relevant as well.

      We appreciate the reviewer's observation regarding the discussion of hypotheses related to not-so-strictly adaptive factors. We agree that our considerations of these factors do not preclude the relevance of more strictly adaptive influences.

      We will revise the manuscript to explicitly discuss how our findings can be interpreted in the context of adaptive hypotheses. This will provide a more comprehensive understanding of the evolutionary significance of individual recognition in P. regius. Modifications were made in the Discussion section.

      In the following, we comment on issues not mentioned in the “public reviews” section.

      Reviewer #1 (Recommendations For The Authors):

      (1) I would suggest conducting experiments that actually test for recognition memory, as this seems to be a claim that the authors make. Following the ant studies by Dreier cited in this manuscript would be sufficient to test for memory. Given the relative simplicity of the measures being taken (location of spiders), this would seem like a very simple addition that would provide a much stronger and more readily interpreted dataset.

      As previously explained in our detailed responses (public reviews), we believe that the current design effectively addresses the questions at hand. Our approach, using a habituation-dishabituation paradigm, provides robust evidence for recognition memory within the framework of early long-term memory.

      Additionally, we have explained why using the distance to the panel as a measure is not appropriate in this context. Specifically, using such a measure can misrepresent the actual interests of the spiders in each other.

      While we acknowledge the merits of the ant studies by Dreier, our current design allows for a detailed understanding of the spiders' recognition capabilities over short (10 min) and slightly longer intervals (up to one hour). This is sufficient to demonstrate the presence of recognition memory without the necessity of further experiments. The observed patterns of habituation and dishabituation responses in our study clearly indicate that the spiders can distinguish between familiar and novel individuals, which supports our claims.

      Given these points, we respectfully maintain that the current data and experimental design are adequate to support our findings and provide a comprehensive understanding of recognition memory in Phidippus regius.

      (2) The writing is rather impenetrable. The results explain the basic finding in terms of statistical variables rather than simply stating the results. A clear and straightforward statement such as 'the spiders showed reduced interest upon habituation trials, indicating xyz' (and then citing the stats) is preferable to the introduction of results as a statistical model. The statistical model is a means of assessing the results. It is not the result. Describe the data.

      We have tried to improve this in the current version.

      (3) Showing more straightforward data such as distance from the joint barrier would make the paper much easier to understand.

      This paper has been on bioRxiv for some time and my guess is that it has ended up here because it is having trouble in review. Collecting new data that more directly test the question at hand, presenting the data in a more direct manner, and more critically evaluating your own claims will improve the paper.

      While it is true that the paper has been on bioRxiv for a while, this submission marks the first instance where it has undergone peer review. Prior to this, the manuscript was submitted to other journals but was not reviewed.

      We hope the explanations provided in the “public reviews” section, along with the revised manuscript, sufficiently clarify our study and its conclusions. We believe the current data robustly address the research questions, and as outlined in our detailed responses, we have critically evaluated our claims and presented the data clearly. Given these clarifications, we do not see the necessity for new experiments as the existing data adequately support our findings. We trust that these revisions and explanations will clarify any misunderstandings.

      I am totally sold that the spiders are paying attention to identity at some level. The key now is to understand what that actually means in terms of recognition (i.e. memory of individuals) not just habituation.

      We appreciate the reviewer’s emphasis on the distinction between habituation and memory-based individual recognition. As detailed in the preceding discussion, we have taken great care to clarify how our paradigm distinguishes simple habituation effects from true memory for individual identity. We trust that the preceding sections make clear how our findings go beyond simple habituation to establish genuine individual recognition.

      Reviewer #2 (Recommendations For The Authors):

      Aside from the comments in the public review, I have some additional comments that the authors may wish to consider.

      Numerous times in the manuscript, the authors mentioned that recognizing individuals requires recognition memory. This seems rather obvious, and I wonder if the authors could instead be more precise about what they mean by 'recognition memory'?

      Recognition memory refers to the cognitive ability to identify a previously encountered stimulus, individual, or event as familiar. It involves both encoding and retrieval processes, allowing an organism to distinguish between novel and familiar stimuli. This form of memory is a fundamental component of cognitive functioning and is supported by neural mechanisms that, in the mammalian brain, involve the hippocampus and other brain regions associated with memory processing.

      In our study, we aimed to test whether Phidippus regius recognizes conspecifics, or, in other words, utilizes recognition memory to distinguish between familiar and unfamiliar conspecifics. With the habituation-dishabituation paradigm, we assessed the spiders' ability to recognize previously encountered individuals and to retain this memory over short (10 min) and extended periods (1 hour).

      Encoding: In the initial trial, when a spider encounters an individual for the first time (Figure 1A, “Baseline” or “Dishabituation” for every following trial), it encodes the visual information related to that specific individual. This encoding process involves creating a memory trace of the individual's phenotypic characteristics.

      Storage: During the visual separation period, this encoded information is stored in the spider's memory system. The memory trace, though initially fragile, starts to stabilize over the separation period. Whether or not this leads to some form of consolidated memory remains unaddressed. This aspect was highlighted by the first reviewer, but our focus is on the early process rather than on late processes, such as consolidation. 

      Retrieval: In the subsequent trial, when the same individual is presented again, the spider retrieves the stored memory trace. If the spider recognizes the individual, its behaviour reflects habituation, indicating memory retrieval. Conversely, when a novel individual is introduced, the lack of a stored memory trace triggers a different behavioural response, indicating dishabituation. This differential response demonstrates the spider's ability to distinguish between familiar and unfamiliar individuals. It is also key to understanding the nature of habituation over the three sessions, as introducing novel spiders leads to a significant dishabituation response after the three sessions in Experiment 2.

      In Line 39, the authors state that they used "a naturalistic experimental procedure". I would like to know how this experiment is 'naturalistic'. The authors' use of an arena does not appear naturalistic, or something the spiders would encounter in the wild.

      We appreciate the reviewer's comment regarding our use of the term 'naturalistic'. We acknowledge that the experimental arena itself does not replicate the conditions found in the wild. Our approach aimed to incorporate elements of natural behaviour by allowing two spiders to freely move and interact within the controlled environment. This approach aligns with principles from computational ethology, which seeks to balance the trade-off between repeatability/standardization and observing free, naturalistic behaviour. By using this paradigm, we aimed to capture behaviours that closely resemble those exhibited in their natural habitat. This setup was chosen to balance the need for ecological validity with the requirements for standardized data collection. 

      Also, and this point has been raised above, by observing the spiders' natural interactions without restraining them or using artificial stimuli like computer animations, we aimed to capture behaviours that closely resemble their natural responses to conspecifics. In contrast, we would not have any clear expectations regarding responses to arbitrarily designed artificial stimuli. This method provides a more ecologically valid assessment of the spiders' recognition abilities.

      There are a few details wrong in Line 41. 'Salticidae' is a family name and shouldn't be italicized. Also, the sentence suggests that there is a spider called a 'jumping spider' in the family Salticidae, which is technically called Phidippus regius. To clarify, all spiders in the family Salticidae are known as jumping spiders, and one species of jumping spiders is called Phidippus regius.

      We will correct this in the manuscript to accurately reflect the classification and terminology. Thank you for pointing out these inaccuracies.

      A manuscript on individual recognition by a salticid should include citations to earlier papers that have already considered individual recognition by salticids. As well as the paper by Tedore and Johnsen (2013), the authors should be aware of the following papers.

      Clark, R. J., & Jackson, R. R. (1994). Portia labiata, a cannibalistic jumping spider, discriminates between its own and foreign egg sacs. International Journal of Comparative Psychology, 7, 38-43.

      Clark, R. J., & Jackson, R. R. (1994). Self-recognition in a jumping spider: Portia labiata females discriminate between their own draglines and those of conspecifics. Ethology, Ecology & Evolution, 6, 371-375.

      Clark, R. J., & Jackson, R. R. (1995). Araneophagic jumping spiders discriminate between the draglines of familiar and unfamiliar conspecifics. Ethology, Ecology & Evolution, 7, 185-190.

      We appreciate the reviewer's suggestion to include citations to these earlier papers. We will add the recommended references to provide a comprehensive background.

      In Line 203, I would not consider "interaction with human caretakers and experimenters" to be a form of behavioral enrichment. This kind of interaction has the potential to be stressful for the spiders, rather than enriching. I suggest deleting that part of the sentence.

      We appreciate the reviewer's feedback and agree that interactions with human caretakers and experimenters might not always be enriching and could potentially be stressful for the spiders. We will remove that part of the sentence to better reflect the intended meaning.

      Reviewer #3 (Recommendations For The Authors):

      This manuscript is useful and interesting, and I predict that it will be influential, but more attention should be given to stating the objective and conclusion accurately and clearly. As I understand it, the objective was to investigate a specific hypothesis: that Phidippus regius has a capacity to identify conspecific individuals as particular individuals (i.e., individual identification). Strong evidence supporting this hypothesis being true would be especially remarkable because I am unaware of any published work having shown evidence of a spider expressing this specific perceptual capacity.

      Thank you for recognizing the significance and potential influence of our manuscript. We agree that clearly stating the objective and conclusions is essential for conveying the importance of our findings. Our results provide robust evidence supporting the hypothesis that Phidippus regius can recognize and remember individual conspecifics. We will revise the manuscript to more clearly highlight the objective and our conclusions, emphasizing the novel evidence for individual identification in these spiders.

      Based on reading this manuscript and based on my understanding of the meaning of 'individual identification', it seems to me that the hypothesis that P. regius has a capacity for individual identification might or might not be true, and the experiments in this manuscript cannot tell us which is the case. 

      We respectfully disagree with the reviewer's assessment. Our experiments were carefully designed to test whether P. regius has the capacity for individual identification, and our results provide clear evidence supporting this hypothesis. The systematic differences in the spiders' behaviour when encountering familiar versus novel individuals indicate that they can recognize and remember specific conspecifics. We will revise the manuscript to ensure that the evidence and conclusions are stated more clearly to address any potential misunderstandings.

      Determining which is the case would have required research that made better use of the literature, displayed more critical thinking, addressed credible alternative hypotheses and adopted experimental methods that focused more strictly on individual identification. 

      Whether P. regius has a capacity for individual identification is not ambiguous in our study. Our findings clearly demonstrate this capacity through systematic behavioural responses to familiar versus novel individuals. As pointed out above, the experimental procedure might be complex, but results are systematic despite this complexity. The experiments were designed to directly address the hypothesis of individual identification, and the data robustly support our conclusions. While considering alternative hypotheses is important, the results we present provide a coherent and compelling case for individual identification in P. regius. We will ensure our manuscript clearly articulates this narrative and the supporting evidence.

      At the same time, I also appreciate that asking for all of that at once would be asking for too much. As I see it, this manuscript tells us about research that moves us closer to a clear focus on the details and questions that will matter in the context of considering a hypothesis that is strictly about individual identification. More importantly, I think this research reveals a perceptual capacity that is remarkable even if it is not strictly a capacity for individual identification.

      We understand the desire for a more focused exploration of individual identification with paradigms more familiar to the reviewers and we acknowledge that further detailed studies could enhance our understanding of this capacity. However, our findings do indeed suggest that Phidippus regius exhibits a remarkable perceptual capacity for recognizing and remembering individual conspecifics. The systematic behavioural responses observed in our experiments strongly indicate that these spiders possess the ability for individual recognition. While our study may not have explored every potential detail (e.g. which features are most crucial for the memory matching processes), the evidence we present robustly supports the conclusion of individual identification.

      We acknowledge that it is indeed valuable to follow established paradigms and build upon the frameworks that have been used successfully in similar species and studies. These paradigms provide a solid foundation for scientific inquiry and allow for comparability across different research efforts. However, it is equally important to acknowledge and explore alternative approaches. Scientific progress is driven not only by replication but also by innovation. By employing new paradigms, researchers can uncover novel insights and push the boundaries of current understanding. The paradigm we used in our study, while different from those traditionally applied to similar research, is not an invention but a well-established method in various domains. It represents an innovative application in the context of our specific research questions, offering a fresh perspective and contributing to the advancement of the field.

      As I understand it, 'individual identification' means identifying another individual as being a particular individual instead of a member of a larger set (or 'class') of individuals. An 'individual' is a set containing a single individual. Interesting examples of identifying members of larger sets include discriminating between familiar and unfamiliar individuals. In the context of the specific experiments in this manuscript, familiar-unfamiliar discrimination means discriminating between recently-seen and not-so-recently-seen individuals. My impression is that the experiments in this manuscript have given us a basis for concluding that P. regius has a capacity for familiar-unfamiliar (recently seen versus not so recently seen) discrimination. If this is the case, then I think this is the conclusion that should be emphasised. This would be an important conclusion.

      I appreciate that, depending on how we use the words, familiar-unfamiliar discrimination might be construed as being 'individual identification'. An individual is identified as 'the individual recently seen'. As a casual way of speaking, it can be reasonable to call this 'individual identification'. The difficulty comes from the way calling this 'individual identification' can suggest something more than has been demonstrated. To navigate through this difficulty, we need an expression to use for a capacity that goes beyond familiar-unfamiliar discrimination. In the context of this manuscript about P. regius, we need expressions that will make it easy to consider two things. One of these things is a capacity for familiar-unfamiliar discrimination. The other is the capacity to identify another individual as being a particular individual.

      We appreciate the reviewer's insightful comments on the distinction between familiar-unfamiliar discrimination and individual identity recognition. Our study indeed focuses on demonstrating that Phidippus regius can recognize and remember individual conspecifics, providing evidence for individual identity recognition.

      Two specific behavioural hallmarks that speak against familiarity recognition:

      First, the significant dishabituation response to novel individuals introduced after multiple sessions underscores the specificity of the recognition. This shows that the spiders' habituation is not general but specific to familiar individuals. 

      Second, the pattern of habituation over the sessions provides further evidence: We observed the strongest systematic modulation in Session 1, a reduced modulation in Session 2, and a further diminished effect in Session 3. If the spiders were only responding based on familiarity, we would expect a more drastic decrease, resulting in a washed-out non-effect by Session 2. However, the continued, though diminishing, differentiation between habituation and dishabituation trials across sessions indicates that the spiders are not merely responding to a general sense of familiarity but are engaging in individual recognition. In other words, the spiders' ability to distinguish between familiar and novel individuals even after repeated exposures suggests that they are not just recognizing a familiar status but are identifying specific individuals.

      Things people do might help clarify what this means. People have an extraordinary capacity for identifying other individuals as particular individuals. Often this is based on giving each other names. Imagine we are letting somebody see photographs and asking them to identify who they see. The answer might be, 'somebody familiar' or 'somebody I saw recently' (familiar-unfamiliar discrimination); or the question might be answered by naming a particular individual (individual identification).

      We appreciate the reviewer's efforts to clarify the distinction between familiar-unfamiliar discrimination and individual recognition using human examples. However, we believe this comparison might not fully capture the complexity of individual recognition in non-human animals. 

      Familiarity recognition refers to recognizing someone as having been seen or encountered before without necessarily distinguishing them from others in the same category. On the other hand, identity recognition involves recognizing a specific individual based on unique characteristics (or features). In humans, this often involves naming, but more critically, like in most animals, it involves recognizing visual, auditory, chemical or other sensory cues. In animals, including spiders, individual recognition does not involve, let alone rely on, naming; it relies on the ability to distinguish between individuals based on sensory cues and learnt associations. This is a valid and well-documented form of individual recognition across many species.

      Individual recognition does not require naming or the assignment of a referential label. Animals can distinguish between specific individuals based on previously perceived and stored features and characteristics. Naming is the exception rather than the rule in the animal kingdom. Only a few species, such as humans and maybe certain cetaceans, use naming for identity recognition. This is an evolutionary rarity and not the standard mechanism for individual recognition, which primarily relies on sensory cues and learnt associations. Furthermore, the mechanism of recognition in both humans and animals involves a complex process of matching incoming sensory and perceptual information with stored memory representations. Naming is merely a tool for communication, allowing us to convey which individual we are referring to. It is not the mechanism by which recognition occurs. The core of individual recognition is this matching process, where sensory cues (visual, auditory, chemical, etc.) are compared to memory traces of previously encountered individuals. Therefore, the suggestion that individual identification necessitates naming misrepresents the actual cognitive processes involved. 

      We can think of individual identification being based on more fine-grained discrimination (with this, set size = one), with familiar-unfamiliar discrimination being more coarse-grained discrimination (with this, set size can be more than one). Restricting the expression 'individual identification' to instances of having the capacity to identify another individual as being a particular individual (set size = one) is better aligned with normal usage of this expression.

      Absolutely, the distinction between fine-grained and coarse-grained discrimination aligns with the concept of different category levels, such as basic and subordinate levels, put forward by Eleanor Rosch (e.g. Rosch, 1973). In the context of individual recognition, fine-grained discrimination (where set size = one) refers to the ability to identify a specific individual based on unique characteristics. This is referred to as subordinate level categorization. Coarse-grained discrimination (where set size can be more than one) refers to recognizing someone as familiar without distinguishing them from others in the same category, more similar to basic level categorization. 

      Rosch, E.H. (1973). "Natural categories". Cognitive Psychology. 4 (3): 328–50. doi:10.1016/0010-0285(73)90017-0

      There is a strong emphasis on an asocial-social distinction in this manuscript. It seems to me that this needs to be focused more clearly on the specific factors that would make a capacity for individual identification beneficial. In the context of this manuscript, the term 'social' may suggest too much. It seems to me that the issue that matters the most is whether individuals live in situations where important encounters occur frequently between the same individuals. Irrespective of whether other notions of the meaning of 'social' also apply, there are salticids that live in aggregated situations where they frequently have important encounters with each other. This is the case with Phidippus regius in the field in Florida, but I realize that there may not be much published information about the natural history of this salticid. Even so, there are salticids to which the word 'social' has been applied in published literature.

      We appreciate the reviewer's comments on the asocial-social distinction and we agree that this terminology might need refinement. Our intent was not to categorize Phidippus regius rigidly but to explore the contextual factors influencing the benefits of individual identification. The critical factor in our study is indeed the frequency and importance of encounters between individuals, rather than a broader social structure. We will revise the manuscript to reflect this more nuanced perspective, focusing on the ecological validity of our experimental design and the adaptive significance of individual recognition in environments where repeated encounters can occur.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The authors observed a decline in autophagy and proteasome activity in the context of Milton knockdown. Through proteomic analysis, they identified an increase in the protein levels of eIF2β, subsequently pinpointing a novel interaction within eIF subunits where eIF2β contributes to the reduction of eIF2α phosphorylation levels. Furthermore, they demonstrated that overexpression of eIF2β suppresses autophagy and leads to diminished motor function. It was also shown that in a heterozygous mutant background of eIF2β, Milton knockdown could be rescued. This work represents a novel and significant contribution to the field, revealing for the first time that the loss of mitochondria from axons can lead to impaired autophagy function via eIF2β, potentially influencing the acceleration of aging.

      Thank you so much for your review and comments.

      Reviewer #2 (Public Review):

      In the manuscript, the authors aimed to elucidate the molecular mechanism that explains neurodegeneration caused by the depletion of axonal mitochondria. In Drosophila, starting with siRNA depletion of Milton and Miro, the authors attempted to demonstrate that the depletion of axonal mitochondria induces the defect in autophagy. From proteome analyses, the authors hypothesized that autophagy is impacted by the abundance of eIF2β and the phosphorylation of eIF2α. The authors followed up the proteome analyses by testing the effects of eIF2β overexpression and depletion on autophagy. With the results from those experiments, the authors proposed a novel role of eIF2β in proteostasis that underlies neurodegeneration derived from the depletion of axonal mitochondria.

      The manuscript has several weaknesses. The reader should take extra care while reading this manuscript and when acknowledging the findings and the model in this manuscript.

      The defect in autophagy by the depletion of axonal mitochondria is one of the main claims in the paper. The authors should work more on describing their results of LC3-II/LC3-I ratio, as there are multiple ways to interpret the LC3 blotting for the autophagy assessment. Lysosomal defects result in the accumulation of LC3-II thus the LC3-II/LC3-I ratio gets higher. On the other hand, the defect in the early steps of autophagosome formation could result in a lower LC3-II/LC3-I ratio. From the results of the actual blotting, the LC3-I abundance is the source of the major difference for all conditions (Milton RNAi and eIF2β overexpression and depletion).

      Thank you so much for your review and comments. As the reviewer pointed out, LC3-II/LC3-I ratio changes do not necessarily indicate autophagy defects. However, since p62 accumulated (Figure 2B, 2E, 3E, Figure 8C, Figure 9C), these results collectively suggest that autophagy is lowered.

      As the reviewer pointed out and we described in v2, milton knockdown, eIF2β overexpression and heterozygosity increase LC3-I abundance. We do not know how these conditions increase LC3-I at this moment. We will investigate the cause of the increase in LC3-I by milton knockdown and how it contributes to impaired autophagy. We added this discussion as:

      Lines 388-393; ‘Our results also suggest that milton knockdown and overexpression of eIF2β affect autophagy via increased LC3-I abundance (Figures 2 and 7), suggesting an unconventional mechanism of autophagy suppression. To our knowledge, the roles of eIF2β in aging and autophagy independent of ISR have not been reported. Our results revealed a novel function of eIF2β to maintain proteostasis during aging, while further investigation is required to elucidate underlying mechanisms.’

      Another main point of the paper is the up-regulation of eIF2β by depleting the axonal mitochondria leads to the proteostasis crisis. This claim is formed by the findings from the proteome analyses. The authors should have presented their proteomic data with a much more thorough presentation and explanation. As in the experiment scheme shown in Figure 4A, the authors did two proteome analyses: one from the 7-day-old sample and the other from the 21-day-old sample. The manuscript only shows a plot of the result from the 7-day-old sample, but not that of the result from the 21-day-old sample. For the 21-day-old sample, the authors only provided data in the supplemental table, in which the abundance ratio of eIF2β from the 21-day-old sample is 0.753, meaning eIF2β is depleted in the 21-day-old sample. The authors should have explained the impact of the eIF2β depletion in the 21-day-old sample, so the reader could fully understand the authors' interpretation of the role of eIF2β on proteostasis.

      Thank you for pointing it out. Plots of the 21-day-old proteome results were included in the main figure (Figure 4C) in v2. In this revision, we further analyzed age-dependent changes of eIF2β levels by western blotting (Figure 4G). We found that eIF2β levels increased during aging until 49-day-old then reduced at 63-day-old (Figure 4G in the revised manuscript). At the young age (7-day-old), eIF2β levels were higher in milton knockdown brains than in the control, and at 21-day-old, eIF2β levels were lower in milton knockdown brains than in the control. These results suggest that milton knockdown accelerates age-dependent changes in eIF2β. We added these results and discussion in the revised manuscript.

      Lines 240-243: ‘We also investigated age-dependent changes in eIF2β by western blotting of control flies at 7-, 21-, 35-, 49-, and 63-day-old. eIF2β levels increased during aging until 49-day-old (Figure 4G). These results suggest that upregulation of eIF2β in milton knockdown fly brain reflects an early onset of age-dependent increase of eIF2β levels.’

      Lines 363-368: ‘We also found that eIF2β protein levels increase in an age-dependent manner until 49-day-old and decrease after that (Figure 4G). In the brains with neuronal knockdown of milton, eIF2β levels were higher at 7-day-old than those in control and lower at the 21-day-old (Figure 4D and Supplementary table). These results suggest that milton knockdown is likely accelerating age-dependent changes rather than increasing their magnitude.’

      Our new data indicate that eIF2β levels increase during aging in control flies until 49-day-old, then reduce at 63-day-old (included as Figure 4G in the revised manuscript). These age-dependent changes might explain the reduction in eIF2β levels in Milton knockdown compared to the control in middle age: higher eIF2β levels in milton knockdown flies at a young age than control and lower eIF2β levels in the middle-aged flies may reflect premature aging.

      We included these sentences in the discussion section:

      Lines 240-243: ‘We also investigated age-dependent changes in eIF2β by western blotting of control flies at 7-, 21-, 35-, 49-, and 63-day-old. eIF2β levels increased during aging until 49-day-old (Figure 4G). These results suggest that upregulation of eIF2β in milton knockdown fly brain reflects an early onset of age-dependent increase of eIF2β levels.’

      Lines 359-371: ‘Our results suggest that the loss of axonal mitochondria is an event upstream of proteostasis collapse during aging. The number of puncta of ubiquitinated proteins was higher in milton knockdown at 14-day-old, but there was no significant difference at 30-day-old (Figure 1). Proteome analyses also showed that age-related pathways, such as immune responses, are enhanced in young flies with milton knockdown (Table 2). We also found that eIF2β protein levels increase in an age-dependent manner until 49-day-old and decrease after that (Figure 4G). In the brains with neuronal knockdown of milton, eIF2β levels were higher at 7-day-old than those in control and lower at the 21-day-old (Figure 4D and Supplementary table). These results suggest that milton knockdown is likely accelerating age-dependent changes rather than increasing their magnitude. Disruption of proteostasis is expected to contribute to neurodegeneration38, and it would be interesting to analyze the sequence of protein accumulation and axonal degeneration in milton knockdown (24,29 and Figure 1) in detail with higher time resolution.’


      With our new data, we revised some of our responses to the first round of reviewer’s comments.

      Reviewer #1 (Public Review):

      The authors observed a decline in autophagy and proteasome activity in the context of Milton knockdown. Through proteomic analysis, they identified an increase in the protein levels of eIF2β, subsequently pinpointing a novel interaction within eIF subunits where eIF2β contributes to the reduction of eIF2α phosphorylation levels. Furthermore, they demonstrated that overexpression of eIF2β suppresses autophagy and leads to diminished motor function. It was also shown that in a heterozygous mutant background of eIF2β, Milton knockdown could be rescued. This work represents a novel and significant contribution to the field, revealing for the first time that the loss of mitochondria from axons can lead to impaired autophagy function via eIF2β, potentially influencing the acceleration of aging. To further support the authors' claims, several improvements are necessary, particularly in the methods of quantification and the points that should be demonstrated quantitatively. It is crucial to investigate the correlation between aging and the proteins eIF2β and eIF2α.

      Thank you so much for your review and comments. We included analyses of protein levels of eIF2α, eIF2β, and eIF2γ at 7 days and 21 days (Figure 4D). The manuscript was revised as below:

      Lines 246-249 ‘As for the other subunits of eIF2 complex, proteome analysis did not detect a significant difference in the protein levels of eIF2α and eIF2γ between milton knockdown and control flies at 7 and 21 days (Figure 4D).’

      NEW TEXT: We analyzed age-dependent changes of eIF2β levels in more detail by western blotting (Figure 4G). We found that eIF2β levels increased during aging until 49-day-old then reduced at 63-day-old (Figure 4G in the revised manuscript). At the young age (7-day-old), eIF2β levels were higher in milton knockdown brains than in the control, and at 21-day-old, eIF2β levels were lower in milton knockdown brains than in the control. These results suggest that Milton knockdown accelerates age-dependent changes in eIF2β. We added these results and discussion in the revised manuscript.

      NEW TEXT: Lines 240-243: ‘We also investigated age-dependent changes in eIF2β by western blotting of control flies at 7-, 21-, 35-, 49-, and 63-day-old. eIF2β levels increased during aging until 49-day-old (Figure 4G). These results suggest that upregulation of eIF2β in milton knockdown fly brain reflects an early onset of age-dependent increase of eIF2β levels.’

      NEW TEXT: Lines 363-368: ‘We also found that eIF2β protein levels increase in an age-dependent manner until 49-day-old and decrease after that (Figure 4G). In the brains with neuronal knockdown of milton, eIF2β levels were higher at 7-day-old than those in control and lower at the 21-day-old (Figure 4D and Supplementary table). These results suggest that milton knockdown is likely accelerating age-dependent changes rather than increasing their magnitude.’

      Reviewer #2 (Public Review):

      In the manuscript, the authors aimed to elucidate the molecular mechanism that explains neurodegeneration caused by the depletion of axonal mitochondria. In Drosophila, starting with siRNA depletion of Milton and Miro, the authors attempted to demonstrate that the depletion of axonal mitochondria induces the defect in autophagy. From proteome analyses, the authors hypothesized that autophagy is impacted by the abundance of eIF2β and the phosphorylation of eIF2α. The authors followed up the proteome analyses by testing the effects of eIF2β overexpression and depletion on autophagy. With the results from those experiments, the authors proposed a novel role of eIF2β in proteostasis that underlies neurodegeneration derived from the depletion of axonal mitochondria.

      The manuscript has several weaknesses. The reader should take extra care while reading this manuscript and when acknowledging the findings and the model in this manuscript.

      The defect in autophagy by the depletion of axonal mitochondria is one of the main claims in the paper. The authors should work more on describing their results of LC3-II/LC3-I ratio, as there are multiple ways to interpret the LC3 blotting for the autophagy assessment. Lysosomal defects result in the accumulation of LC3-II thus the LC3-II/LC3-I ratio gets higher. On the other hand, the defect in the early steps of autophagosome formation could result in a lower LC3-II/LC3-I ratio. From the results of the actual blotting, the LC3-I abundance is the source of the major difference for all conditions (Milton RNAi and eIF2β overexpression and depletion). In the text, the authors simply state the observation of their LC3 blotting. The manuscript lacks an explanation of how to evaluate the LC3-II/LC3-I ratio. Also, the manuscript lacks an elaboration on what the results of the LC3 blotting indicate about the state of autophagy by the depletion of axonal mitochondria.

      Thank you for pointing it out, and we apologize for an insufficient description of the result. We included quantitation of the levels of LC3-I and LC3-II in Figures 2A, 2D, 3D, 7B (Figure 6B in the previous version), and 8B (Figure 7B in the previous version). As the reviewer pointed out, LC3-II/LC3-I ratio changes do not necessarily indicate autophagy defects. However, since p62 accumulated (Figure 2B, 2E, 3E, 7C (Figure 6C in the previous version), 8C (Figure 7C in the previous version)), these results collectively suggest that autophagy is lowered. We revised the manuscript to include this discussion as below:

      Lines 174-186 ‘During autophagy progression, LC3 is conjugated with phosphatidylethanolamine to form LC3-II, which localizes to isolation membranes and autophagosomes. LC3-I accumulation occurs when autophagosome formation is impaired, and LC3-II accumulation is associated with lysosomal defects31,32. p62 is an autophagy substrate, and its accumulation suggests autophagic defects31,32. We found that milton knockdown increased LC3-I, and the LC3-II/LC3-I ratio was lower in milton knockdown flies than in control flies at 14-day-old (Figure 2A). We also analyzed p62 levels in head lysates sequentially extracted using detergents with different stringencies (1% Triton X-100 and 2% SDS). Western blotting revealed that p62 levels were increased in the brains of 14-day-old milton knockdown flies (Figure 2B). The increase in the p62 level was significant in the Triton X-100-soluble fraction but not in the SDS-soluble fraction (Figure 2B), suggesting that depletion of axonal mitochondria impairs the degradation of less-aggregated proteins.’

      Line 189-190: ‘At 30-day-old, LC3-I was still higher, and the LC3-II/LC3-I ratio was lower, in milton knockdown compared to the control (Figure 2D).’

      Line 202-203: ‘However, in contrast with milton knockdown, Pfk knockdown did not affect the levels of LC3-I, LC3-II or the LC3-II/LC3-I ratio (Figure 3D).’

      Line 279-285: ‘Neuronal overexpression of eIF2β increased LC3-II, while the LC3-II/LC3-I ratio was not significantly different (Figure 7A and B). Overexpression of eIF2β significantly increased the p62 level in the Triton X-100-soluble fraction (Figure 7C, 4-fold vs. control, p < 0.005 (1% Triton X-100)) but not in the SDS-soluble fraction (Figure 7C, 2-fold vs. control, p = 0.062 (2% SDS)), as observed in brains of milton knockdown flies (Figure 2B). These data suggest that neuronal overexpression of eIF2β accumulates autophagic substrates.’

      Line 311-319: ‘Neuronal knockdown of milton causes accumulation of autophagic substrate p62 in the Triton X-100-soluble fraction (Figure 2B), and we tested if lowering eIF2β ameliorates it. We found that eIF2β heterozygosity caused a mild increase in LC3-I levels and decreases in LC3-II levels, resulting in a significantly lower LC3-II/LC3-I ratio in milton knockdown flies (Figure 8B). eIF2β heterozygosity decreased the p62 level in the Triton X-100-soluble fraction in the brains of milton knockdown flies (Figure 8C). The p62 level in the SDS-soluble fraction, which is not sensitive to milton knockdown (Figure 2B), was not affected (Figure 8C). These results suggest that suppression of eIF2β ameliorates the impairment of autophagy caused by milton knockdown.’

      Another main point of the paper is that the up-regulation of eIF2β caused by depleting the axonal mitochondria leads to the proteostasis crisis. This claim is formed by the findings from the proteome analyses. The authors should have presented their proteomic data with a much more thorough presentation and explanation. As in the experiment scheme shown in Figure 4A, the authors did two proteome analyses: one from the 7-day-old sample and the other from the 21-day-old sample. The manuscript only shows a plot of the result from the 7-day-old sample, but not that from the 21-day-old sample. For the 21-day-old sample, the authors only provided data in the supplemental table, in which the abundance ratio of eIF2β from the 21-day-old sample is 0.753, meaning eIF2β is depleted in the 21-day-old sample. The authors should have explained the impact of the eIF2β depletion in the 21-day-old sample, so the reader could fully understand the authors' interpretation of the role of eIF2β on proteostasis.

      NEW TEXT: Thank you for pointing it out. We included plots of the 21-day-old proteome results as a part of the main figure (Figure 4C). As the reviewer pointed out, eIF2β protein levels are lower in milton knockdown background at 21-day-old compared to the control. Since a reduction in eIF2β ameliorated milton knockdown-induced locomotor defects in aged flies (Figure 7D), the reduction in eIF2β observed in the 21-day-old milton knockdown flies is not likely to negatively contribute to milton knockdown-induced defects. Our new data indicate that eIF2β levels increase during aging in control flies until 49-day-old, then reduce at 63-day-old (included as Figure 4G in the revised manuscript). These age-dependent changes might explain the reduction in eIF2β levels in milton knockdown compared to the control in middle age: higher eIF2β levels in milton knockdown flies at a young age than control and lower eIF2β levels in the middle-aged flies may reflect premature aging.

      NEW TEXT: We included these sentences in the discussion section:

      NEW TEXT: Lines 240-243: ‘We also investigated age-dependent changes in eIF2β by western blotting of control flies at 7-, 21-, 35-, 49-, and 63-day-old. eIF2β levels increased during aging until 49-day-old (Figure 4G). These results suggest that upregulation of eIF2β in milton knockdown fly brain reflects an early onset of the age-dependent increase of eIF2β levels.’

      NEW TEXT: Lines 359-371: ‘Our results suggest that the loss of axonal mitochondria is an event upstream of proteostasis collapse during aging. The number of puncta of ubiquitinated proteins was higher in milton knockdown at 14-day-old, but there was no significant difference at 30-day-old (Figure 1). Proteome analyses also showed that age-related pathways, such as immune responses, are enhanced in young flies with milton knockdown (Table 2). We also found that eIF2β protein levels increase in an age-dependent manner until 49-day-old and decrease after that (Figure 4G). In the brains with neuronal knockdown of milton, eIF2β levels were higher at 7-day-old than those in control and lower at 21-day-old (Figure 4D and Supplementary table). These results suggest that milton knockdown is likely accelerating age-dependent changes rather than increasing their magnitude. Disruption of proteostasis is expected to contribute to neurodegeneration38, and it would be interesting to analyze the sequence of protein accumulation and axonal degeneration in milton knockdown (24,29 and Figure 1) in detail with higher time resolution.’

      The manuscript has several weaknesses in its data and explanation regarding translation.

      (1) The authors are likely misunderstanding the effect of phosphorylation of eIF2α on translation. The P-eIF2α is inhibitory for translation initiation. However, the authors seem to be mistaken that the down-regulation of P-eIF2α inhibits translation.

      We are sorry for our insufficient explanation in the previous version. As the reviewer pointed out, it is well known that the phosphorylated form of eIF2α inhibits translation initiation. Neuronal knockdown of milton caused a reduction in p-eIF2α (Figure 5D and E (Figure 4J and K in the previous version)), and it also lowered translation (Figure 6 (Figure 5 in the previous version)); the relationship between these two events is currently unclear. We do not think that a reduction in p-eIF2α suppressed translation; rather, we propose that an imbalance in the expression levels of the components of eIF2 complexes negatively affects translation. We revised the discussion section to describe our interpretation in more detail as below:

      Line 374-384: ‘eIF2β is a component of eIF2, which mediates translational regulation and ISR initiation. When ISR is activated, phosphorylated eIF2α suppresses global translation and induces translation of ATF4, which mediates transcription of autophagy-related genes39,40. Since ISR can positively regulate autophagy, we suspected that suppression of ISR underlies a reduction in autophagic protein degradation. We found neuronal knockdown of milton reduced phosphorylated eIF2α, suggesting that ISR is reduced (Figure 5). However, we also found that global translation was reduced (Figure 6). Increased levels of eIF2β might disrupt the eIF2 complex or alter its functions. The stoichiometric mismatch caused by an imbalance of eIF2 components may inhibit ISR induction. Supporting this model, we found that eIF2β upregulation reduced the levels of p-eIF2α (Figure 7).’

      We have revised the graphical abstract and removed the eIF2 complex since its role in the loss of proteostasis caused by milton knockdown has not been elucidated yet.

      (2) The result of polysome profiling in Figure 4H is implausible. By 10%-25% sucrose density gradient, polysomes are not expected to be observed. The authors should have used a gradient with much denser sucrose, such as 10-50%.

      Thank you for pointing it out. This was a typographical error: the gradient used was 10-50%. We apologize for the oversight and have corrected it (Figure 6 (Figure 5 in the previous version)).

      (3) Also on the polysome profiling, as in the method section, the authors seemed to fractionate ultra-centrifuged samples from top to bottom and then measured A260 by a plate reader. In that case, the authors should have provided a line plot with individual data points, not the smoothly connected ones in the manuscript.

      Thank you for pointing it out. We revised the graph (Figure 6 (Figure 5 in the previous version)).

      (4) For both the results from polysome profiling and puromycin incorporation (Figure 4H and I), the difference between control siRNA and Milton siRNA are subtle, if not nonexistent. This might arise from the lack of spatial resolution in their experiment as the authors used head lysate for these data but the ratio of Phospho-eIF2α/eIF2α only changes in the axons, based on their results in Figure 4E-G. The authors could have attempted to capture the spatial resolution for the axonal translation to see the difference between control siRNA and Milton siRNA.

      Thank you for your comment. We agree that it would be an interesting experiment, but it will take a considerable amount of time to analyze axonal translation with spatial resolution. We will try to include such analyses in the future. For this manuscript, we revised the discussion section to include the reviewer's suggestion as below:

      Lines 355-357: ‘Further analyses to dissect the effects of milton knockdown on proteostasis and translation in the cell body and axon by experiments with spatial resolution would be needed.’

      Recommendations for the authors:

      From the Reviewing Editor:

      As the Reviewing Editor, I have read your manuscript and the associated peer reviews. I have concerns about publishing this work in its current form. I think that your manuscript cannot claim to have found a novel function of eIF2beta because of technical uncertainties and conceptual problems that should be addressed.

      Thank you so much for your review and comments. We addressed all the concerns raised by the reviewers. Point-by-point responses are listed below.

      First, your manuscript is based partly on what appears to be a mistaken understanding of the mechanistic basis of the ISR. Specifically, eIF2 is a heterotrimeric complex of alpha, beta, and gamma subunits. When eIF2a is phosphorylated, the heterotrimer adopts a new conformation. This conformation directly binds and inhibits eIF2B, the decameric GEF that exchanges the GDP bound to the gamma subunit of the eIF2 complex for GTP. Unless I misunderstood your paper, you seem to propose that decreasing levels of phospho-eIF2a will inhibit translation, but this is backward from what we know about the ISR.

      Thank you for your insightful comment, and we are sorry for the confusion. We did not mean to propose that decreasing levels of phospho-eIF2a inhibits translation. We apologize for our insufficient explanation, which might have caused a misunderstanding (Lines 312-318 in the original version). We agree with the reviewer that ‘mismatch due to elevated eIF2-beta could change the behavior of the ISR’. We revised the text in the result section as follows:

      Lines 263-268 (in the Result section) ‘Phosphorylation of eIF2α induces conformational changes in the eIF2 complex and inhibits global translation36. To analyze the effects of milton knockdown on translation, we performed polysome gradient centrifugation to examine the level of ribosome binding to mRNA. Since p-eIF2α was downregulated, we hypothesized that milton knockdown would enhance translation. However, unexpectedly, we found that milton knockdown significantly reduced the level of mRNAs associated with polysomes (Figure 6A and B).’

      Lines 374-384 (in the Discussion section): ‘eIF2β is a component of eIF2, which mediates translational regulation and ISR initiation. When ISR is activated, phosphorylated eIF2α suppresses global translation and induces translation of ATF4, which mediates transcription of autophagy-related genes39,40. Since ISR can positively regulate autophagy, we suspected that suppression of ISR underlies a reduction in autophagic protein degradation. We found neuronal knockdown of milton reduced phosphorylated eIF2α, suggesting that ISR is reduced (Figure 5). However, we also found that global translation was reduced (Figure 6). Increased levels of eIF2β might disrupt the eIF2 complex or alter its functions. The stoichiometric mismatch caused by an imbalance of eIF2 components may inhibit ISR induction. Supporting this model, we found that eIF2β upregulation reduced the levels of p-eIF2α (Figure 7).’

      It may be possible that a stoichiometric mismatch due to elevated eIF2-beta could change the behavior of the ISR, but your paper doesn't adequately address the expression levels of all three eIF2 subunits: alpha, beta, and gamma. The proteomic data shown in Fig 4B is unconvincing on its own because the changes in the beta subunit are subtle. The Western blot in Figure 4C suggests that the KD changes the mass or mobility of the beta subunit, and most importantly, there are no Western blots measuring the levels of eIF2a, eIF2a-phospho, or eIF2-gamma.

      We appreciate the reviewer’s comment and agree that the stoichiometric mismatch due to elevated eIF2β may interfere with ISR. We found overexpression of eIF2β lowered p-eIF2 alpha (Figure S2 in V1), which supports this model. We included this data in the main figure in the revised manuscript (Figure 7D) and revised the text as below:

      Lines 286-289: ‘Since milton knockdown reduced the p-eIF2α level (Figure 5E), we asked whether an increase in eIF2β affects p-eIF2α. Neuronal overexpression of eIF2β did not affect the eIF2α level but significantly decreased the p-eIF2α level (Figure 7D and E).’

      Expression data for eIF2α and eIF2γ have been extracted from the proteome analyses and included as a table (Figure 4D). Western blots of phospho-eIF2a (Figure S1 in V1) are now included in the main figure (Figure 5B). The result section was revised as below:

      Lines 246-249: ‘As for the other subunits of eIF2 complex, proteome analysis did not detect a significant difference in the protein levels of eIF2α and eIF2γ between milton knockdown and control flies at 7 and 21 days (Figure 4D).’

      NEW TEXT: We also analyzed age-dependent changes of eIF2β by western blotting and found that eIF2β increased during aging until 49-day-old. We included this result as Figure 4G and added these sentences in the result section:

      NEW TEXT: Line 240-243: ‘We also investigated age-dependent changes in eIF2β by western blotting of control flies at 7-, 21-, 35-, 49-, and 63-day-old. eIF2β levels increased during aging until 49-day-old (Figure 4G). These results suggest that upregulation of eIF2β in milton knockdown fly brain reflects an early onset of the age-dependent increase of eIF2β levels.’

      Reviewer #1 (Recommendations For The Authors):

      L125-128: In this section, while the efficiency of Milton knockdown is referenced from a previous publication, it is necessary to also mention that the Miro knockdown has been similarly reported in the literature. Additionally, the Methods section lacks details on the Miro RNAi line used, and Table 2 does not include the genotype for Miro RNAi. This information should be included for clarity and completeness.

      Thank you for pointing it out. Knockdown efficiency with this strain has been reported (Iijima-Ando et al., PLoS Genet, 2012). We revised the text to include citation and knockdown efficiency as follows:

      Lines 136-147: ‘There was no significant increase in ubiquitinated proteins in milton knockdown flies at 1-day old, suggesting that the accumulation of ubiquitinated proteins caused by milton knockdown is age-dependent (Figure S1). We also analyzed the effect of the neuronal knockdown of Miro, a partner of milton, on the accumulation of ubiquitin-positive proteins. Since severe knockdown of Miro in neurons causes lethality, we used UAS-Miro RNAi strain with low knockdown efficiency, whose expression driven by elav-GAL4 caused 30% reduction of Miro mRNA in head extract24. Although there was a tendency for increased ubiquitin-positive puncta in Miro knockdown brains, the difference was not significant (Figure 1B, p>0.05 between control RNAi and Miro RNAi). These data suggest that the depletion of axonal mitochondria induced by milton knockdown leads to the accumulation of ubiquitinated proteins before neurodegeneration occurs.’

      L132-L136: The current phrasing in this section suggests an increase in ubiquitinated proteins for both Milton and Miro knockdowns. However, since there is no significant difference noted for Miro, it is incorrect to state an increase in ubiquitin-positive puncta. Furthermore, combining the results of Milton knockdown to claim an increase in ubiquitinated proteins prior to neurodegeneration is misleading. At the very least, the expression here needs to be moderated to accurately reflect the findings.

      Thank you for pointing it out. We revised the text as above.

      L137-L141: Results in Figure 1 indicate that Milton knockdown leads to an increase in ubiquitinated proteins at 14 days, while Miro knockdown shows no difference from the control at either 14 or 30 days. Conversely, both the control and Miro exhibit an increase in ubiquitinated proteins with aging, but this trend does not seem to apply to Milton knockdown. This observation suggests that Milton KD may not affect the changes in protein quality control associated with aging. It implies that Milton's function might be more related to protein homeostasis in younger cells, or that changes due to aging might overshadow the effects of Milton knockdown. These interpretations should be included in the Results or Discussion sections for a more comprehensive analysis.

      NEW TEXT: Thank you for your insightful comment. As you mentioned, the accumulation of ubiquitinated proteins significantly increases only in young flies. Age-related pathways, such as immune responses, are highlighted in young milton knockdown flies but not in the aged flies. Our new result indicates that eIF2β increases during aging in control flies (included as Figure 4G in the revised manuscript), and upregulation of eIF2β in milton knockdown is only observed at a young age. These results suggest that milton knockdown does not increase the magnitude of age-dependent changes but accelerates their onset. We revised the text to include those points as follows:

      NEW TEXT: Lines 152-153: ‘These results suggest that depletion of axonal mitochondria may have more impact on proteostasis in young neurons than in old neurons.’

      NEW TEXT: Lines 359-371: ‘Our results suggest that the loss of axonal mitochondria is an event upstream of proteostasis collapse during aging. The number of puncta of ubiquitinated proteins was higher in milton knockdown at 14-day-old, but there was no significant difference at 30-day-old (Figure 1). Proteome analyses also showed that age-related pathways, such as immune responses, are enhanced in young flies with milton knockdown (Table 2). We also found that eIF2β protein levels increase in an age-dependent manner until 49-day-old and decrease after that (Figure 4G). In the brains with neuronal knockdown of milton, eIF2β levels were higher at 7-day-old than those in control and lower at 21-day-old (Figure 4 and Supplementary table). These results suggest that milton knockdown is likely accelerating age-dependent changes rather than increasing their magnitude. Disruption of proteostasis is expected to contribute to neurodegeneration38, and it would be interesting to analyze the sequence of protein accumulation and axonal degeneration in milton knockdown (24,29 and Figure 1) in detail with higher time resolution.’

      L143 : Please remove the erroneously included quotation mark.

      Thank you for pointing it out. We corrected it.

      L145-L147:

      While it is understood that Milton knockdown results in a reduction of mitochondria in axons, as reported previously and seemingly indicated in Figure 1E, this paper repeatedly refers to axonal depletion of mitochondria. Therefore, it would be beneficial to quantitatively assess the number of mitochondria in the axonal terminals located in the lamina via electron microscopy. Such quantification would robustly reinforce the argument that mitochondrial absence in axons is a consequence of Milton knockdown.

      Thank you for pointing it out. We included quantitation of the number of mitochondria in the synaptic terminals (Figure 1E).

      The text and figure legend were revised accordingly:

      Lines 156-157: ‘As previously reported24, the number of mitochondria in presynaptic terminals decreased in milton knockdown (Figure 1E).’

      The knockdown of Milton is known to reduce mitochondrial transport from an early stage, but what about swelling? By observing swelling at 1 day and 14 days, it may be possible to confirm the onset of swelling and discuss its correlation with the accumulation of ubiquitinated proteins.

      Quantitation of axonal swelling has also been included (Figure 1F).

      We appreciate the reviewer's comments on the correlation between the accumulation of ubiquitinated proteins and axonal swelling. Axonal swelling was not observed at 3-days-old (Iijima-Ando et al., PLoS Genetics, 2012), indicating that axonal swelling is an age-dependent event. Dense materials are found in swollen axons more often than in normal axons, suggesting a positive correlation between disruption of proteostasis and axonal damage. It would be interesting to analyze the time course of events further; however, we feel it is beyond the scope of this manuscript. We revised the text to include this discussion as:

      Lines 157-160: ‘The swelling of presynaptic terminals, characterized by the enlargement and roundness, was not reported at 3-day-old24 but observed at this age with about 4% of total presynaptic terminals (Figure 1F, asterisks).’

      Lines 162-167: ‘Dense materials are rarely found in age-matched control neurons, indicating that milton knockdown induces abnormal protein accumulation in the presynaptic terminals (Figure 1G and H). In milton knockdown neurons, dense materials are found in swollen presynaptic terminals more often than in presynaptic terminals without swelling, suggesting a positive correlation between the disruption of proteostasis and axonal damage (Figure 1G).’

      Lines 369-371: ‘Disruption of proteostasis is expected to contribute to neurodegeneration38, and it would be interesting to analyze the sequence of protein accumulation and axonal degeneration in milton knockdown (24,29 and Figure 1) in detail with higher time resolution.’

      L147-L151: Though Figures 1F and 1G provide qualitative representations, it is advisable to quantitatively assess whether dense materials significantly accumulate. Such quantitative analysis would be required to verify the accumulation of dense materials in the context of the study.

      Thank you for pointing it out. We included quantitation of the number of neurons with dense material (Figure 1G). We revised the manuscript as follows:

      Line 162-164: ‘Dense materials are rarely found in age-matched control neurons, indicating that milton knockdown induces abnormal protein accumulation in the presynaptic terminals (Figure 1G and H).’

      Regarding Figure 1B, C:

      Even though the count of puncta in the whole brain appears to be fewer than 400, the magnification of the optic lobe suggests a substantial presence of puncta. Please clarify in the Methods section what constitutes a puncta and whether the quantification in the whole brain is based on a 2D or 3D analysis. Detail the methodology used for quantification.

      Thank you for your comment. We revised the method section to include more details as below:

      Lines 440-443: ‘Quantitative analysis was performed using ImageJ (National Institutes of Health) with maximum projection images derived from Z-stack images acquired with the same settings. Puncta were identified based on mean intensity and area using ImageJ.’

      What about 1-day-old specimens? Does Milton knockdown already show an increase in ubiquitinated protein accumulation at this early stage? Investigating whether ubiquitin-protein accumulation is involved in aging promotion or is already prevalent during developmental stages is a necessary experiment.

      Thank you for your comment. We carried out immunostaining with an anti-ubiquitin antibody in the brains at 1-day-old. No significant difference was detected between the control and milton knockdown. This result has been included as Figure S1 in the revised manuscript. The result section was revised as below:

      Line 136-139 ‘There was no significant increase in ubiquitinated proteins in milton knockdown flies at 1-day old, suggesting that the accumulation of ubiquitinated proteins caused by milton knockdown is age-dependent (Figure S1).’

      For Figure 1E: In the Electron Microscopy section of the Methods, define how swollen axons were identified and describe the quantification methodology used.

      Thank you for your comment. Swollen axons are, unlike normal axons, round in shape and enlarged. We revised the text as below;

      Lines 157-160: ‘The swelling of presynaptic terminals, characterized by the enlargement and roundness, was not reported at 3-day-old24 but observed at this age with about 4% of total presynaptic terminals (Figure 1F, asterisks).’

      Lines 689-691, Figure 1 legend: ‘Swollen presynaptic terminals (asterisks in (F)), characterized by the enlargement and higher circularity, were found more frequently in milton knockdown neurons.’

      L218-L219: Throughout the text, the expression 'eIF2β is "upregulated" in response to Milton knockdown' is frequently used. However, considering the presented results, it might be more accurate to interpret that under the condition of Milton knockdown, eIF2β is not undergoing degradation but rather remains stable.

      Thank you for pointing it out. We replaced ‘upregulated’ with ‘increased’ throughout the text.

      L234-L235: On what basis is the conclusion drawn that there is a reduction? Given that three experiments have been conducted, it would be possible and more convincing to quantify the results to determine if there is a significant decrease.

      Thank you for pointing it out. We quantified the AUC of polysome fraction and carried out a statistical analysis. There is a significant decrease in polysome in milton knockdown, and this result has been included in Figure 5B. We revised the figure and the legend accordingly.
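
      As an illustration of how an area under the curve (AUC) for the polysome region of an A260 profile can be computed, the sketch below applies the trapezoidal rule to plate-reader readings taken per fraction. It is a minimal example under stated assumptions, not the procedure used in the manuscript: the function name `trapezoid_auc`, the fraction boundaries (fractions 7-12 treated as polysomes), and the absorbance values are all hypothetical.

```python
def trapezoid_auc(positions, absorbance, start, end):
    """Trapezoidal area under an absorbance profile between two fraction positions (inclusive)."""
    pts = [(x, y) for x, y in zip(positions, absorbance) if start <= x <= end]
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Hypothetical A260 readings for 12 fractions collected from top to bottom (made-up values).
fractions = list(range(1, 13))
a260 = [0.9, 1.4, 1.1, 0.8, 0.6, 0.5, 0.45, 0.4, 0.38, 0.35, 0.3, 0.28]

# Assumption: fractions 7-12 correspond to polysomes; the real boundary depends on the gradient.
polysome_auc = trapezoid_auc(fractions, a260, start=7, end=12)
total_auc = trapezoid_auc(fractions, a260, start=1, end=12)
polysome_share = polysome_auc / total_auc  # fraction of total signal in the polysome region
```

      Normalizing the polysome AUC to the total AUC, as in `polysome_share`, is one common way to make replicates comparable before a statistical test; whether the manuscript normalized in this way is not stated.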

      L236: 5H-> 4H

      Thank you for pointing it out, and we are sorry for the confusion. We corrected it.

      L238-L239: Since there is no significant difference observed, it may not be accurate to interpret a reduction in puromycin incorporation.

      Thank you for pointing it out. As described above, quantification of polysome fractions showed that milton knockdown significantly reduced polysome (Figure 6B (Figure 5B in the previous version)). We revised the manuscript as below;

      Lines 267-268: ‘However, unexpectedly, we found that milton knockdown significantly reduced the level of mRNAs associated with polysomes (Figure 6A and B).’

      Figure 5D and Figure 6D: Climbing assays have been conducted, but I believe experiments should also be performed to examine whether overexpression or heterozygous mutants of eIF2β induce or suppress degeneration.

      Thank you for pointing it out. We analyzed the eyes with eIF2β overexpression for neurodegeneration. Although there was a tendency of elevated neurodegeneration in the retina with eIF2β overexpression, the difference between control and eIF2β overexpression did not reach statistical significance (Figure S2). This result has been included as Figure S2 in the revised manuscript, and the following sentences have been included in the text:

      Lines 292-297: ‘We asked if eIF2β overexpression causes neurodegeneration, as depletion of axonal mitochondria in the photoreceptor neurons causes axon degeneration in an age-dependent manner24. eIF2β overexpression in photoreceptor neurons tends to increase neurodegeneration in aged flies, although the difference was not statistically significant (p>0.05, Figure S2).’

      L271-L272: The results in Figure 6B are surprising. I anticipated a greater increase compared to the Milton knockdown alone. While p62 appears to be reduced, it is not clear why these results lead to the conclusion that lowering eIF2β rescues autophagic impairment. Please add a discussion section to address this point.

      Thank you for pointing it out. We apologize for the unclear description of the result. Milton knockdown flies show p62 accumulation (Figure 2), and deleting one copy of eIF2beta in milton knockdown background reduced p62 accumulation (Figure 8C (Figure 7C in the previous version)). We revised the text as below:

      Lines 311-319: ‘Neuronal knockdown of milton causes accumulation of autophagic substrate p62 in the Triton X-100-soluble fraction (Figure 2B), and we tested if lowering eIF2β ameliorates it. We found that eIF2β heterozygosity caused a mild increase in LC3-I levels and decreases in LC3-II levels, resulting in a significantly lower LC3-II/LC3-I ratio in milton knockdown flies (Figure 8B). eIF2β heterozygosity decreased the p62 level in the Triton X-100-soluble fraction in the brains of milton knockdown flies (Figure 8C). The p62 level in the SDS-soluble fraction, which is not sensitive to milton knockdown (Figure 2B), was not affected (Figure 8C). These results suggest that suppression of eIF2β ameliorates the impairment of autophagy caused by milton knockdown.’

      L369: Please specify the source of the anti-ubiquitin antibody used.

      Thank you for pointing it out. We included the antibody information in the method section.

      Figure 7: While the relationship between Milton knockdown and the eIF2β and eIF2α proteins has been elucidated through the authors' efforts, I would like to see an investigation into whether eIF2β is upregulated and eIF2α phosphorylation is reduced in simply aged Drosophila. This would help us understand the correlation between aging and eIF2 protein dynamics.

      Thank you for your comment. We agree that it is an important question, and we are working on it. However, we feel that it is beyond the scope of the current manuscript.

      L645-L646: If the mushroom body is identified using mito-GFP, then include mito-GFP in the genotype listed in Supplementary Table 2.

      We are sorry for the oversight. We corrected it in Supplementary Table 2.

      Additionally, while it is presumed that the mito-GFP signal decreases in axons with Milton RNAi, how was the lobe tips area accurately selected for analysis? Please include these details along with a comprehensive description of the quantification methodology in the Methods section.

      Thank you for your comment. Although the mito-GFP signal in the axon is weak in the milton knockdown neurons, it is sufficient to distinguish the mushroom body structure from the background. We revised the method section to include this information:

      Line 443-447: ‘For eIF2α and p-eIF2α immunostaining, the mushroom body was detected by mitoGFP expression.’

    1. Author response:

      Reviewer #1, Comment (1): Terminology

      We fully acknowledge the importance of terminological consistency and will align our usage with established literature. Specifically, we will revise as follows:

      (1) Replace “sinusoidal analysis” with either “sinusoidal modulation” (Doeller et al., 2010; Bao et al., 2019; Raithel et al., 2023) or “GLM with sinusoidal (cos/sin) regressors” (Constantinescu et al., 2016). 

      (2) Replace “1D directional domain” with either “angular domain of movement directions (0–360°)” or “directional modulation analysis”.
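
      To make the term in (1) concrete, a "GLM with sinusoidal (cos/sin) regressors" can be sketched as follows. This is only an illustration under stated assumptions, not the analysis pipeline of the study: the function name `fit_k_fold`, the 10° direction bins, and the synthetic signal are hypothetical, and the directions are assumed to be evenly spaced over 0-360° so that the cosine and sine regressors are orthogonal and the least-squares betas reduce to simple projections.

```python
import math

def fit_k_fold(angles_deg, signal, k):
    """Fit signal ~ b0 + b_cos*cos(k*theta) + b_sin*sin(k*theta).

    Assumes evenly spaced directions over the full circle, in which case the
    least-squares betas equal the projections computed below.
    Returns (modulation amplitude, preferred orientation in degrees).
    """
    n = len(signal)
    mean = sum(signal) / n
    thetas = [math.radians(a) for a in angles_deg]
    b_cos = 2.0 / n * sum((y - mean) * math.cos(k * t) for y, t in zip(signal, thetas))
    b_sin = 2.0 / n * sum((y - mean) * math.sin(k * t) for y, t in zip(signal, thetas))
    amplitude = math.hypot(b_cos, b_sin)
    orientation_deg = math.degrees(math.atan2(b_sin, b_cos)) / k
    return amplitude, orientation_deg

# Synthetic example: a pure 6-fold modulation with a 10-degree preferred orientation.
angles = [10 * i for i in range(36)]
signal = [1.0 + 0.5 * math.cos(6 * math.radians(a - 10)) for a in angles]

amp6, phi6 = fit_k_fold(angles, signal, k=6)  # recovers the 6-fold component
amp3, _ = fit_k_fold(angles, signal, k=3)     # control periodicity, close to zero here
```

      Fitting the same model at a control periodicity (here k = 3) shows why the cos/sin pair is used: the amplitude is independent of the unknown grid orientation, so different fold symmetries can be compared directly.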

      Reviewer #1, Comment (2): Spectral analysis and 3-fold periodicity

      We agree that the presentation of our spectral analysis and the theoretical motivation underlying our expectation of a three-fold periodicity within hippocampal data requires further clarification.

      In our revised manuscript, we will:

      (1) Clearly articulate the theoretical motivation for anticipating a three-fold signal, explicitly linking it to the known hexagonal grid structure encoded by the entorhinal cortex.

      (2) Clarify our methodological rationale for using Fourier analysis (FFT).

      a) FFT allows unbiased exploration of multiple candidate periodicities (e.g., 3–7-fold) without predefined assumptions.

      b) FFT results cross-validate our sinusoidal modulation results, providing complementary evidence supporting the 6-fold periodicity in EC and 3-fold periodicity in HPC.

      c) FFT uniquely facilitates analysis of periodicities in behavioral performance data, which is not feasible via standard sinusoidal GLM approaches. This consistency allows us to directly compare periodicities across neural and behavioral data.

      (3) Further, we will expand our discussion to provide:

      a) A deeper interpretation of potential biological bases for the observed hippocampal three-fold periodicity.

      b) A careful examination of alternative explanations within existing hippocampal modeling frameworks.
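
      The rationale in point (2) can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the simulated signals, bin count, noise level, and function names (`fold_amplitudes`, `dominant_fold`, `simulate`) are assumptions made for illustration. A direct Fourier transform over direction bins scores each candidate periodicity (3- to 7-fold, as in point 2a) without fixing the fold in advance, which is the property that distinguishes it from a GLM with cos/sin regressors at a single predefined fold.

```python
import cmath
import math
import random


def fold_amplitudes(signal, folds=range(3, 8)):
    """Amplitude of each k-fold component (k cycles per 360 degrees).

    `signal` holds one value per equally spaced movement-direction bin.
    A direct DFT is used so that only the standard library is needed.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the constant offset
    amps = {}
    for k in folds:
        coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                    for i, c in enumerate(centered))
        amps[k] = 2 * abs(coeff) / n  # amplitude of the k-fold sinusoid
    return amps


def dominant_fold(signal, folds=range(3, 8)):
    """Candidate periodicity with the largest spectral amplitude."""
    amps = fold_amplitudes(signal, folds)
    return max(amps, key=amps.get)


def simulate(fold, n_bins=36, phase_deg=15.0, noise_sd=0.1, seed=0):
    """Synthetic directional signal (hypothetical data, 10-degree bins)."""
    rng = random.Random(seed)
    values = []
    for i in range(n_bins):
        theta = math.radians(i * 360.0 / n_bins - phase_deg)
        values.append(math.cos(fold * theta) + rng.gauss(0.0, noise_sd))
    return values


ec_like = simulate(fold=6)           # stands in for a 6-fold (EC-like) signal
hpc_like = simulate(fold=3, seed=1)  # stands in for a 3-fold (HPC-like) signal
print(dominant_fold(ec_like), dominant_fold(hpc_like))
```

      Because the same function accepts any vector of per-direction values, it could in principle be applied to behavioral performance binned by direction as well as to neural signals, which is the comparison described in point (2c).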

      Reference:

      Doeller, C. F., Barry, C., & Burgess, N. (2010). Evidence for grid cells in a human memory network. Nature, 463(7281), 657-661.

      Constantinescu, A. O., O'Reilly, J. X., & Behrens, T. E. J. (2016). Organizing conceptual knowledge in humans with a gridlike code. Science, 352(6292), 1464-1468.

      Bao, X., Gjorgieva, E., Shanahan, L. K., Howard, J. D., Kahnt, T., & Gottfried, J. A. (2019). Grid-like neural representations support olfactory navigation of a two-dimensional odor space. Neuron, 102(5), 1066-1075.

      Raithel, C. U., Miller, A. J., Epstein, R. A., Kahnt, T., & Gottfried, J. A. (2023). Recruitment of grid-like responses in human entorhinal and piriform cortices by odor landmark-based navigation. Current Biology, 33(17), 3561-3570.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      I am impressed with the thoroughness with which the authors addressed my concerns. I don't have any further concerns and think that this paper makes an interesting and significant contribution to our understanding of VWM. I would only suggest adding citations to the newly added paragraph where the authors state "It could be argued that preparatory attention relies on the same mechanisms as working memory maintenance." They could cite work by Bettencourt and Xu, 2016; and Sheremata, Somers, and Shomstein (2018).

      We thank the reviewer for the positive feedback. We have now cited the referenced work in the manuscript (Page 19, Line 371).

      Reviewer #2 (Public review):

      Overall, I think that the authors' revision has addressed most, if not all, of my major concerns noted in my previous comments. The results appear convincing and I do not have additional comments.

      We thank the reviewer for the positive feedback and are pleased that the revision addressed the major concerns.

      Reviewer #3 (Public review):

      (1) The authors addressed most of my previous concerns and provided additional data analysis. They conducted further analyses to demonstrate that the observed changes in network communication are associated with behavioral RTs, supporting the idea that the impulse-driven sensory-like template enhances informational connectivity between sensory and frontoparietal areas, and relates to behavior.

      We are pleased that the revision addressed the major concerns.

      (2) I would like to further clarify my previous points regarding the definition of the two types of templates and the evidence for their coexistence. The authors stated that the sensory-like template likely existed in a latent state and was reactivated by visual pings, proposing that sensory and non-sensory templates coexist. However, it remains unclear whether this reflects a dynamic switch between formats or true coexistence. If the templates are non-sensory in nature, what exactly do they represent? Are they meant to be abstract or conceptual representations, or, put simply, just "top-down attentional information"? If so, why did the generalization analyses (training classifiers on activity during the stimulus selection period and testing on preparatory activity) fail to yield significant results? While the stimulus selection period necessarily encodes both target and distractor information, it should still contain attentional information. I would appreciate more discussion from this perspective.

      We thank the reviewer for the helpful clarification of previous comments. Since we addressed similar comments from Reviewer 2 (Point 2) in the previous round, our response below may appear somewhat repetitive. First, regarding whether our findings reflect a dynamic switch between non-sensory and sensory-like templates, or the ‘coexistence’ of two template formats, we acknowledge that the temporal limitations of fMRI prevent us from directly testing dynamic representations. However, several aspects of our data favor the latter interpretation: (1) our key findings remained consistent in the subset of participants (N=14) who completed both No-Ping and Ping sessions in counterbalanced order. This makes it unlikely that participants systematically switched cognitive strategies (e.g., using non-sensory templates in the No-Ping session versus sensory-like templates in the Ping session) in response to the task-irrelevant, uninformative visual impulse; (2) while we agree that the temporal dynamics between the two templates remain unclear, it is difficult to imagine that orientation-specific templates observed in the Ping session emerged de novo from purely non-sensory templates and an exogenous ping. In other words, if there is no orientation information at all to begin with, how does it come into being from an orientation-less external ping? A more parsimonious explanation is that orientation information was already present in a latent format and was activated by the ping, in line with the models of “activity-silent” working memory. However, since the detailed circuit-level mechanisms underlying such reactivation remain unclear, we acknowledge that this interpretation warrants direct investigation in future studies. This point is discussed in the main text (Page 19-20, Line 389-402).

      Second, while our data cannot definitively determine the nature of the non-sensory template, we consider categorical coding a plausible candidate based on prior visual search studies. For instance, categorical attributes (e.g., left-tilted vs. right-tilted) have been shown to effectively guide attention in orientation search tasks (Wolfe et al., 1992), similar to our paradigm. Further, categorical templates are more tolerant of stimulus variability, making them well-suited to our task, which involved trial-by-trial variations in target orientation around a reference (see Page 21, Line 427-437 for more detailed discussions).

      Third, the lack of generalization from stimulus selection to preparatory attention in the Ping session may relate to the limited overlap in shared information between these two periods. Neural activity during stimulus selection encodes sensory information about both orientations, along with sensory-like attentional signals (as indicated by the attention decoding and cross-task generalization from perception task to the stimulus-selection period). In contrast, preparatory activity likely involves a dominant non-sensory template, a latent sensory-like template, and residual sensory effects from the impulse stimulus. The limited overlap in sensory-like attentional signals may therefore be insufficient to support generalization across the two periods.

      Reviewer #2 (Recommendations for the authors):

      I think the central prediction of greater pattern similarity between 'attend leftward' and 'perceived leftward' in the ping session in comparison to the no-ping session (the same also holds for 'attend rightward' and 'perceived rightward') could be directly examined by a two-way ANOVA (session × the attend orientation is the same/different from the perceived orientation) for each ROI (V1 and EVC). A three-way ANOVA might complicate readers' intuitive understanding of the implications of the statistical results.

      We thank the reviewer for the suggestion. Following the reviewer’s suggestion, we defined a new condition label based on orientation consistency between attended and perceived orientations: (1) same orientation: averaging “attend leftward/perceive leftward” and “attend rightward/perceive rightward”; and (2) different orientation: averaging “attend leftward/perceive rightward” and “attend rightward/perceive leftward”. A two-way mixed ANOVA (session × orientation consistency) on Mahalanobis distance revealed a main effect of orientation consistency in V1 (F(1,38) = 4.21, p = 0.047, η<sub>p</sub><sup>2</sup> = 0.100), indicating that activity patterns were more similar when attended and perceived orientations matched. No significant main effect of session was found (p = 0.923). Importantly, a significant interaction was found in V1 (F(1,38) = 5.00, p = 0.031, η<sub>p</sub><sup>2</sup> = 0.116), suggesting that the visual impulse enhanced the similarity between the preparatory attentional template and the perception of the corresponding orientation. In EVC, the same analysis revealed only a main effect of orientation consistency (F(1,38) = 5.87, p = 0.020, η<sub>p</sub><sup>2</sup> = 0.134), with no other significant effects (ps > 0.240). The interaction results were consistent with those reported in the original three-way ANOVA. We have now replaced the previous analysis with the new one in the main text (Page 11-12, Line 231-242).
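
      As a reading aid, the reported effect sizes can be checked against the reported F statistics using the standard identity between partial eta squared and F, namely partial eta squared = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to the three values quoted above; the function name and the dictionary layout are illustrative assumptions, and only the numbers are taken from the response.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, reported partial eta squared), all with df = (1, 38) as quoted in the response
reported = {
    "V1 main effect of orientation consistency": (4.21, 0.100),
    "V1 session x consistency interaction": (5.00, 0.116),
    "EVC main effect of orientation consistency": (5.87, 0.134),
}

recomputed = {
    label: round(partial_eta_squared(f_value, 1, 38), 3)
    for label, (f_value, _) in reported.items()
}

for label, (f_value, eta) in reported.items():
    print(f"{label}: F = {f_value}, reported = {eta}, recomputed = {recomputed[label]}")
```

      All three recomputed values agree with the reported ones to three decimal places, so the F statistics, degrees of freedom, and effect sizes in the response are mutually consistent.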

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Recommendations for the authors):

      Line 364-370: This paragraph is not very clear to me.

      Thank you for pointing this out; we agree our point could have been made clearer. We have clarified as follows:

      “The geographic positions of species’ ranges determine the local pressures and environmental factors to which they are exposed (MacLean and Beissinger, 2017; Pacifici et al., 2020), potentially masking or confounding the effects of traits that evolved under conditions determined by range geography (Schuetz et al., 2019). This process could cause trait-related trends to differ across levels of biological organization (Srivastava et al., 2021), from local populations (where traits might be critical) to biogeographical extents (where traits might be unrelated to range or phenological shifts; Grewe et al., 2013; Gutiérrez and Wilson, 2021; Sunday et al., 2015; Zografou et al., 2021).” (Lines 370-377).

      Reviewer #3 (Recommendations for the authors):

      L313: '...higher population growth' compared to what? Does this mean that species shifting to earlier emergence really show higher population growth over time?

      Thank you for this suggestion, we have clarified as follows: “Earlier seasonal timing allows species to stay within their climatic limits and maintain population growth rates (Macgregor et al., 2019), although earlier emergence could expose individuals to early season weather extremes (McCauley et al., 2018).” (Lines 316-319).

      L336: Same here. Please refer to your comparative counterpart in such statements. Does 'plasticity may enable higher population growth' mean higher than for species shifting range or phenology or higher compared to the previous level for a given species. In many cases it seems you are referring to an overall baseline, so that the 'higher' means 'lesser decline'. Wouldn't plasticity maintain population growth at similar levels as before? The current wording suggests that plasticity results in species exceeding their previous population growth. Please rephrase.

      We agree it was confusing with no comparative counterpart, so we changed the sentence as follows: “Adaptive evolution and plasticity may enable high population growth rates in newly-colonized areas (Angert et al., 2020; Usui et al., 2023), but this possibility can only be directly tested with long term population trend data.” (Lines 341-343).

      L307: The term 'universal winners' appears too strong and not well justified given the lack of the crucial third dimension of response. In fact, changes in phenology are less indicative than abundance trends. Combined with range shifts they would tell a story of success or failing, while phenological shifts would rather help to understand how species adapted. I am not saying the insight cannot stand alone, but it is important to adapt the wording in this regard.

      Thank you for this comment, we have clarified the text as follows: “These results suggest that some species may have an advantage with respect to climate change: they demonstrate the flexibility to respond both temporally and spatially to the onset of rapid climate change.” (Lines 310-313).

      We also softened language around winners and losers on line 388: “It remains unclear if range and phenology shifts relate to trends in abundance, but our results suggest that there may be ‘winners’ and ‘losers’ under climate change (Figure 4).” (Lines 387-388).

      L326-240: I agree with line 330 that abundance trends are needed to clarify the situation of species shifting or not shifting ranges and phenology. However, this abstract should clarify that this is particularly important to understand whether non shifting species are really the 'losers'. If these species show adapted evolution or plasticity, we would expect they do not decline in abundance. Even without shifts in range or phenology they would be the 'ultimate winners' as you call it.

      Thank you for this comment, we agree that abundance trends are necessary to understand potential winners and losers. We have made this addition to the abstract as follows: “Species shifting in both space and time may be more resilient to extreme conditions, although further work integrating abundance data is needed.” (Lines 16-18).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      This paper sets out to achieve a deeper understanding of the effects of hydrogen sulfide on C. elegans behavior and physiology, with a focus on behavior, detection mechanism(s), physiological responses, and detoxification mechanisms.

      Strengths: 

      The paper takes full advantage of the experimental tractability of C. elegans, with thorough, well-designed genetic analyses. 

      Some evidence suggests that H<SUB>2</SUB>S may be directly detected by the ASJ sensory neurons.  The paper provides interesting and convincing evidence for complex interactions between responses to different gaseous stimuli, particularly an antagonistic role between H<SUB>2</SUB>S and O2 detection/response.  Intriguing roles for mitochondria and iron homeostasis are identified, opening the door to future studies to better understand the roles of these components and processes. 

      We thank the reviewer for the supportive comments.

      Weaknesses: 

      The claim that worms' behavioral responses to H<SUB>2</SUB>S are mediated by direct detection is incompletely supported. While a role for the chemosensory neuron ASJ is implicated, it remains unclear whether this reflects direct detection. Other possibilities, including indirect effects of ASJ and the guanylyl cyclase daf-11 on O2 responses, are also consistent with the authors' data. 

      We thank the reviewer for the insightful comment and agree that the role of ASJ neurons in H<SUB>2</SUB>S detection was not clear. We included new experiments and revised our text to make it clearer.

      Since our initial analyses suggest a role of ASJ neurons in H<SUB>2</SUB>S-evoked locomotory responses (Figure 2F and G), we thought that this would offer us a starting point to dissect the neuronal circuit involved in H<SUB>2</SUB>S responses. Expression of the tetanus toxin catalytic domain in ASJ, which blocks neurosecretion, inhibited H<SUB>2</SUB>S-evoked locomotory speed responses (Figure 2H), suggesting that neurosecretion from ASJ promotes the H<SUB>2</SUB>S-evoked response (Lines 162–165). We then performed calcium imaging of ASJ neurons in response to H<SUB>2</SUB>S exposure. However, while we observed CO<SUB>2</SUB>-evoked calcium transients in ASJ using GCaMP6s, we did not detect any calcium response to H<SUB>2</SUB>S under several conditions, including animals on food, off food, and with different H<SUB>2</SUB>S concentrations and exposure times (Figure 2—Figure supplement 2E and F) (Lines 166–170). Since signaling from ASJ neurons regulates developmental programs that modify sensory functions in C. elegans (Murakami et al., 2001), the involvement of ASJ neurons is not specific to H<SUB>2</SUB>S and ASJ neurons are unlikely to serve as the primary H<SUB>2</SUB>S sensor (discussed in Lines 449–458). Therefore, the exact sensory neuron, circuit and molecular triggers mediating acute H<SUB>2</SUB>S avoidance remain to be elucidated.

      Our subsequent investigation on mitochondrial components suggests that a burst of mitochondrial ROS production may be the trigger for H<SUB>2</SUB>S avoidance, as transient exposure to rotenone substantially increases baseline locomotory speed (Figure 7E) (Lines 391–396). However, to initiate avoidance behavior to H<SUB>2</SUB>S, mitochondrial ROS could potentially target multiple neurons and cellular machineries, making it challenging to pinpoint specific sites of action. Nevertheless, we agree that further dissection of the neural circuits and mitochondrial signaling in H<SUB>2</SUB>S avoidance will be important and should be explored in future studies.

      The role of H<SUB>2</SUB>S-mediated damage in behavioral responses, particularly when detoxification pathways are disrupted, remains unclear. 

      We thank the reviewer for the insightful comment and fully agree with the concern raised. The same issue was also noted by the other reviewers. We agree that decreased locomotory responses in H<SUB>2</SUB>S-sensitized animals can arise from distinct causes, either systemic toxicity or behavioral adaptation, and distinguishing between these is critical. We have included new experiments and revised the text to clarify this issue.

      Our data suggest that increased initial omega turns and a rapid loss of locomotion in hif-1 and detoxification-defective mutants including sqrd-1 and ethe-1 likely reflect an enhanced sensitivity to H<SUB>2</SUB>S toxicity due to their failure to induce appropriate adaptive responses (Figure 5D–F, Figure 5J–L, Figure 5—Figure supplement 1F–P). Supporting this, hif-1 mutants become less responsive to unrelated stimuli (near-UV light) after 30 minutes of H<SUB>2</SUB>S exposure (Figure 5I).

      In contrast, egl-9 and SOD-deficient animals show reduced initial omega-turn and reduced speed responses (Figure 5B, Figure 7G, Figure 5—Figure supplement 1A and B, and Figure 7—Figure supplement 1F and G), although both egl-9 and sod mutants respond normally to other stimuli before or after H<SUB>2</SUB>S exposure (Figure 5I, Figure 5—Figure supplement 1C, and Figure 7—Figure supplement 1H). Since disrupting egl-9 stabilizes HIF-1 and upregulates the expression of numerous genes involved in cellular defense against H<SUB>2</SUB>S toxicity, the enhanced detoxification capacity in egl-9 mutants likely increases animals’ tolerance to H<SUB>2</SUB>S, thereby reducing avoidance to otherwise toxic H<SUB>2</SUB>S levels. Similarly, persistently high ROS in SOD-deficient animals activates a variety of stress-responsive signaling pathways, including HIF-1, NRF2/SKN-1 and DAF-16/FOXO signaling (Lennicke & Cocheme, 2021; Patten et al., 2010), facilitating cellular adaptation to redox stress and reducing animals’ responsiveness to toxic H<SUB>2</SUB>S levels. Taken together, these findings support the view that reduced locomotory speed during H<SUB>2</SUB>S exposure can arise from distinct mechanisms: early systemic toxicity in hif-1 and detoxification-defective mutants, versus enhanced cellular adaptation in egl-9 and SOD mutants. We have integrated the relevant information across the Results section and discussed this in Lines 485-536. 

      The findings of the paper are somewhat disjointed, such that a clear picture of the relationships between H<SUB>2</SUB>S detection, detoxification mechanisms, mitochondria, and iron does not emerge from these studies. Most importantly, the relative roles of H<SUB>2</SUB>S detection and integration, vs. general and acute mitochondrial crisis, in generating behavioral responses are not convincingly resolved.  

      We thank the reviewer for this comment and agree that our presentation did not fully connect different findings into a cohesive picture. To address this, we have acquired new data, and revised the abstract, results and discussion sections to clarify two phases of H<SUB>2</SUB>S-evoked responses: an initial avoidance behavior upon H<SUB>2</SUB>S exposure, followed by a later phase of adaptation and detoxification when the escape is not successful.

      In brief, we began with the basic characterization of H<SUB>2</SUB>S-induced locomotory speed response, followed by a candidate gene screen to identify key molecules and pathways involved in initial speed response to H<SUB>2</SUB>S. Subsequently, we focused on three major intersecting pathways that contributed to the acute behavioral response to H<SUB>2</SUB>S. These include cGMP signaling, which led to the identification of ASJ neurons; nutrient-sensitive pathways that modulate behavioral responses to both H<SUB>2</SUB>S and CO2; and O2-sensing signaling, whose activation inhibits responses to H<SUB>2</SUB>S. However, the molecules and neurons in these pathways, including ASJ, likely play modulatory roles and are unlikely to serve as the primary H<SUB>2</SUB>S sensors. Our subsequent analysis, however, suggests that mitochondria play a critical role in triggering avoidance behavior upon H<SUB>2</SUB>S exposure. Brief treatment with rotenone, a potent inducer of ROS, led to a marked increase in locomotory speed (Figure 7E). This suggests the possibility that a burst of ROS production triggered by toxic levels of H<SUB>2</SUB>S (Jia et al., 2020) may initiate the avoidance behavior.

      When the initial avoidance fails, H<SUB>2</SUB>S detoxification programs are induced as a long-term survival strategy. The induction of detoxification programs appears to enhance tolerance to H<SUB>2</SUB>S exposure and contributes to the gradual decrease of locomotory speed in H<SUB>2</SUB>S. We now provide a clearer picture of how different pathways modulate H<SUB>2</SUB>S detoxification and adaptation (see our responses to other comments). Briefly, mutants defective in detoxification, such as hif-1 and other detoxification-defective mutants, showed a stronger initial omega-turn response and a rapid loss of locomotion. This loss of locomotion is likely caused by early cellular toxicity, as the mutants failed to respond to other unrelated stimuli (near-UV light) after 30 minutes of H<SUB>2</SUB>S exposure (Figure 5I). Likewise, smf-3 mutants and BP-treated animals were hypersensitive to H<SUB>2</SUB>S (Figure 6D and E, and Figure 6—Figure supplement 1G and I), likely due to impaired H<SUB>2</SUB>S detoxification under low iron conditions, as iron is a co-factor required for the activity of the H<SUB>2</SUB>S detoxification enzyme ETHE-1 (Figure 5K and Figure 5—Figure supplement 1E).

      In contrast, reduced locomotion and response in other contexts, such as egl-9 mutants and SOD-deficient animals, reflect an H<SUB>2</SUB>S-induced adaptive mechanism rather than toxicity, as these animals remain responsive to other stimuli after H<SUB>2</SUB>S exposure. Since disrupting egl-9 stabilizes HIF-1 and upregulates the expression of numerous genes involved in cellular defense against H<SUB>2</SUB>S toxicity, the enhanced detoxification capacity in egl-9 mutants likely increases animals’ tolerance to H<SUB>2</SUB>S, thereby reducing avoidance to otherwise toxic H<SUB>2</SUB>S levels. Similarly, persistently high ROS in SOD-deficient animals activates a variety of stress-responsive signaling pathways, including HIF-1, NRF2/SKN-1 and DAF-16/FOXO signaling (Lennicke & Cocheme, 2021; Patten et al., 2010), facilitating cellular adaptation to redox stress and reducing animals’ responsiveness to toxic H<SUB>2</SUB>S levels. Therefore, different mutants reduce their locomotory speed in response to H<SUB>2</SUB>S through distinct mechanisms. We have integrated the relevant information across the Results section and discussed this in Lines 485-536.

      Reviewer #2 (Public Review): 

      Summary: 

      H<SUB>2</SUB>S is a gas that is toxic to many animals and causes avoidance in animals such as C. elegans. The authors show that H<SUB>2</SUB>S increases the frequency of turning and the speed of locomotion. The response was shown to be modulated by a number of neurons and signaling pathways as well as by ambient oxygen concentrations. The long-term adaptation involved gene expression changes that may be related to iron homeostasis as well as the homeostasis of mitochondria. 

      Strengths: 

      Overall, the authors provide many pieces that will be important for solving how H<SUB>2</SUB>S signals through neuronal circuits to change gene expression and physiological programs. The experiments rely mostly on a behavioral assay that measures the increase of locomotion speed upon exposure to H<SUB>2</SUB>S. This assay is then combined with manipulations of environmental factors, different wild-type strains, and mutants. The mutants analyzed were obtained as candidates from the literature and from transcriptional profiling that the authors carried out in worms that were exposed to H<SUB>2</SUB>S. These studies imply several genetic signaling pathways, some neurons, and metabolism-related factors in the response to H<SUB>2</SUB>S. Hence the data provided should be useful for the field.  

      We thank the reviewer for the supportive comments.

      Weaknesses: 

      On the other hand, many important aspects of the underlying mechanisms remain unsolved and the reader is left with many loose ends. For example, it is not clear how H<SUB>2</SUB>S is actually sensed, how sensory neurons are activated and signal to downstream circuits, and what the role of ciliated and RMG neurons is in this circuit. It remains unclear how signals lead to gene expression and physiological changes such as metabolic rewiring. Solving all this would clearly be beyond the scope of a single manuscript. Yet, the manuscript also does not focus on understanding one of these central aspects and rather is all over the place, which makes it harder to understand for readers who are not in this core field. Multiple additional methods and approaches exist to dig deeper into these mechanisms in the future, such as neuronal calcium imaging, optogenetics, and metabolic analysis. To generate a story that will be interesting to a broad readership, substantial additional experimentation would be required. Further, in the current manuscript, it is often difficult to understand the rationales of the experiments, why they were carried out, and how to place them into a context. This could be improved in terms of documentation, narration/explanation, and visualization.  

      We thank the reviewer for the comment, which has also been raised by the other reviewers. We agree that our initial submission was poorly presented. We also acknowledge that some aspects, such as the detailed neural circuit and sensory transduction, remain unresolved. We have now included additional experiments and revised the manuscript to clarify the logic of our experiments, provide better context for our findings, and improve both the narrative flow and data visualization to make the manuscript more accessible to readers. We now provide a clearer picture of how different pathways interact to modulate the initial avoidance response, and the H<SUB>2</SUB>S detoxification and behavioral habituation during prolonged H<SUB>2</SUB>S exposure. The following response is similar to the one for reviewer #1.

      In brief, we began with the basic characterization of H<SUB>2</SUB>S-induced locomotory speed response, followed by a candidate gene screen to identify key molecules and pathways involved in initial speed response to H<SUB>2</SUB>S. Subsequently, we focused on three major intersecting pathways that contributed to the acute behavioral response to H<SUB>2</SUB>S. These include cGMP signaling, which led to the identification of ASJ neurons; nutrient-sensitive pathways that modulate behavioral responses to both H<SUB>2</SUB>S and CO2; and O2-sensing signaling, whose activation inhibits responses to H<SUB>2</SUB>S. However, the molecules and neurons in these pathways, including ASJ, likely play modulatory roles and are unlikely to serve as the primary H<SUB>2</SUB>S sensors. Our subsequent analysis, however, suggests that mitochondria play a critical role in triggering avoidance behavior upon H<SUB>2</SUB>S exposure. Brief treatment with rotenone, a potent inducer of ROS, led to a marked increase in locomotory speed (Figure 7E). This suggests the possibility that a burst of ROS production triggered by toxic levels of H<SUB>2</SUB>S (Jia et al., 2020) may initiate the avoidance behavior.

      When the initial avoidance fails, H<SUB>2</SUB>S detoxification programs are induced as a long-term survival strategy. The induction of detoxification programs appears to enhance tolerance to H<SUB>2</SUB>S exposure and contributes to the gradual decrease of locomotory speed in H<SUB>2</SUB>S. We now provide a clearer picture of how different pathways modulate H<SUB>2</SUB>S detoxification and adaptation (see our responses to other comments). Briefly, mutants defective in detoxification, such as hif-1 and other detoxification-defective mutants, showed a stronger initial omega-turn response and a rapid loss of locomotion. This loss of locomotion is likely caused by early cellular toxicity, as the mutants failed to respond to other unrelated stimuli (near-UV light) after 30 minutes of H<SUB>2</SUB>S exposure (Figure 5I). Likewise, smf-3 mutants and BP-treated animals were hypersensitive to H<SUB>2</SUB>S (Figure 6D and E, and Figure 6—Figure supplement 1G and I), likely due to impaired H<SUB>2</SUB>S detoxification under low iron conditions, as iron is a co-factor required for the activity of the H<SUB>2</SUB>S detoxification enzyme ETHE-1 (Figure 5K and Figure 5—Figure supplement 1E).

      In contrast, reduced locomotion and response in other contexts, such as egl-9 mutants and SOD-deficient animals, reflect an H<SUB>2</SUB>S-induced adaptive mechanism rather than toxicity, as these animals remain responsive to other stimuli after H<SUB>2</SUB>S exposure. Since disrupting egl-9 stabilizes HIF-1 and upregulates the expression of numerous genes involved in cellular defense against H<SUB>2</SUB>S toxicity, the enhanced detoxification capacity in egl-9 mutants likely increases animals’ tolerance to H<SUB>2</SUB>S, thereby reducing avoidance to otherwise toxic H<SUB>2</SUB>S levels. Similarly, persistently high ROS in SOD-deficient animals activates a variety of stress-responsive signaling pathways, including HIF-1, NRF2/SKN-1 and DAF-16/FOXO signaling (Lennicke & Cocheme, 2021; Patten et al., 2010), facilitating cellular adaptation to redox stress and reducing animals’ responsiveness to toxic H<SUB>2</SUB>S levels. Therefore, different mutants reduce their locomotory speed in response to H<SUB>2</SUB>S through distinct mechanisms. We have integrated the relevant information across the Results section and discussed this in Lines 485-536.

      Reviewer #3 (Public Review): 

      Summary: 

      The manuscript explores the behavioral responses of C. elegans to hydrogen sulfide, which is known to exert remarkable effects on animal physiology in a range of contexts. The possibility of genetic and precise neuronal dissection of responses to H<SUB>2</SUB>S motivates the study of responses in C. elegans. The manuscript is well-written in communicating the complex physiology around C. elegans behavioral responses to H<SUB>2</SUB>S and in appropriately citing prior and related relevant work. 

      There are three parts to the manuscript.

      In the first, an immediate behavioral response (increased locomotory rate) upon exposure to H<SUB>2</SUB>S is characterized. The experimental conditions are critical, and data are obtained from exposure of animals to 150ppm H<SUB>2</SUB>S at 7% O2. The authors provide evidence that this is a chemosensory response to H<SUB>2</SUB>S, showing a requirement for genes encoding components of the cilia apparatus and implicating a role for tax-4 and daf-11. Neuron-specific rescue in the ASJ neurons suggests the ASJ neurons contribute to the response to H<SUB>2</SUB>S. One caveat is that previous work has shown that the dauer-constitutive phenotype of daf-11 mutants can be suppressed by ASJ ablation, suggesting that there may be pervasive changes in animal nervous system signaling that are ASJ-dependent in daf-11 mutants, which may indirectly alter chemosensory responses to H<SUB>2</SUB>S. More direct methods to assess whether ASJ senses H<SUB>2</SUB>S, e.g. using calcium imaging, would better assess a direct role for the ASJ neurons in a behavioral response to H<SUB>2</SUB>S. The authors also point out interesting parallels between the response to H<SUB>2</SUB>S and CO2 though provide some genetic data separating the two responses. Importantly, the authors note that when aerotaxis (O2 sensing and movement) in the presence of bacterial food is intact, as in npr-1 215F animals, the response to H<SUB>2</SUB>S is abrogated. Mutation in gcy-35 in the npr-1 215F background restores the H<SUB>2</SUB>S chemosensory response. 

      There is a second part of the paper that conducts transcriptional profiling of the response to H<SUB>2</SUB>S that corroborates and extends prior work in this area. 

      The final part of the paper is the most intriguing, but for me, also the most problematic. The authors examine how H<SUB>2</SUB>S-evoked locomotory behavioral responses are affected in mutants defective in the stress and detoxification response to H<SUB>2</SUB>S, most notably hif-1. Prior genetic studies have established the pathways leading to HIF-1 activation/stabilization, as well as potential downstream mechanisms. The authors conduct logical genetic analysis to complement studies of the hif-1 mutant and in part motivated by their transcriptional profiling studies, examine the role of iron sequestration/free iron in the locomotory response to H<SUB>2</SUB>S, and further speculate on how the behavior of mutants defective in mitochondrial function might be affected by exposure to H<SUB>2</SUB>S. 

      In some regard, this part of the manuscript is interesting because the analysis begins to connect how the behavior of an animal to a toxic compound is affected by mutations that affect sensitivity to the toxic compound. However, what is unclear is what is being studied at this point. In the context of noting that H<SUB>2</SUB>S at 150ppm is known to be lethal, its addition to mutants clearly sensitized to its effects would be anticipated to have pervasive effects on animal physiology and nervous system function. The authors note that the continued increased locomotion of wild-type animals upon H<SUB>2</SUB>S exposure might be due to the byproducts of detoxification or the detrimental effects of H<SUB>2</SUB>S. The latter explanation seems much more likely, in which case what one may be observing is the effects of general animal sickness, or even a bit more specifically, neuronal dysfunction in the presence of a toxic compound, on locomotion. As such, what is unclear is what conclusions can be taken away from this part of the work.

      Strengths: 

      (1) Characterization of a motor behavior response to H<SUB>2</SUB>S 

      (2) Transcriptional profiling of the response to H<SUB>2</SUB>S corroborating prior work.  

      We thank the reviewer for the supportive comments.

      Weaknesses: 

      Unclear significance and experimental challenges regarding the study of locomotory responses to animals sensitized to the toxic effects of H<SUB>2</SUB>S under exposure to H<SUB>2</SUB>S. 

      We thank the reviewer for the comment, which has also been raised by the other reviewers. We agree that our initial submission left several important questions open, and we acknowledge the fact that some aspects, such as detailed neural circuit and sensory transduction, still remain unresolved. Nevertheless, we acquired new data and revised our text, aiming to clarify the distinct mechanisms underlying the reduced locomotion in different mutants during prolonged H<SUB>2</SUB>S exposure.

      Our data suggest that increased initial omega turns and a rapid loss of locomotion in hif-1 and detoxification-defective mutants, including sqrd-1 and ethe-1, likely reflect an enhanced sensitivity to H<SUB>2</SUB>S toxicity due to their failure to induce appropriate adaptive responses (Figure 5D–F, Figure 5J–L, Figure 5—Figure supplement 1F–P). Supporting this, hif-1 mutants become less responsive to unrelated stimuli (near-UV light) after 30 minutes of H<SUB>2</SUB>S exposure (Figure 5I).

      In contrast, egl-9 and SOD-deficient animals show reduced initial reorientation and reduced speed responses (Figure 5B, Figure 7G, Figure 5—Figure supplement 1A and B, and Figure 7—Figure supplement 1F and G), although both egl-9 and sod mutants respond normally to the other stimuli before or after H<SUB>2</SUB>S exposure (Figure 5I, Figure 5—Figure supplement 1C, and Figure 7—Figure supplement 1H). Since disrupting egl-9 stabilizes HIF-1 and upregulates the expression of numerous genes involved in cellular defense against H<SUB>2</SUB>S toxicity, the enhanced detoxification capacity in egl-9 mutants likely increases animals’ tolerance to H<SUB>2</SUB>S, thereby reducing avoidance to otherwise toxic H<SUB>2</SUB>S levels. Similarly, constant high ROS in SOD-deficient animals activates a variety of stress-responsive signaling pathways, including HIF-1, NRF2/SKN-1 and DAF-16/FOXO signaling (Lennicke & Cocheme, 2021; Patten et al., 2010), facilitating cellular adaptation to redox stress and reducing animals’ responsiveness to toxic H<SUB>2</SUB>S levels. Taken together, these findings support the view that reduced locomotory speed during H<SUB>2</SUB>S exposure can arise from distinct mechanisms: early systemic toxicity in hif-1 and detoxification-defective mutants, versus enhanced cellular adaptation in egl-9 and SOD mutants. We have integrated the relevant information across the Results section and discussed this in Lines 485–536.

      Reviewer #1 (Recommendations For The Authors): 

      To better substantiate a role for H<SUB>2</SUB>S detection, it would be useful for the authors to image Ca responses to H<SUB>2</SUB>S in ASJ in WT and unc-13, and to rule out the possibility that the requirement for daf-11 in ASJ reflects a role in O2 rather than H<SUB>2</SUB>S detection. 

      We thank the reviewer for this comment. As suggested, we performed calcium imaging of ASJ neurons using GCaMP6s. As previously described, 3% CO<SUB>2</SUB> evoked a calcium transient in ASJ (Figure 2—figure supplement 2F). To investigate whether H<SUB>2</SUB>S evoked a calcium transient in ASJ neurons, we tested several conditions, including animals on food or off food, with different H<SUB>2</SUB>S concentrations (~75 or ~150 ppm) and different exposure times (4 or 8 min). However, we did not detect a calcium response to H<SUB>2</SUB>S in ASJ under any of the conditions tested (Figure 2—figure supplement 2E) (Lines 166–168). Given that neuron-specific rescue of daf-11 or tax-4 mutants pointed to a role of ASJ neurons in promoting H<SUB>2</SUB>S responses, we sought to determine how ASJ neurons were involved. Expression of the tetanus toxin catalytic domain in ASJ neurons, which blocks neurosecretion, inhibited H<SUB>2</SUB>S-evoked locomotory speed responses (Figure 2H), similar to the phenotypes observed in daf-11 and daf-7 mutants (Figure 2C and D) (Lines 162–165). These results confirm that ASJ activity and neurosecretion contribute to the H<SUB>2</SUB>S responses, although ASJ is unlikely to serve as the primary H<SUB>2</SUB>S sensor. One potential explanation is that DAF-7 released by ASJ controls the starvation program, which in turn modulates the animal’s response to H<SUB>2</SUB>S. We also discussed this in Lines 449–458.

      The paper would be significantly strengthened by testing the possibility (as the authors acknowledge in lines 348-52) that disruption of detoxification mechanisms reduces sustained behavioral responses to H<SUB>2</SUB>S because of physiological damage. Authors use acute exposure to high O2 for this purpose earlier in the paper, but not to probe the consequences of loss of hif-1 and detoxification factors.  

      We thank the reviewer for the valuable suggestion. As the reviewer highlighted, we attributed the brief locomotory speed responses to H<SUB>2</SUB>S observed in hif-1 mutants to the lack of a detoxification response, leading to the rapid intoxication of the animals. Several lines of evidence support this conclusion. First, we observed that hif-1 and the detoxification mutants displayed a stronger initial reorientation response (omega turns) and a more rapid decline in speed and reversals compared to wild type (Figure 5D–F). Second, to test if hif-1 mutants were indeed more susceptible to H<SUB>2</SUB>S toxicity, we exposed WT and hif-1 animals to H<SUB>2</SUB>S for 30 min and subsequently tested their ability to respond to near-UV light. Unlike in WT animals, the speed response to near-UV light was inhibited in hif-1 mutants (Figure 5I), suggesting that exposure to H<SUB>2</SUB>S for 30 min causes stronger toxicity in animals deficient in HIF-1 signaling. Third, hif-1 and detoxification mutants displayed a sustained high speed in response to 1% O<SUB>2</SUB>, suggesting a specific impairment of the H<SUB>2</SUB>S response. The data were presented in Lines 318–347 and were further discussed in Lines 485–508.

      To better understand whether mitochondrial damage has a role in H<SUB>2</SUB>S-evoked behavior, it might be useful for the authors to determine whether general ROS response pathways are important for H<SUB>2</SUB>S behavioral responses.

      We thank the reviewer for this insightful comment. As suggested, we investigated whether ROS detoxification pathways contribute to H<SUB>2</SUB>S-evoked locomotory speed responses by analyzing mutants in the superoxide dismutase (SOD) family. These experiments, together with other observations, suggest that mitochondrial ROS play a dual role in H<SUB>2</SUB>S-evoked locomotion. The relevant results were presented in Lines 401–425, and were further discussed in Lines 509–536.

      First, we found that increased mitochondrial ROS formation, either induced pharmacologically by rotenone or genetically in mitochondrial electron transport chain (ETC) mutants (Ishii et al., 2013; Ochi et al., 2016; Ramsay & Singer, 1992; Yang & Hekimi, 2010; Zorov, Juhaszova, & Sollott, 2014), suppressed the behavioral response to toxic H<SUB>2</SUB>S (Figure 7A–E). This indicates that mitochondrial ROS plays a significant role in H<SUB>2</SUB>S-evoked responses. One likely explanation is that high ROS formation may dampen the H<SUB>2</SUB>S-triggered ROS spike, or may impair other H<SUB>2</SUB>S signaling processes required to initiate avoidance. Second, consistent with previous reports (Onukwufor et al., 2022), we observed that short-term rotenone exposure (<1 hour) significantly increased baseline locomotory speed. Given that toxic H<SUB>2</SUB>S levels promote ROS formation (Jia et al., 2020), our findings suggest that acute mitochondrial ROS production by toxic levels of H<SUB>2</SUB>S exposure may serve as a trigger for the avoidance response.

      In contrast, animals with sustained mitochondrial ROS production do not have an increased baseline locomotory speed. This effect was observed after 2 hours of rotenone exposure, in mitochondrial ETC mutants, and in animals lacking all SOD enzymes (Figures 7A–K). A likely explanation for the reduced basal locomotory speed during sustained mitochondrial ROS production is the activation of ROS-responsive signaling pathways including HIF-1, NRF2/SKN-1, and DAF-16/FOXO (Lennicke & Cocheme, 2021; Patten, Germain, Kelly, & Slack, 2010), which may promote adaptation to prolonged oxidative stress (Figure 7H). Notably, unlike hif-1 mutants, SOD-deficient animals remained as responsive as WT to other stimuli after 30 minutes of H<SUB>2</SUB>S exposure (Figure 7—figure supplement 1H), indicating that elevated ROS levels do not compromise overall viability or the ability to detoxify H<SUB>2</SUB>S.

      Taken together, these results support a model in which mitochondrial ROS exerts a biphasic effect on H<SUB>2</SUB>S-induced avoidance. It enhances detection and avoidance under acute stress but contributes to locomotory suppression when ROS levels remain elevated chronically.

      Reviewer #2 (Recommendations For The Authors):

      The way the manuscript is presented could be improved without much effort by rewriting/editing. For the reader, it is hard at present to understand the rationales of the experiments, why they were carried out, and how to place them into a context. This could be improved on three levels:

      (1) Documentation 

      (2) Narration/Explanation 

      (3) Visualization 

      (1) Documentation

      Not all of the results in the text are well documented. The results should be described with more details in the written text and improved documentation and quantification of the results. Example: 

      Turning behavior is mentioned as an important aspect of the response to H<SUB>2</SUB>S. There is no citation given but this effect is not well documented. The authors image the animals and could provide video footage of the effect, could quantify eg turning/pirouettes, and provide the data. At the moment the manuscript largely relies on measuring the increase in speed, but the reader is left wondering what other behavioral effects occur and how this is altered in all of the mutant and other conditions tested. Just quantifying speed reduces the readout and seems like an oversimplification to characterize the behavioral response.  

      We are grateful for this comment. We now provide video footage of the H<SUB>2</SUB>S effects (Figure 1—Video 1). As suggested, we analyzed the recordings to extract reorientation (omega-turns) and reversals. These analyses are now included in the Supplemental file 1 with representative panels displayed in Figure 5 and supplements to Figures 2, 3, 5, 6 and 7. Even though the mutant effects on omega-turns were often subtle, and reversal responses showed considerable variability (likely due to differences in population density, food availability, or animals’ physiological state prior to the assay), this analysis has proven valuable for distinguishing mutants that exhibit adaptation from those that display hypersensitivity to H<SUB>2</SUB>S toxicity. For instance, although both SOD-deficient and BP-treated animals failed to increase their locomotory speed in H<SUB>2</SUB>S (Figure 6E and Figure 7G), they exhibited distinct omega-turn responses (Figure 6—figure supplement 1I and Figure 7—figure supplement 1F), suggesting that different mechanisms likely underlie the locomotory defects of these two groups of animals. We have integrated the omega-turn and reversal data into the text and discussed them in the relevant contexts.

      (2) Narration/Description.

      Generally, the description of the results part is very brief and it is often not clear why a certain experiment was carried out and how. Surely it is possible to check the methods but this interrupts the flow of reading and it would be easier for the reader to be guided through the results with more information what the initial motivation for an experiment is, what the general experimental outline is, and what specific experiments are carried out. 

      We apologize for the lack of clarity and logical structure in the initial submission. In the revised manuscript, we have thoroughly revised the text to improve its organization and readability.

      Examples: 

      Line 97ff: The authors performed a candidate screen yet it is not described why which genes were chosen. Are there also pathways that were tested that turned out to not be involved? 

      We thank the reviewer for the suggestion. To address this, we have added a new section explaining the rationale for selecting genes and pathways in our candidate screen. Briefly, we focused on genes known or predicted to be involved in sensory responses to gaseous stimuli in C. elegans and mammals, including globins and guanylate cyclases (21% O<SUB>2</SUB> sensing), potassium channels (acute hypoxia), and nutrient-sensitive pathways (CO<SUB>2</SUB> responses). We also included mutants defective in sensory signal transduction and neurotransmission. In addition, mitochondrial mutants were analyzed because mitochondria play a central role in H<SUB>2</SUB>S detoxification. The pathways that contributed to the acute H<SUB>2</SUB>S response included cGMP, insulin, and TGF-β signaling, as well as mitochondrial components. In contrast, globins, potassium channels, and biogenic amine signaling did not appear to play significant roles under our assay conditions. The results of the candidate screen are described in Lines 106–138 and summarized in Supplementary File 1.

      line 262ff: the paragraph starts with explaining ferritin genes that are important for iron control but the reader does not yet know why. Then it is explained that a ferritin gene is DE in the H<SUB>2</SUB>S transcriptomes. then a motivation to look into the labile iron pool is described. Why not first explain what genes are strongly regulated and why they are selected based on their DE? Then explain what is known about these genes and pathways, and then motivate a set of experiments. 

      We agree with the reviewer that our initial description could have been more logically organized. We reframed this section to first present the RNA-seq data, followed by an explanation of their known biological functions and the motivation for the subsequent experiments (Lines 350–357).

      nhr-49 appears suddenly in the results part and it is not clear why it was tested and how the result links. Is nhr-49 a key transcription factor that is activated by H<SUB>2</SUB>S sensory or physiological response, and does it control the signaling or protective changes induced by H<SUB>2</SUB>S?  

      We thank the reviewer for the comment. As suggested, we revised the text to present the information more clearly. In our candidate gene screen, a set of mutants exhibiting reduced speed responses to H<SUB>2</SUB>S has previously been shown to be defective in response to CO<SUB>2</SUB> stimulation (Hallem & Sternberg, 2008). These included animals deficient in nutrient-sensitive pathways, including insulin, TGF-beta, and NHR-49, which were reported by Sternberg’s lab to exhibit dampened responses to CO<SUB>2</SUB> (Hallem & Sternberg, 2008) (Lines 173–179). We also included a simple cartoon to further illustrate this (Figure 3C).

      The nuclear hormone receptor NHR-49 has been implicated in a variety of stress responses, including starvation (Van Gilst, Hadjivassiliou, & Yamamoto, 2005), bacterial pathogen (Naim et al., 2021; Wani et al., 2021), and hypoxia (Doering et al., 2022). The nhr-49 mutants exhibited a rapid decline in locomotory speed during H<SUB>2</SUB>S exposure, implicating a role in sustaining high speed in the presence of H<SUB>2</SUB>S. Furthermore, we observed that fmo-2, a well-characterized target gene of NHR-49, was significantly upregulated after 1 hour of exposure to 50 and 150 ppm H<SUB>2</SUB>S (Supplementary file 2), suggesting that NHR-49 signaling is rapidly activated by H<SUB>2</SUB>S exposure. Exactly how NHR-49 contributes to H<SUB>2</SUB>S response requires further investigation.

      (3) Visualization 

      Adding a model/cartoon summary that describes the pathways tested and their interaction would be helpful in some of the figures for the reader to keep an overview of the pathways that were tested. Also, a final summary cartoon that integrates all the puzzle pieces into one larger picture would be helpful. Such a final cartoon overview could also point to the key open questions of the underlying mechanisms. 

      We thank the reviewer for this suggestion. We have added a series of models/cartoons to illustrate the different pathways and their interactions. These include starvation regulatory mechanisms (Figure 3C), 21% O<SUB>2</SUB> sensing mechanisms (Figure 3G), HIF-1 signaling and detoxification (Figure 5—figure supplement 1E), HIF-1 signaling and the regulation of labile iron (Figure 6H), as well as ROS signaling and regulation (Figure 7L). To help interpretation and to elaborate on these models, we have also included explanatory sentences in the corresponding figure legends.

      Other comments: 

      Introduction and line 93: The authors mention that 50 ppm H<SUB>2</SUB>S has beneficial effects on lifespan yet does not have a detectable phenotype. Are there any concentrations of H<SUB>2</SUB>S that cause attraction of C. elegans and what is the preferred range if it exists? Could this be measured in an H<SUB>2</SUB>S gradient? 

      We thank the reviewer for the insightful comment. We performed an H<SUB>2</SUB>S gradient assay, which suggests that wild type animals are attracted toward low concentrations of H<SUB>2</SUB>S around 40 ppm (Figure 1G and H) (Lines 95–104). These results suggest that H<SUB>2</SUB>S acts as a strong repellent for C. elegans at high concentrations but as an attractant at low levels. This dual role may be ecologically relevant, as wild C. elegans lives in complex and dynamic environments where H<SUB>2</SUB>S levels likely fluctuate over short distances (Adams, Farwell, Pack, & Bamesberger, 1979; Budde & Roth, 2011; Morra & Dick, 1991; Patange, Breen, Arsuffi, & Ruvkun, 2025; Rodriguez-Kabana, Jordan, & Hollis, 1965; Romanelli-Cedrez, Vairoletti, & Salinas, 2024).

      Line 146: "Local H<SUB>2</SUB>S concentrations could also be significantly higher in decomposing substances where wild C. elegans thrives" please provide a citation.

      As suggested, we included a set of references that describe H<SUB>2</SUB>S enrichment in the natural environment in early field studies (Adams et al., 1979; Morra & Dick, 1991; Rodriguez-Kabana et al., 1965), as well as references that have discussed or implied this in C. elegans studies (Budde & Roth, 2011; Patange et al., 2025; Romanelli-Cedrez et al., 2024). They can be found in the Introduction (Lines 59–62) and in the Results (Lines 197–199).

      Line 311 "Wild C. elegans isolates thrive in the decomposing matters, where the local concentrations of O2 are low while the levels of CO2 and H<SUB>2</SUB>S could be high. These animals have adapted their behavior in such an environment, displaying increased sensitivity to high O2 exposure but dampened responses to CO2." Please provide citations for these statements.  

      As suggested, we cited the relevant articles and books describing the variation of O<SUB>2</SUB> and CO<SUB>2</SUB> levels in decomposing matter, including several C. elegans papers that mention this, in Lines 197–199 (Bretscher, Busch, & de Bono, 2008; Gea, Barrena, Artola, & Sanchez, 2004; Hallem & Sternberg, 2008; Oshins, Michel, Louis, Richard, & Rynk, 2022), together with the above-mentioned articles for H<SUB>2</SUB>S (Adams et al., 1979; Budde & Roth, 2011; Morra & Dick, 1991; Patange et al., 2025; Rodriguez-Kabana et al., 1965; Romanelli-Cedrez et al., 2024).

      For C. elegans’ sensitivity to O2 and CO2, these articles were cited in Lines 201–203 (Beets et al., 2020; Bretscher et al., 2008; Carrillo, Guillermin, Rengarajan, Okubo, & Hallem, 2013; Hallem & Sternberg, 2008; Kodama-Namba et al., 2013; McGrath et al., 2009).

      Reviewer #3 (Recommendations For The Authors): 

      More work could be conducted establishing the neuronal circuitry involved in the initial, tractable response to H<SUB>2</SUB>S. 

      We thank the reviewer for the insightful comment. Since our initial analyses suggest a role of ASJ neurons in H<SUB>2</SUB>S-evoked locomotory responses (Figure 2F and G), we thought that this would offer us an entry point to dissect the neuronal circuit involved in H<SUB>2</SUB>S responses. Expression of the tetanus toxin catalytic domain in ASJ, which blocks neurosecretion, inhibited H<SUB>2</SUB>S-evoked locomotory responses (Figure 2H), suggesting that neurosecretion from ASJ promotes the speed response to H<SUB>2</SUB>S (Lines 162–165). We then performed calcium imaging of ASJ neurons in response to H<SUB>2</SUB>S exposure. However, while we observed CO<SUB>2</SUB>-evoked calcium transients in ASJ using GCaMP6s, we did not detect any calcium response to H<SUB>2</SUB>S under several conditions, including animals on food, off food, and with different H<SUB>2</SUB>S concentrations and exposure times (Figure 2—Figure supplement 2E and 2F) (Lines 166–168). Since signaling from ASJ neurons regulates developmental programs that modify sensory functions in C. elegans, including CO<SUB>2</SUB> and O<SUB>2</SUB> responses (Murakami, Koga, & Ohshima, 2001), the involvement of ASJ neurons is not specific to H<SUB>2</SUB>S responses and ASJ neurons are unlikely to serve as a primary H<SUB>2</SUB>S sensor (discussed in Lines 449–458). Therefore, the exact sensory neuron, circuit and molecular triggers mediating acute H<SUB>2</SUB>S avoidance behavior remain to be elucidated.

      Our subsequent investigation of mitochondrial components suggests that a burst of mitochondrial ROS production may be the trigger for H<SUB>2</SUB>S avoidance, as transient exposure to rotenone substantially increases baseline locomotory activity (Figure 7E) (Lines 391–396). However, mitochondrial ROS could potentially target multiple neurons and cellular machineries to initiate avoidance behavior to H<SUB>2</SUB>S, making it challenging to pinpoint specific sites of action. Nevertheless, we agree that further dissection of the neural circuits and mitochondrial signaling in H<SUB>2</SUB>S avoidance will be important and should be explored in future studies. We discussed this in Lines 509–536.

      I am not sure how to overcome the challenges involved in reaching conclusions from the decreased locomotory responses of animals that are sensitized to the effects of H<SUB>2</SUB>S. Perhaps this conundrum could be discussed in more detail in the text. 

      We thank the reviewer for this important comment. We agree that decreased locomotory speed during H<SUB>2</SUB>S exposure can arise from distinct causes, either systemic toxicity or adaptation, and distinguishing between these is critical. We have included new experiments and revised the text to clarify this issue.

      Our data suggest that increased initial omega turns and a rapid loss of locomotion in hif-1 and detoxification-defective mutants, including sqrd-1 and ethe-1, likely reflect an enhanced sensitivity to H<SUB>2</SUB>S toxicity due to their failure to induce appropriate adaptive responses (Figure 5D–F, Figure 5J–L, Figure 5—Figure supplement 1F–P). Supporting this, hif-1 mutants become less responsive to unrelated stimuli (near-UV light) after 30 minutes of H<SUB>2</SUB>S exposure (Figure 5I).

      In contrast, egl-9 and SOD-deficient animals show reduced initial reorientation and reduced speed responses (Figure 5B, Figure 7G, Figure 5—Figure supplement 1A and B, and Figure 7—Figure supplement 1F and G), although both egl-9 and sod mutants respond normally to the other stimuli before or after H<SUB>2</SUB>S exposure (Figure 5I, Figure 5—Figure supplement 1C, and Figure 7—Figure supplement 1H). Since disrupting egl-9 stabilizes HIF-1 and upregulates the expression of numerous genes involved in cellular defense against H<SUB>2</SUB>S toxicity, the enhanced detoxification capacity in egl-9 mutants likely increases animals’ tolerance to H<SUB>2</SUB>S, thereby reducing avoidance to otherwise toxic H<SUB>2</SUB>S levels. Similarly, persistently high ROS in SOD-deficient animals activates a variety of stress-responsive signaling pathways, including HIF-1, NRF2/SKN-1 and DAF-16/FOXO signaling (Lennicke & Cocheme, 2021; Patten et al., 2010), facilitating cellular adaptation to redox stress and reducing animals’ responsiveness to toxic H<SUB>2</SUB>S levels. Taken together, these findings support the view that reduced locomotory speed during H<SUB>2</SUB>S exposure can arise from distinct mechanisms: early systemic toxicity in hif-1 and detoxification-defective mutants, versus enhanced cellular adaptation in egl-9 and SOD mutants. We have integrated the relevant information across the Results section and discussed this in Lines 485–536.

      References

      Adams, D. F., Farwell, S. O., Pack, M. R., & Bamesberger, W. L. (1979). Preliminary Measurements of Biogenic Sulfur-Containing Gas Emissions from Soils. Journal of the Air Pollution Control Association, 29(4), 380-383. doi:10.1080/00022470.1979.10470805

      Beets, I., Zhang, G., Fenk, L. A., Chen, C., Nelson, G. M., Felix, M. A., & de Bono, M. (2020). Natural Variation in a Dendritic Scaffold Protein Remodels Experience-Dependent Plasticity by Altering Neuropeptide Expression. Neuron, 105(1), 106-121 e110. doi:10.1016/j.neuron.2019.10.001

      Bretscher, A. J., Busch, K. E., & de Bono, M. (2008). A carbon dioxide avoidance behavior is integrated with responses to ambient oxygen and food in Caenorhabditis elegans. Proc Natl Acad Sci U S A, 105(23), 8044-8049. doi:10.1073/pnas.0707607105

      Budde, M. W., & Roth, M. B. (2011). The response of Caenorhabditis elegans to hydrogen sulfide and hydrogen cyanide. Genetics, 189(2), 521-532. doi:10.1534/genetics.111.129841

      Carrillo, M. A., Guillermin, M. L., Rengarajan, S., Okubo, R. P., & Hallem, E. A. (2013). O-2-Sensing Neurons Control CO2 Response in C. elegans. Journal of Neuroscience, 33(23), 9675-9683. doi:10.1523/Jneurosci.4541-12.2013  

      Doering, K. R. S., Cheng, X., Milburn, L., Ratnappan, R., Ghazi, A., Miller, D. L., & Taubert, S. (2022). Nuclear hormone receptor NHR-49 acts in parallel with HIF-1 to promote hypoxia adaptation in Caenorhabditis elegans. Elife, 11. doi:10.7554/eLife.67911

      Gea, T., Barrena, R., Artola, A., & Sanchez, A. (2004). Monitoring the biological activity of the composting process: Oxygen uptake rate (OUR), respirometric index (RI), and respiratory quotient (RQ). Biotechnol Bioeng, 88(4), 520-527. doi:10.1002/bit.20281

      Hallem, E. A., & Sternberg, P. W. (2008). Acute carbon dioxide avoidance in Caenorhabditis elegans. Proc Natl Acad Sci U S A, 105(23), 8038-8043. doi:10.1073/pnas.0707469105

      Ishii, T., Miyazawa, M., Onouchi, H., Yasuda, K., Hartman, P. S., & Ishii, N. (2013). Model animals for the study of oxidative stress from complex II. Biochim Biophys Acta, 1827(5), 588-597. doi:10.1016/j.bbabio.2012.10.016

      Jia, J., Wang, Z., Zhang, M., Huang, C., Song, Y., Xu, F., . . . Cheng, J. (2020). SQR mediates therapeutic effects of H(2)S by targeting mitochondrial electron transport to induce mitochondrial uncoupling. Sci Adv, 6(35), eaaz5752. doi:10.1126/sciadv.aaz5752  

      Kodama-Namba, E., Fenk, L. A., Bretscher, A. J., Gross, E., Busch, K. E., & de Bono, M. (2013). Crossmodulation of homeostatic responses to temperature, oxygen and carbon dioxide in C. elegans. PLoS Genet, 9(12), e1004011. doi:10.1371/journal.pgen.1004011

      Lennicke, C., & Cocheme, H. M. (2021). Redox metabolism: ROS as specific molecular regulators of cell signaling and function. Mol Cell, 81(18), 3691-3707. doi:10.1016/j.molcel.2021.08.018

      McGrath, P. T., Rockman, M. V., Zimmer, M., Jang, H., Macosko, E. Z., Kruglyak, L., & Bargmann, C. I. (2009). Quantitative mapping of a digenic behavioral trait implicates globin variation in C. elegans sensory behaviors. Neuron, 61(5), 692-699. doi:10.1016/j.neuron.2009.02.012

      Morra, M. J., & Dick, W. A. (1991). Mechanisms of H(2)S production from cysteine and cystine by microorganisms isolated from soil by selective enrichment. Appl Environ Microbiol, 57(5), 1413-1417. doi:10.1128/aem.57.5.1413-1417.1991

      Murakami, M., Koga, M., & Ohshima, Y. (2001). DAF-7/TGF-beta expression required for the normal larval development in C. elegans is controlled by a presumed guanylyl cyclase DAF-11. Mechanisms of Development, 109(1), 27-35. doi:10.1016/S0925-4773(01)00507-X

      Naim, N., Amrit, F. R. G., Ratnappan, R., DelBuono, N., Loose, J. A., & Ghazi, A. (2021). Cell nonautonomous roles of NHR-49 in promoting longevity and innate immunity. Aging Cell, 20(7), e13413. doi:10.1111/acel.13413

      Ochi, R., Dhagia, V., Lakhkar, A., Patel, D., Wolin, M. S., & Gupte, S. A. (2016). Rotenone-stimulated superoxide release from mitochondrial complex I acutely augments L-type Ca2+ current in A7r5 aortic smooth muscle cells. Am J Physiol Heart Circ Physiol, 310(9), H1118-1128. doi:10.1152/ajpheart.00889.2015  

      Onukwufor, J. O., Farooqi, M. A., Vodickova, A., Koren, S. A., Baldzizhar, A., Berry, B. J., . . . Wojtovich, A. P. (2022). A reversible mitochondrial complex I thiol switch mediates hypoxic avoidance behavior in C. elegans. Nat Commun, 13(1), 2403. doi:10.1038/s41467-022-30169-y

      Oshins, C., Michel, F., Louis, P., Richard, T. L., & Rynk, R. (2022). Chapter 3 - The composting process. In R. Rynk (Ed.), The Composting Handbook (pp. 51-101): Academic Press.  

      Patange, O., Breen, P., Arsuffi, G., & Ruvkun, G. (2025). Hydrogen sulfide mediates the interaction between C. elegans and Actinobacteria from its natural microbial environment. Cell Reports, 44(1), 115170. doi:10.1016/j.celrep.2024.115170

      Patten, D. A., Germain, M., Kelly, M. A., & Slack, R. S. (2010). Reactive oxygen species: stuck in the middle of neurodegeneration. J Alzheimers Dis, 20 Suppl 2, S357-367. doi:10.3233/JAD-2010-100498

      Ramsay, R. R., & Singer, T. P. (1992). Relation of superoxide generation and lipid peroxidation to the inhibition of NADH-Q oxidoreductase by rotenone, piericidin A, and MPP+. Biochem Biophys Res Commun, 189(1), 47-52. doi:10.1016/0006-291x(92)91523-s

      Rodriguez-Kabana, R., Jordan, J. W., & Hollis, J. P. (1965). Nematodes: Biological Control in Rice Fields: Role of Hydrogen Sulfide. Science, 148(3669), 524-526. doi:10.1126/science.148.3669.524

      Romanelli-Cedrez, L., Vairoletti, F., & Salinas, G. (2024). Rhodoquinone-dependent electron transport chain is essential for Caenorhabditis elegans survival in hydrogen sulfide environments. J Biol Chem, 300(9), 107708. doi:10.1016/j.jbc.2024.107708

      Van Gilst, M. R., Hadjivassiliou, H., & Yamamoto, K. R. (2005). A Caenorhabditis elegans nutrient response system partially dependent on nuclear receptor NHR-49. Proc Natl Acad Sci U S A, 102(38), 13496-13501. doi:10.1073/pnas.0506234102

      Wani, K. A., Goswamy, D., Taubert, S., Ratnappan, R., Ghazi, A., & Irazoqui, J. E. (2021). NHR-49/PPAR-α and HLH-30/TFEB cooperate for host defense via a flavin-containing monooxygenase. Elife, 10, e62775. doi:10.7554/eLife.62775

      Yang, W., & Hekimi, S. (2010). A mitochondrial superoxide signal triggers increased longevity in Caenorhabditis elegans. PLoS Biol, 8(12), e1000556. doi:10.1371/journal.pbio.1000556

      Zorov, D. B., Juhaszova, M., & Sollott, S. J. (2014). Mitochondrial reactive oxygen species (ROS) and ROS-induced ROS release. Physiol Rev, 94(3), 909-950. doi:10.1152/physrev.00026.2013

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The authors investigated the role of the zinc transporter ZIP10 in regulating zinc sparks during fertilization in mice. By utilizing oocyte-specific Zip6 and Zip10 conditional knockout mice, the authors effectively demonstrate the importance of ZIP10 in zinc homeostasis, zinc spark generation, and early embryonic development. The study is overall useful as it identifies ZIP10 as an important component of oocyte processes that support embryo development, thus opening the door for further investigations. While the study provides solid evidence for the requirement of ZIP10 in the regulation of zinc sparks and zinc homeostasis, it falls short of revealing the underlying mechanism of how ZIP10 exerts this important function.

      This report is the first to clarify the role of the zinc transporter ZIP10 expressed in oocytes, which was previously unknown, and does not focus on the detailed mechanism. As you pointed out, we believe that the mechanism will also be important information in the field of fertilization and embryogenesis research, and that this issue should be addressed in future work.

      (1) The zinc transporters the authors are knocking out are expressed in mouse oocytes through follicular development, and the Gdf9-cre driver used means these oocytes were grown in the absence of appropriate Zinc signaling. Thus, it would be difficult to assert that the lack of fertilization associated with zinc sparks is solely responsible for the failure of embryo development. Spindle morphology and other meiotic parameters do not necessarily report oocyte health, so normalcy of these features may not be a strong argument when it comes to metabolic issues.

      As you rightly observe, the results of this study do not entirely exclude the possibility that oocyte health is compromised in the absence of adequate zinc homeostasis during oocyte growth. However, we have presented evidence that spindle formation does occur in Zip10<sup>d/d</sup> mouse oocytes (Fig. 2C), that fertilization occurs despite the absence of zinc sparks (Fig. 3 and Fig. 4A), and that some embryos develop to blastocysts (Fig. 4B). We believe that future studies should evaluate the transcriptome profile of Zip10<sup>d/d</sup> mouse oocytes.

      (2) While comparing ZIP6 and ZIP10 in the abstract provides context, focusing more on ZIP10 would improve reader comprehension, as ZIP10 is the primary focus of the study. Emphasizing the specific role of ZIP10 will help the reader grasp the core findings more clearly.

      Thank you for your valuable input. We have revised the summary to focus more on ZIP10 by removing the section in the summary that mentions ZIP6 (P.1-2 Line 34-52).

      (3) Zinc transporters ZIP6 and ZIP10 are expressed during follicular development, but the biological significance of the observation is not clearly addressed. The authors should investigate whether the ZIP6 and ZIP10 knockout affects follicular development and discuss the potential implications.

      Thank you for your valuable input. As you mentioned, we have not been able to clarify the effects of ZIP6 and ZIP10 knockout on follicle formation. However, this report clarifies the role of ZIP-mediated zinc ion uptake in oocytes. The effect of ZIP knockout on follicle formation will be examined in the future.

      (4) In Figure 3, the zinc fluorescence images are unclear, making it difficult for readers to interpret the data. Including snapshot images of calcium and zinc spikes as part of the main figure would improve clarity. Moreover, adding more comparative statements and a deeper explanation of why Zip10 KO mice exhibit normal calcium oscillations but lack zinc sparks would strengthen the manuscript.

      Thank you for your suggestion. We have added images of calcium elevation after fertilization to Fig. 3 and Fig. S3, and the figure legends have been changed accordingly (P.29 Line 937-939, P.34 Line 1104-1106). As to why Zip10 KO mice show normal calcium oscillations but lack zinc sparks, as mentioned in the Discussion (P. 10 Line 299-300), we speculate that the zinc ions present in Zip10<sup>d/d</sup> mouse oocytes induce Ca2+ release without compromising IP3R1 sensitivity. We also assume that the lack of zinc sparks is due to the low accumulation of zinc ions in the oocytes via ZIP10, as described in the Discussion (P.10 Line 300-302).

      (5) While the study identifies the role of ZIP10 in zinc spark generation, it lacks a clear mechanistic insight. The topic itself is interesting, but without providing a more detailed explanation of the underlying mechanisms, the study leaves an important gap. Further discussion on the signaling pathways potentially involved in zinc spark regulation would add depth to the findings.

      Thank you for your input. This report is the first to clarify the role of the zinc transporters ZIP6 and ZIP10 expressed in oocytes, which was previously unknown, and does not focus on the detailed mechanism. As you pointed out, we believe that the mechanism and signaling pathways will also be important information, and we believe that it is necessary to research this issue in the future.

      Reviewer #2 (Public review):

      Summary:

      In this important study, the authors examine the role of two zinc uptake transporters, Zip6 and Zip10, which are important during the maturation of oocytes, and are critical for both successful fertilization and early embryogenesis.

      Strengths:

      The authors report that oocytes from Zip10 knockout mice exhibit lower labile zinc content during oocyte maturation, decreased amounts of zinc exocytosis during fertilization, and affect the rate of blastocyst generation in fertilized eggs relative to a control strain. They do not observe these changes in their Zip6 knockout animals. The authors present clear and well-documented results from a broad range of experimental modalities in support of their conclusions.

      Thank you for your positive comments.

      Weaknesses:

      (1) The authors' statement that Zip10 is not expressed in the oocyte nuclei (line 252: "Furthermore, in that study, ZIP10 was detected in the nuclear/nucleolar positions of oocytes of all follicular stages (Chen et al., 2023), which we did not observe.") is not supported by Figure 1, where some Zip10 signal is apparent in the primordial, primary, and secondary follicle oocytes. This statement should be corrected.

      Thank you for pointing this out. Our ISH staining (Fig. 1A) and immunofluorescence staining (Fig. 1B) results showed that ZIP10 was not detected at the nuclear/nucleolar location. In other words, it could not be detected there at either the mRNA or the protein level. Based on these results, we conclude that ZIP10 is expressed in the plasma membrane.

      (2) Based on the FluoZin-3AM data, there appears to be less labile zinc in the Zip10d/d oocyte, eggs, and embryos; however, FluoZin-3AM has a number of well-known artifacts and does not accurately capture the localization of labile zinc pools. The patterns do not correspond to the well-documented zinc-containing cortical vesicles. Another zinc probe, such as ZinPyr-4 or ZincBY-1 should be used to visualize the zinc vesicles and confirm that there is less labile zinc in these locations as well.

      Thank you for your suggestion. Previous studies (Lisle et al., 2013, Reproduction) and our report (Kageyama et al., 2022, Animal Science Journal) have shown that it is possible to examine the presence of labile zinc ions in oocytes and embryos with this probe. In addition, the mouse oocytes (embryos) reported in previous studies were from CD1 (ICR) mice, whereas our study was conducted using C57BL/6J mice. In our report (Kageyama et al., 2024, Journal of Reproduction and Development), we showed that the appearance of zinc vesicles in oocytes observed by FluoZin-3AM staining differs between CD1 and C57BL/6J mice, and we believe that this appearance of cortical vesicles in C57BL/6J mice is not a problem. As you say, we have not used other zinc probes and will consider this in the future.

      (3) Line 268: "The results indicate that ZIP10 is mostly responsible for the uptake of zinc ions in mouse oocytes." The situation seems a bit more complicated given that the differences in labile zinc content between oocytes from the WT and Zip10d/d animals are small (only 20-30%) and that the zinc spark is diminished but still apparent at a low level in the Zip10d/d oocytes. Clearly, other factors are involved in zinc uptake at these stages. A variety of studies have suggested that Zip6 and Zip10 work together, perhaps even functioning as a heterodimer in some systems. The double KO would address this more clearly, but if it is not available, it might be more prudent to state that Zip10 plays some role in uptake of zinc in mouse oocytes while the role of Zip6 remains uncertain.

      We would like to express our gratitude for the comments received. The phenotype of double knockout mice for ZIP6 and ZIP10 will be discussed at a future date. We have also added to the text that the role of ZIP6 remains uncertain (P. 11 Line 353-354).

      (4) Zip6d/d oocytes did not have changes in labile zinc, nor did the lack of Zip6 have an impact on the zinc spark. However, Figure S1 does show a small amount of detectable Zip6 in the western blot. It is possible that this small amount could compensate for the complete lack of Zip6. Can ZIP6 be found in immunofluorescence of GV oocytes or MII eggs from the Zip6d/d animals? Additionally, it is possible that Zip6's role is only supplementary to that of Zip10. The authors should discuss this possibility. It would also be interesting to see if the Zip6/Zip10 double knockout displays greater defects compared to the Zip10 knockout when considering previous studies.

      Thank you for your input. The mice are deficient in the gene, so ZIP6 is not functional. In our view, the WB result does not indicate that the protein is structurally functional, even if the ZIP6 antibody detects a small amount of protein. Since the role of ZIP6 was not elucidated in this study, we added a statement to that effect in the text (P. 11 Line 353-354). In addition, studies using Zip6/Zip10 double knockout mice will be considered in the future.

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors.

      We have revised the text based on the reviewerʼs suggestions.

      Reviewer #1 (Recommendations for the authors):

      (1) In lines 133-136, it seems that the authors would like to aim to emphasize the lack of research on oocytes compared to other tissues and cells. However, the inclusion of unrelated contexts, such as the role of ZIP10 in cancer and skin, appears unnecessary and detracts from the focus on oocyte-specific mechanisms. Removing these unrelated sentences would help maintain clarity and relevance in the introduction.

      As you indicated, we removed the sentences that are not related to oocytes (P.4 Line 120-125). “Further, they reported that targeted disruption using Zip6- and Zip10-specific morpholino injection or antibody incubation induced alteration of the intracellular labile zinc content, spontaneous resumption of meiosis from the PI arrest and premature arrest at a telophase I-like state (Kong et al., 2014). It is clear from these reports that ZIP6 and ZIP10 are involved in zinc transport in oocytes, but the function is not elucidated.”

      (2) Ensure that all video files are properly labeled to enhance understanding.

      We improved the video labels for clarity (Movies 1-8, Movies S1-S4).

      (3) Correct mislabeling issues, such as the one in line 209.

      Corrected as follows: Zip10<sup>d/d</sup> mouse oocytes can be fertilized but were unlikely to develop to blastocysts (P. 6-7 Line 196-197).

      (4) In Figure 4D, the amount of ZP2 appears to increase relative to actin. Including quantification would make the data more robust. Similarly, in Figure 4F, JUNO levels appear increased in Zip10 KO. Please provide quantification.

      The WB band images in Fig. 4D were quantified and the resulting graphs were added to the lower part of Fig. 4D. Furthermore, the JUNO signal in the immunofluorescence images in Figure 4F was quantified and the resulting graph was added as Fig. S4. Figure legends and text were corrected and added as follows.

      P. 30 Line 975-979: Expression level of β-actin serves as a protein loading control and was used to quantify the expression level of ZP2. Molecular mass is indicated at the left. Statistical differences were calculated according to the one-way ANOVA. Different letters represent significant differences (p < 0.05).

      P. 35 Line: Fig. S4 Comparison of JUNO expression in Zip10<sup>f/f</sup> and Zip10<sup>d/d</sup> mouse MII oocytes. To measure JUNO-immunofluorescence intensity, oocyte images were selected as regions of interest (ROIs) and measured using ImageJ. Statistical differences were calculated according to Student’s t-test (p > 0.05; no significant difference).

      P.7 Line 206-209: As for JUNO, its expression was similar between null and control oocytes (Fig. S4), and the temporal dynamics of its disappearance from the cortex after fertilization were similar for both Zip10<sup>f/f</sup> and Zip10<sup>d/d</sup> groups (Fig. 4F).

      (5) Some of the sentences lack proper references.

      The entire text was reviewed and references inserted where necessary.

      P.7 Line 221, P.7 Line 222-223, P.8 Line 253-254, P.12 Line 358-360 and P.24 Line 698-699.

      Reviewer #2 (Recommendations for the authors):

      Revisions are warranted in order to address the issues noted in the Weaknesses section of the Public Review. 

      Thank you for your comments, we have individually addressed the areas you pointed out in the Weaknesses section. The following text has also been corrected and edited.

      (1) Line 247 "In primordial follicles, the ooplasmic staining of ZIP10 we anticipate corresponds to ooplasmic vesicular sites."

      The text of P. 8 Line 230-232 was revised as follows.

      "In primordial follicles, the ooplasm staining of ZIP10 we anticipate corresponds to ooplasmic vesicular sites."

      (2) Line 926 "ZP2 was not stained in primordial follicle, but primary, secondary, and antral follicles stained. FOXL2 was observed in granulosa cells in 928 of all stage follicles. The scale bar represents 20 μm of primordial-secondary follicle and 150 μm of antral follicle." All three sentences have grammar issues that should be fixed. 

      The text of P. 28 Line 908-911 was revised as follows.

      It was observed that ZP2 was not present in the primordial follicle; however, it was present in the primary, secondary and antral follicles. Furthermore, FOXL2 was observed at granulosa cells of all stage follicles. Scale bar: 20 µm (primordial, primary and secondary follicle); 150 µm (antral follicle).

    1. Author response:

      Response to Reviewer 1:

      Ad (2) Clinical applications of SANDI have primarily focused on Multiple Sclerosis. However, since the preparation of the manuscript, one study has been published reporting reductions in apparent soma density and white and grey matter differences in apparent soma size in amyotrophic lateral sclerosis (ALS) (https://doi.org/10.1016/j.ejrad.2025.111981). We will include this paper in our revised manuscript.

      Responses to Reviewer 2:

      Strength:

      Ad (3) SANDI cannot directly differentiate between neuronal and glial cells, but the pattern of differences in the SANDI parameters we observed in Huntington’s disease (HD) is consistent with the known pathology in HD.

      Weaknesses:

      Ad (1) With regards to the question about scanner and acquisition consistency, we can confirm that all diffusion data of individuals with HD and healthy controls from the WAND study were acquired with the same multi-shell High Angular Resolution Diffusion Imaging (HARDI) protocol on the 3T Connectom scanner at CUBRIC. Thus, all diffusion data analysed and reported in this manuscript were acquired with the same protocol on the same strong gradient MRI system for harmonization and consistency purposes.

      We agree that for clinical adoption it is important to demonstrate that HD-related SANDI differences do not require ultra-strong gradient imaging and can be detected on standard clinical MRI systems. While we have not collected such data in people with HD, we and others have demonstrated the feasibility of modelling SANDI metrics from multi-shell diffusion-weighted imaging data acquired with a maximum b-value of 3,000 s/mm² on clinical 3T MRI systems in typical adults and people with MS or ALS (https://doi.org/10.1002/hbm.26416, https://doi.org/10.1038/s41598-024-60497-6, https://doi.org/10.1016/j.ejrad.2025.111981). These studies have demonstrated that it is feasible to characterise brain microstructural differences with SANDI on clinical scanners and that comparable patterns of results can be observed across different MRI systems. It should also be noted that there is presently a move towards stronger gradient implementation in clinical systems, as demonstrated by the release of the Siemens Cima.X system, which will allow higher b-value diffusion scanning on clinical systems.

      Ad (2) We agree that, due to the small number of HD participants with HD-ISS staging, the exploratory comparisons between ISS stages need to be interpreted with caution. We hope to gain access to some of the missing ISS information and plan to include it in the revised paper.

      Ad (3) With regards to the queries about the regression modelling choices:

      (1) As SANDI metrics differed between HC and HD groups, and hence may not be directly comparable, separate regression models for HC and HD data were conducted without formal comparisons between slopes. Only descriptive exploratory comparisons of the observed pattern were included.

      (2) We will provide cross-correlational analyses between all SANDI parameters in the supplements of the revised version of the paper to check for multicollinearity.

      (3) All model-based approaches, including SANDI, may be prone to model instability or parameter degeneracy, and we will acknowledge and discuss this in the revised version.

      Responses to Reviewer 3:

      Weaknesses: 

      Ad (1) and (2) The effect sizes (ES) of group differences in SANDI, DTI, and volume measures in the caudate and putamen (Tables 3 and 4) were broadly comparable: apparent soma radius rs (rrb = 0.45-0.53), apparent soma size fis (rrb = 0.32-0.45), FA (rrb = 0.38-0.55), MD (rrb = 0.51-0.61) and volumes (rrb = 0.49-0.55). Similar ES were observed between fis and FA, and between rs and volumes. MD showed the largest ES, likely due to striatal atrophy-related CSF partial volume contamination. Cost-benefit analyses of imaging marker choices in clinical trials depend on the aim of the study. DTI provides sensitive but unspecific indices that are influenced by biological and geometrical tissue properties and capture a multitude of microstructural properties. Similarly, volumetric measurements do not inform about the underpinning neurodegenerative processes.
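
      For readers who want to see how an effect size of this kind is obtained, the following is a minimal sketch. It assumes that rrb denotes the rank-biserial correlation of a two-group (Mann-Whitney type) comparison, which the response does not state explicitly. The function name and the numbers are hypothetical and are not taken from the study.

```python
from itertools import product


def rank_biserial(group_a, group_b):
    """Rank-biserial correlation for two independent samples.

    Defined here as (pairs where a > b minus pairs where a < b) divided by
    the total number of pairs; tied pairs contribute zero. This equals
    2 * U_a / (n_a * n_b) - 1, with U_a the Mann-Whitney U of group_a.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    greater = sum(1 for a, b in product(group_a, group_b) if a > b)
    less = sum(1 for a, b in product(group_a, group_b) if a < b)
    return (greater - less) / (len(group_a) * len(group_b))


# Hypothetical values for illustration only (not data from the study).
controls = [10.2, 11.0, 10.8, 11.5, 10.9]
patients = [9.8, 10.1, 10.5, 9.9, 10.7]
effect_size = rank_biserial(controls, patients)
print(round(effect_size, 2))
```

      The value ranges from -1 to 1, and a value of 0 means that neither group tends to have larger values than the other.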

      With the advancement of disease-modifying therapies for HD it has become important to identify non-invasive imaging markers that can inform about the mechanistic effects of novel therapies. While DTI and volume metrics are sensitive to brain changes, they do not provide specific information about the underpinning tissue properties. Such information, however, may turn out to be important for the evaluation of mechanistic effects of novel therapeutics in clinical trials. Advanced microstructural models such as SANDI may help provide such information. We found that SANDI indices had statistically similar power to the gold standard measures of volumes, but with the added value of information about the underlying microstructure. We and others have also shown that SANDI can be applied to multi-shell diffusion data acquired in a clinically feasible time (~10 min) on standard 3T MRI systems (please refer to our response above).

      To summarise, DTI and volumes are sensitive to brain changes but will need to be complemented by more advanced microstructural measurements such as SANDI to gain a better understanding of the underlying tissue changes and effects of disease-modifying therapies.

      Ad (3) We will provide a correlation matrix of all DWI measures in the supplementary material to allow a better understanding of how similar SANDI measures are to each other and to DTI measures.
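
      As an illustration of what such a correlation matrix involves, the sketch below computes pairwise Pearson correlations between measures and lists strongly correlated pairs. It is a sketch under stated assumptions: the response does not specify the correlation coefficient or any cut-off, so the use of Pearson's r, the 0.8 threshold, the function names, and the example values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(measures):
    """Nested dict of pairwise correlations between named measures."""
    names = list(measures)
    return {a: {b: pearson(measures[a], measures[b]) for b in names} for a in names}


def strongly_correlated_pairs(matrix, threshold=0.8):
    """Pairs of distinct measures whose |r| reaches the (assumed) threshold."""
    names = list(matrix)
    return [
        (a, b, matrix[a][b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(matrix[a][b]) >= threshold
    ]


# Hypothetical per-participant values for illustration only.
measures = {
    "fis": [0.30, 0.35, 0.28, 0.40, 0.33],
    "rs": [11.0, 12.1, 10.5, 13.0, 11.6],
    "FA": [0.21, 0.25, 0.24, 0.20, 0.26],
}
for a, b, r in strongly_correlated_pairs(correlation_matrix(measures)):
    print(f"{a} vs {b}: r = {r:.2f}")
```

      Pairs flagged in this way are candidates for multicollinearity when the measures enter the same regression model.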

      Ad (4) Most of the people with HD who have taken part in our study were participants in the Enroll-HD study. We will use HD-ISS information from Enroll-HD as much as possible. As we do not have longitudinal imaging data for all individuals classified as ISS <2, we will compare our cross-sectional striatal volumes with those from age- and sex-matched individuals from WAND to determine whether people fall into the ISS 0 or ISS 1 category. This approach will hopefully allow us to increase the total HD-ISS sample size and to determine whether there were participants with ISS 0 in our sample.

      Ad (5) We will explain in the revised manuscript that ISS stages were created for research purposes only and are not used or applied in the clinic, while “premanifest” and “manifest” are helpful concepts in the clinical context. We will clarify that we refer to individuals without motor symptoms as assessed with the Total Motor Score (TMS) as premanifest and to those with motor symptoms as manifest. This roughly corresponds to individuals at ISS 0/1 without signs of motor symptoms compared to individuals at ISS 2-3 with signs of motor symptoms.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The small conductance calcium-activated potassium channel 2 (SK2) is an important drug target for treating neurological and cardiovascular diseases. However, structural information on this subtype of SK channels has been lacking, and it has been difficult to draw conclusions about activator and inhibitor binding and action in the absence of structural information.

      Here the authors set out to (1) determine the structure of the transmembrane regions of a mammalian SK2 channel, (2) determine the binding site of apamin, a historically important SK2 inhibitor whose mode of action is unclear, and (3) use the structural information to generate a novel set of activators/inhibitors that selectively target SK2.

      The authors largely achieved all the proposed goals, and they present their data clearly.

      Unable to solve the structure of the human SK2 due to excessive heterogeneity in its cytoplasmic regions, the authors create a chimeric construct using SK4, whose structure was previously solved, and use it for structural studies. The data reveal a unique extracellular structure formed by the S2-S3 loop, which appears to directly interact with the selectivity filter and modulate its conductivity. Structures of SK2 in the absence and presence of the activating Ca2+ ions both possess non-K+-selective/conductive selectivity filters, where only sites 3 and 4 are preserved. The S6 gates are captured in closed and open states, respectively. Apamin binds to the S2-S3 loop, and unexpectedly, induces a K+ selective/conductive conformation of the selectivity filter while closing the S6 gate.

      Through high-throughput screening of small compound libraries and compound optimization, the group identified a reasonably selective inhibitor and a related compound that acts as an activator. The characterization shows that these compounds bind in a novel binding site. Interestingly, the inhibitor, despite binding in a site different from that of apamin, also induces a K+ selective/conductive conformation of the selectivity filter while the activator induces a non-K+ selective/conductive conformation and an open S6 gate.

      The data suggest that the selectivity filter and the S6 gate are rarely open at the same time, and the authors hypothesize that this might be the underlying reason for the small conductance of SK2. The data will be valuable for understanding the mechanism of the SK2 channel (and other SK subtypes).

      Overall, the data is of good quality and supports the claims made by the authors. However, a deeper analysis of the cryo-EM data sets might yield some important insights, i.e., about the relationship between the conformation of the selectivity filter and the opening of the S6 gate.

      We attempted focused 3D classification to identify subsets of particles with the S6 open and the SF in a conductive state but were not able to isolate such a particle class. This indicates that either none or a very small percentage of particles exists in a fully conductive state. This sentence was included in the results section: 

      “Focused 3D classification of the S3-S4 linker was unsuccessful in identifying particles subsets with a dilated extracellular constriction suggesting that either none or a very small percentage of Ca<sup>2+</sup>-bound SK2-4 is in a conductive state”

      Some insight and discussion about the allosteric networks between the SF and the S6 gate would also be a valuable addition.

      The extracellular constriction is in the same non-conductive conformation in the Ca<sup>2+</sup>-bound and Ca<sup>2+</sup>-free SK2-4 structures, suggesting that the conformations of the S3-S4 linker/SF and of S6 are not allosterically coupled. We predict that Ca<sup>2+</sup> opens the intracellular gate and another physiological factor (not yet identified) promotes extracellular gate opening. These sentences were added to the results and discussion: “This along with the similar conformation of the S3-S4 linker in the Ca<sup>2+</sup>-bound and Ca<sup>2+</sup>-free states of SK2-4 suggest that Ca<sup>2+</sup>-dependent intracellular gate dynamics are not coupled to the conformation of the S3-S4 linker. Other yet to be identified physiological factors may be required to dilate the extracellular constriction.”

      “Alternatively, other physiological factors, such as PIP2[46,47] or protein-protein interactions[48-50], may exist in live cells that modulate the interaction between S3-S4 linker and the selectivity filter.”

      Reviewer #2 (Public review):

      Summary:

      The authors have used single-particle cryoEM imaging to determine how small-molecule regulators of the SK channel interact with it and modulate its function.

      Strengths:

      The reconstructions are of high quality, and the structural details are well described.

      Weaknesses:

      The electrophysiological data are poorly described. Several details of the structural observations require a mechanistic context, perhaps better relating them to what is known about SK channels or other K channel gating dynamics.

      As recommended, additional details for electrophysiological data were added to the results, methods, and figure legends for clarification.  

      The most pressing point I have to make, which could help improve the manuscript, relates to the selectivity filter (SF) conformation. Whether the two ion-bound state of SK2-4 (Figure 4A) represents a non-selective, conductive SF occluded by F243 or represents a C-type inactivated SF, further occluded by F243, is unclear. It would be important to discuss this. Reconstructions of Kv1.3 channels also feature a similar configuration, which has been correlated to its accelerated C-type inactivation.

      Structural overlays of Ca<sup>2+</sup>-bound SK2-4, HCN, and C-type inactivated Kv1.3 selectivity filters demonstrate that each has conformational differences, and it is difficult to determine definitively whether the SK2-4 selectivity filter is in a non-selective conformation like HCN or a C-type inactivated conformation like Kv1.3. Based on the number of ions observed in the filter and the position of Tyr361, we believe the selectivity filter most closely resembles that of HCN. Importantly, the selectivity filter conformation observed in the SK2-4 Ca<sup>2+</sup>-bound and Ca<sup>2+</sup>-free structures is ultimately nonconductive due to the Phe243 extracellular constriction blocking K<sup>+</sup> efflux.

      A comparison of the SK2-4 selectivity filter to HCN and C-type inactivated Kv1.3 was included in Figure 4 and this sentence was included in the results section:

      “The selectivity filter of SK2-4 resembles that of HCN in both the position of Tyr361 and the number of K<sup>+</sup> coordination sites (Fig 4E,F,G,H)”

      Furthermore, binding of a toxin derivative to Kv1.3 restores the SF into a conductive form, though occluded by the toxin. It appears that apamin binding to SK2-4 might be doing something similar. Although I am not sure whether SK channels undergo C-type inactivation like gating, classical MTS accessibility studies have suggested that dynamics of the SF might play a role in the gating of SK channels. It would be really useful (if not essential) to discuss the SF dynamics observed in the study and relate them better to aspects of gating reported in the literature.

      Extracellular toxin binding to SK2-4 and K<sub>v</sub>1.3 induces a conformational change in the selectivity filter to produce a canonical K<sup>+</sup> selective structure with four coordination sites. However, the mechanism by which the toxins produce the conformational change is different. For SK2-4, apamin interacts primarily with S3-S4 linker residues and induces a shift in the S3-S4 linker away from the pore axis. This in turn prevents the hydrogen bonds between Arg240 and Tyr245 of the S3-S4 linker and Asp363 at the C-terminus of the selectivity filter, producing a selectivity filter conformation with four K<sup>+</sup> coordination sites. For K<sub>v</sub>1.3, the sea anemone toxin ShK binds directly to the C-terminus of the selectivity filter, disrupting interactions required for the C-type inactivated structure and thereby inducing the conformational change. These sentences were added to the results:

      “Toxin induced selectivity filter conformational change has also been reported for K<sub>v</sub>1.3 with the sea anemone toxin ShK. However, unlike apamin binding to SK2-4, ShK binds directly to the K<sub>v</sub>1.3 selectivity filter to convert a C-type inactivated conformation to a canonical K<sup>+</sup> selective structure with four coordination sites [39,40]. The change in selectivity filter conformation in apamin-bound SK2-4 seems to be driven instead by the weakening of interactions between the selectivity filter and the S3-S4 linker.”

      The SF of K channels, in conductive states, are usually stabilized by an H-bond network involving water molecules bridged to residues behind the SF (D363 in the down-flipped conformation and Y361). Considering the high quality of the reconstructions, I would suspect that the authors might observe speckles of density (possibly in their sharpened map) at these sites, which overlap with water molecules identified in high-resolution X-ray structures of KcsA, MthK, NaK, NaK2K, etc. It could be useful to inspect this region of the density map.

      We did not observe strong density near Y361 or D363 that could be confidently modeled as water. However, in the structures of SK2-4 bound to apamin and compound 1, Tyr361 in the selectivity filter rotates 180° and forms a hydrogen bond with Thr355 in the pore helix. The homologous hydrogen bond is also observed in SK4 and in the conductive, K<sup>+</sup>-selective selectivity filter conformation of Kv1.3. The rotation of Tyr361 to form a hydrogen bond with Thr355, the reorientation of Asp363 and Trp350 into hydrogen-bonding positions, and the presence of four K<sup>+</sup> coordination sites upon binding of apamin and compound 1 strongly suggest that the selectivity filter is in a K<sup>+</sup>-selective, conductive conformation. The Tyr361/Thr355 hydrogen bond is now described in the paper and shown in Figures 4D, 5D, and S6F.

      Reviewer #3 (Public review):

      This is a fundamentally important study presenting cryo-EM structures of a human small conductance calcium-activated potassium (SK2) channel in the absence and presence of calcium, or with interesting pharmacological probes bound, including the bee toxin apamin, a small molecule inhibitor, and a small molecule activator. As efforts to solve structures of the wild-type hSK2 channel were unsuccessful, the authors engineered a chimera containing the intracellular domain of the SK4 channel, the subtype of SK channel that was successfully solved in a previous study (reference 13). The authors present many new and exciting findings, including opening of an internal gate (similar to SK4), for the first time resolving the S3-S4 linker sitting atop the outer vestibule of the pore and unanticipated plasticity of the ion selectivity filter, and the binding sites for apamin, one new small molecule inhibitor and another small molecule activator. Appropriate functional data are provided to frame interpretations arising from the structures of the chimeric protein; the data are compelling, the interpretations are sound, and the writing is clear. This high-quality study will be of interest to membrane protein structural biologists, ion channel biophysicists, and chemical biologists, and will be valuable for future drug development targeting SK channels.

      The following are suggestions for strengthening an already very strong and solid manuscript:

      (1) It would be good to include some information in the text of the results section about the method and configuration used to obtain electrophysiological data and the limitations. It is not until later in the text that the Qube instrument is mentioned in the results section, and it is not until the methods section that the reader learns it was used to obtain all the electrophysiological data. Even there, it is not explicitly mentioned that a series of different internal solutions were used in each cell where the free calcium concentration was varied to obtain the data in Figure 1C. Also, please state the concentration of free calcium for the data in Figure 1B.

      As recommended, additional details for electrophysiological data were added to the results, methods, and figure legends for clarification.  

      (2) The authors do a nice job of discussing the conformations of the selectivity filter they observed here in SK as they relate to previous work on NaK and HCN, but from my perspective the authors are missing an opportunity to point out even more striking relationships with slow C-type inactivation of the selectivity filter in Shaker and Kv1 channels. C-type inactivation of the filter in Shaker was seen in 150 mM K using the W434F mutant (PMC8932672) or in 4 mM K for the WT channel (PMC8932672), and similar results have been reported for Kv1.2 (PMC9032944; PMC11825129) and for Kv1.3 (PMC9253088; PMC8812516) channels. For Kv1.3, C-type inactivation occurs even in 150 mM K (PMC9253088; PMC8812516). Not unlike what is seen here with apamin, binding of the sea anemone toxin (ShK) with a Fab attached (or the related dalazatide) inserts a Lys into the selectivity filter and stabilizes the conducting conformation of Kv1.3 even though the Lys depletes occupancy of S1 by potassium (PMC9253088; PMC8812516). Or might the conformation of the filter be controlled by regulatory processes in SK2 channels? I think connecting the dots here would enhance the impact of this study, even if it remains relatively speculative.

      Please see the response to reviewer 2’s comments for a comparison of the selectivity filter structure between SK2-4 and C-type inactivated K<sub>v</sub>1.3 and a discussion of toxin induced selectivity filter conformational change.

      What is known about how the functional properties of SK2 channels (where the filter changes conformation) differ from SK4, where the filter remains conducting (reference 13)? Is there any evidence that SK2 channels inactivate?

      Compared with SK4, SK2 has some unique properties such as lower conductance and the ability to switch between low- and high-open probability states. Mutation of Phe243 suggests that the S3-S4 linker conformation contributes to the low conductance. This is included in the discussion.

      “Such a mechanism may explain some properties of SK2 that are not observed in SK4, which lacks an S3-S4 linker, such as its low conductance (~10 pS) and the ability to switch between low- and high-open probability states[3,4]. Indeed, mutation of Phe243 in rat SK2 produced a 2-fold increase in channel conductance[5].”

      Or might the conformation of the filter be controlled by regulatory processes in SK2 channels? I think connecting the dots here would enhance the impact of this study, even if it remains relatively speculative.

      Please see the response to reviewer 1’s comments for a discussion of the potential physiological role of the S3-S4 linker/extracellular constriction and its mechanism for opening.

      Reviewer #1 (Recommendations for the authors):

      I enjoyed reading your paper and am intrigued by your findings on the selectivity filter of SK2. I've got a few recommendations for data analysis and a couple of questions that might contribute to the discussion.

      In your Ca2+-bound dataset, have you tried to parse out any alternative conformations (e.g., by using 3D classification, or 3D variability)? Do you think there might be a small(er) population of particles that adopt a fully open conformation? If you haven't done this already, I would recommend doing so. You have a rather large number of particles in your final 3D reconstruction (~660k), so there might be some hidden conformations that could contribute to our understanding of the system.

      I would recommend doing the same for your compound 4-bound data set.

      Please see above for response to this recommendation.

      Do you think apamin works solely as a pore blocker, or does its binding perhaps also affect the S6 gate via allosteric networks (perhaps the same ones that induce the formation of the K+ conductive SF through binding of compound 1 above the S6 gate?)?

      Apamin binding does not change the conformation of the pore helices (S5 or S6) and thus we believe it acts primarily as a pore blocker. The following was added to the results section:

      “Overall, the apamin-bound SK2-4/CaM structure resembles Ca<sup>2+</sup>-bound SK2-4. The N-terminal lobe of CaM engages with the S<sub>45</sub>A helix, the S5 and S6 helices adopt a similar conformation, and the intracellular gate Val390 is open with a radius of 3.5 Å (Fig 2D). The most significant conformational change is in the position of the S3-S4 linker, which shifts ~2 Å away from the pore axis to accommodate apamin binding.”

      Is there a mechanistic explanation for why it might be difficult/energetically costly for the SF to be conductive and the S6 gate to be open at the same time?

      Not to our knowledge.

      I also have these minor recommendations:

      -In all figures showing density, include the threshold/sigma value at which density is shown.

      -For all ligands and ions, include half-map data.

      Sigma values were added to all figure legends displaying cryoEM density. The displayed maps are the sharpened full maps.

      Reviewer #2 (Recommendations for the authors):

      Is it possible to provide a structure-sequence guided explanation for the different affinity of compound 1 for SK2 vs SK4?

      Yes. The following is now included in the results section and a panel was added to Figure S6D.

      “However, for SK4 Thr212 replaces SK2 Ser318 and Trp216 (homologous to SK2 Trp322) is conserved but adopts a different rotamer conformation (Fig S6D). Both changes occlude the compound 1 binding site in SK4 and would likely reduce compound 1 potency on SK4 as observed in the functional data.”

      Is it possible to propose a model of modulation by compound 1/4 where the authors can comment on the conformational dependence of compound binding? That is, do they bind exclusively to the identified conformational states of the channel, or are they able to bind to both closed and open channels, but bias one state over the other?

      The clash between compound 1 and Thr386 in the open conformation of the S6 helices suggests that compound 1 would preferentially bind to the closed state of SK2. Similarly, the clash between compound 4 and Ile380 in the closed conformation of the S6 helices suggests that compound 4 would preferentially bind to the open state of SK2. This was included in the discussion:

      “This proposed mechanism of modulation suggests that compound 1 may bind preferentially to the closed conformation of the S6 helices and compound 4 may bind preferentially to the open conformation of the S6 helices.” 

      Please provide the calcium concentration used to generate the data in Figure 1B.

      The calcium concentration is now stated in the legend for Fig 1B:

      “Intracellular solution contains 2 µM Ca<sup>2+</sup> based on calculation using Maxchelator (see methods)”

      Essential and critically important descriptions of experiments in Figure 7A are lacking. It would be essential to describe properly, with care, what the currents and the conditions of measurements are. If these currents are obtained by subtracting leak currents by adding other drugs, it would be good to comment on whether the latter compete with compounds 1/4.

      As recommended, additional details for electrophysiological data were added to the results, methods, and figure legends for clarification. SK currents were obtained by subtracting leak currents measured after adding UCL1684, which was applied only at the end of experiments. UCL1684 is not expected to interfere with the effect of compound 1 or 4, given their different binding sites and mechanisms.

      If Compound 1 changes the structure of the SF (Figure 6F), would it also promote apamin binding? Given that both these agents produce a similar change in the SF, could each favor the binding of the other?

      Since apamin binds to the S3-S4 linker, it is unlikely that the selectivity filter conformational change observed in the compound 1 bound structure would affect apamin binding.

    1. Author response:

      Reviewer #1:

      While the structure of the melibiose permease in both outward and inward-facing forms has been solved previously, there remain unanswered questions regarding its mechanism. Hariharan et al set out to address this with further crystallographic studies complemented with ITC and hydrogen-deuterium exchange (HDX) mass spectrometry.

      They first report 4 different crystal structures of galactose derivatives to explore molecular recognition, showing that the galactose moiety itself is the main source of specificity. Interestingly, they observe a water-mediated hydrogen bonding interaction with the protein and suggest that this water molecule may be important in binding.

      We appreciate the understanding of our work presented in this manuscript by this reviewer.

      The results from the crystallography appear sensible, though the resolution of the data is low, with only the structure with NPG better than 3Å. However, it is a bit difficult to understand what novel information is being brought out here and what is known about the ligands. For instance, are these molecules transported by the protein or do they just bind? They measure the affinity by ITC, but draw very few conclusions about how the affinity correlates with the binding modes. Can the protein transport the trisaccharide raffinose?

      The four structures, each with a bound sugar of a different size, aimed to identify the binding motif on both the primary substrate (sugar) and the transporter (MelB<sub>St</sub>). Although the resolutions of the structures complexed with melibiose, raffinose, or α-MG are relatively low, the size and shape of the densities in each structure are consistent with the corresponding sugar molecules, which provides valuable data for determining the pose of the bound sugar. Additionally, there is another α-NPG-bound structure at a higher resolution of 2.7 Å. Therefore, our new data support the published binding site with the galactosyl moiety as the main interacting group. The water-1 identified in this study further confirms the orientation of C4-OH. Notably, this transporter does not recognize or transport glucosides, in which the orientation of C4-OH on the glucopyranosyl ring is opposite. We will provide stronger data to support water-1.

      Regarding the raffinose question, we should have clearly introduced the historical background. Bacterial disaccharide transporters have broad specificity, allowing them to work on a group of sugars with shared structural elements; for example, one sugar molecule can be transported by several transporters. As reported in the literature, the galactosides melibiose, lactose, and raffinose can be transported by both LacY and MelB of E. coli. We did not test whether MelB<sub>St</sub> can transport α-NPG and raffinose. To address this issue and strengthen our conclusions, we plan to conduct additional experiments to gather evidence of the translocation of these sugars by MelB<sub>St</sub>.

      The HDX also appears to be well done; however, in the manuscript as written, it is difficult to understand how this relates to the overall mechanism of the protein and the conformational changes that the protein undergoes.

      Previously, we used HDX-MS to examine the conformational transition between inward- and outward-facing conformations using a conformation-specific nanobody to trap MelB<sub>St</sub> in an inward-facing state, as structurally resolved by cryoEM single-particle analysis and published in eLife 2024. That study identified dynamic regions that may be involved in the conformational transitions; however, there was no sugar present. We also solved and published the crystal structure of the apo D59C MelB<sub>St</sub>. The sugar-bound and apo states are virtually identical. To address the positive cooperativity of binding between the sugar and co-transport cations observed in biophysical analysis, in this study, we utilize HDX-MS to analyze the structural dynamics induced by melibiose, Na<sup>+</sup>, or both, focusing on the binding residues at the sugar-binding and cation-binding pockets. The results suggest that the coupling cation stabilizes sugar-binding residues at helices I and V, contributing to affinity but not specificity.

      Since MelB<sub>St</sub> favors the outward-facing conformation, and simulations on the free-energy landscape suggest that the highest affinity of the sugar-bound state is also at an outward-facing state, MelB<sub>St</sub> in both the apo and bound states tends to remain in the outward-facing conformation. We will include a section comparing these differences. We thank this reviewer for the critical insight.

      Reviewer #2:

      This manuscript from Hariharan, Shi, Viner, and Guan presents x-ray crystallographic structures of the membrane protein MelB and HDX-MS analysis of ligand-induced dynamics. This work improves on the resolution of previously published structures, introduces further sugar-bound structures, and utilises HDX to explore in further depth the previously observed positive cooperativity with the cotransported cation Na<sup>+</sup>. The work presented here builds on years of previous study and adds substantial new details into how Na<sup>+</sup> binding facilitates melibiose binding and deepens the fundamental understanding of the molecular basis underlying the symport mechanism of cation-coupled transporters. However, the presentation of the data lacks clarity, and in particular, the HDX-MS data interpretation requires further explanation in both methodology and discussion.

      We thank this reviewer for taking the time to read our previous articles related to this manuscript.

      Comments on Crystallography and biochemical work:

      (1) It is not clear what Figure 2 is comparing. The text suggests this figure is a comparison of the lower resolution structure to the structure presented in this work; however, the figure legend does not mention which is which, and both images include a modelled water molecule that was not assigned due to poor resolution previously, as stated by the authors, in the previously generated structure. This figure should be more clearly explained.

      This figure shows a stereo view of a density map created in cross-eye style to demonstrate its quality. We will update this figure with a higher-resolution map, and the density for Wat-1 is clearly visible. This also addresses Reviewer-3’s comment regarding the map resolution.

      (2) It is slightly unclear what the ITC measurements add to this current manuscript. The authors comment that raffinose exhibiting poor binding affinity despite having more sugar units is surprising, but it is not surprising to me. No additional interactions can be mapped to these units on their structure, and while it fits into the substrate binding cavity, the extra bulk of additional sugar units is likely to reduce affinity. In fact, from their listed ITC measurements, this appears to be the trend. Additionally, the D59C mutant utilised here in structural determination is deficient in sodium/cation binding. The reported allostery of sodium-sugar binding will likely influence the sugar binding motif as represented by these structures. This is clearly represented by the authors' own ITC work. The ITC included in this work was carried out on the WT protein in the presence of Na<sup>+</sup>. The authors could benefit from clarifying how this work fits with the structural work or carrying out ITC with the D59C mutant, or additionally, in the absence of sodium.

      While raffinose and α-MG have been reported as substrates of MelB in E. coli, binding data are unavailable; additionally, for MelB<sub>St</sub>, we lack data on the binding of two of the four sugars or sugar analogs. We performed a label-free binding assay using ITC to address this concern with the WT MelB<sub>St</sub>. We will also perform the binding assay with the D59C MelB<sub>St</sub>, since sugar binding has been structurally analyzed with this mutant, as pointed out by this reviewer. Along with other new functional results, we will prepare a new Figure 1 on functional analysis, which will also address the comment regarding extra bulk at the non-galactosyl moiety with poor affinity.

      This D59C uniport mutant exhibits increased thermostability, making it a valuable tool for crystal structure determination, especially since the wild type (WT) is difficult to crystallize at high quality. Asp59 is the only site that responds to the binding of all coupling cations: Na<sup>+</sup>, Li<sup>+</sup>, or H<sup>+</sup>. Notably, this mutant selectively abolishes cation binding and cotransport. However, it still maintains intact sugar binding with slightly higher affinity and preserves the conformational transition, as demonstrated by an electroneutral transport reaction, the melibiose exchange, and fermentation assays with intact cells. Therefore, the structural data derived from this mutant are significant and offer important mechanistic insights into sugar transport. We will provide additional details during the revision.

      Comments on HDX-MS work:

      While the use of HDX-MS to deepen the understanding of ligand allostery is an elegant use of the technique, this reviewer advises the authors to refer to the Masson et al. (2019) recommendations for the HDX-MS article (https://doi.org/10.1038/s41592-019-0459-y) on how to best present this data. For example:

      All authors appreciate this reviewer’s comments and suggestions, which will be incorporated into the revision.

      (1) The Methodology includes a lipid removal step. Based on other included methods, I assumed that the HDX-MS was being carried out in detergent-solubilised protein samples. I therefore do not see the need for a lipid removal step that is usually included for bilayer reconstituted samples. I note that this methodology is the same as previously used for MelB. It should be clarified why this step was included, if it was in fact used, aka, further details on the sample preparation should be included.

      Yes, a lipid/detergent removal step was applied in this study and in previous studies and this information was clearly described in Methods.

      (2) A summary of HDX conditions and results should be given as recommended, including the mean peptide length and average redundancy per state alongside other included information such as reaction temperature, sequence coverage, etc., as prepared for previous publications from the authors, i.e., Hariharan et al., 2024.

      We will update Table S2. Thank you.

      (3) Uptake plots per peptide for the HDX-MS data should be included as supporting information outside of the few examples given in Figure 6.

      We will prepare the plots in supplementary information.

      (4) A reference should be given to the hybrid significance testing method utilised. Additionally, as stated by Hageman and Weis (2019) (doi:10.1021/acs.analchem.9b01325), the use of P < 0.05 greatly increases the likelihood of false positive ΔD identifications. While the authors include multiple levels of significance, what they refer to as high and lower significant results, this reviewer understands that working with dynamic transporters can lead to increased data variation; a statement of why certain statistical criteria were chosen should be included, and possibly accompanied by volcano plots. The legend of Figure 6 should include what P value is meant by * and ** rather than statistically significant and highly statistically significant.

      We appreciate this comment and will cite this article on the hybrid significance method. We will include volcano plots for each dataset. We fully acknowledge that using a cutoff of P < 0.05 can increase the likelihood of false-positive identifications. However, given the complexity of the samples analyzed in this study, we believe that some important changes may have been excluded due to higher variability within the dataset. By applying multiple levels of statistical testing, we determined that P < 0.05 represents a suitable threshold for this study. The threshold values were marked in the residual plots and explained in the text. For Figure 6, we have revised it by showing the P value directly.
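
      The hybrid significance idea discussed above can be sketched in a few lines. This is only an illustrative sketch of one common formulation (a peptide is flagged when its deuterium-uptake difference exceeds a global ΔD limit and also passes a Welch's t-test), not the authors' actual pipeline. The function names, the replicate values, the global threshold of 0.3 Da, and the critical t value of 2.776 (two-tailed, alpha = 0.05, 4 degrees of freedom) are all assumptions chosen for the example.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(state_a, state_b):
    """Return (mean difference, Welch t statistic, Welch-Satterthwaite df).

    Assumes at least two replicates per state and non-zero variance overall.
    """
    var_a = stdev(state_a) ** 2 / len(state_a)  # squared standard error, state A
    var_b = stdev(state_b) ** 2 / len(state_b)  # squared standard error, state B
    diff = mean(state_b) - mean(state_a)
    t_stat = diff / sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(state_a) - 1) + var_b ** 2 / (len(state_b) - 1)
    )
    return diff, t_stat, df


def hybrid_significant(state_a, state_b, global_threshold, t_crit):
    """Flag a peptide only if BOTH criteria hold (hypothetical helper).

    global_threshold: minimum |delta D| (Da), assumed to be derived elsewhere
                      from the pooled standard deviation of the whole dataset.
    t_crit:           critical t value for the chosen alpha, supplied by caller.
    """
    diff, t_stat, _ = welch_t(state_a, state_b)
    return abs(diff) > global_threshold and abs(t_stat) > t_crit


# Hypothetical triplicate uptake values (Da) for one peptide and time point.
apo = [1.00, 1.10, 0.90]
bound = [2.00, 2.10, 1.90]
print(hybrid_significant(apo, bound, global_threshold=0.3, t_crit=2.776))
```

      Requiring both criteria is what makes the test "hybrid": a small but very reproducible difference fails the global ΔD limit, and a large but noisy difference fails the t-test.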

      (5) Line 316 states that a significant difference is seen in dynamics; how is significance measured here? There is no S.D. given in Table S4. Can the authors further comment on the potential involvement in solvent accessibility and buried helices that might influence the overall dynamics outside of their role in sugar vs sodium binding? An expected low rate of exchange suggests that dynamics are likely influenced by solvent accessibility or peptide hydrophobicity? The increased dynamics at peptides covering the Na binding site on overall more dynamic helices suggests that there is no difference between the dynamics of each site.

      Table S4 was created to provide an overall view of the dynamic regions. If we understand correctly, this reviewer asked us to comment on the effect of solvent accessibility or hydrophobic regions on the overall dynamics outside the binding residues of the peptides that carry binding residues. The HDX rate is influenced by two linked factors: solvent accessibility and hydrogen-bonding interactions that reflect structural dynamics. Poor solvent accessibility in buried regions therefore results in low deuterium uptake. The peptides in our dataset that include the Na<sup>+</sup>-binding site showed low HDX, likely due to poor solvent accessibility and structural stability. It is unclear what this reviewer meant by "increased dynamics at peptides covering the Na binding site on overall more dynamic helices." We do not observe increased dynamics in peptides covering Na<sup>+</sup>-binding sites.

      (6) Previously stated HDX-MS results of MelB (Hariharan et al., 2024) state that the transmembrane helices are less dynamic than polypeptide termini and loops with similar distributions across all transmembrane bundles. The previous data was obtained in the presence of sodium. Does this remove the difference in dynamics in the sugar-binding helices and the cation-binding helices? Including this comparison would support the statement that the sodium-bound MelB is more stable than the Apo state, along with the lack of deprotection observed in the differential analysis.

      Thanks for this suggestion. The previous datasets were collected in the presence of Na<sup>+</sup>. In the current study, we also have a Na-containing dataset. Both showed similar results: the multiple overlapping peptides covering the sugar-binding residues on helices I and V have higher HDX rates than those covering the Na<sup>+</sup>-binding residues, even when Na<sup>+</sup> is present in both datasets.

      (7) Have the authors considered carrying out an HDX-MS comparison between the WT and the D59C mutant? This may provide some further information on the WT structure (particularly a comparison with sugar-bound). This could be tied into a nice discussion of their structural data.

      Thanks for this suggestion. Conducting the HDX-MS comparison between the WT and the D59C mutant is certainly interesting, especially given the growing amount of structural and biochemical/biophysical data available for this mutant. However, due to limited resources, we might consider doing it later.

      (8) Have the authors considered utilising Li<sup>+</sup> to infer how cation selectivity impacts the allostery? Do they expect similar stabilisation of a higher-affinity sugar binding state with all cations?

      Thanks for this suggestion. We have demonstrated that Li<sup>+</sup> also shows positive cooperativity with melibiose through ITC binding measurements. Li<sup>+</sup> binds to MelB<sub>St</sub> with higher affinity than Na<sup>+</sup> but causes many different effects on MelB. It is worth investigating this thoroughly and individually. To address the second question, H<sup>+</sup> is a poor coupling cation with minimal impact on melibiose binding. Since its pKa is around 6.5, only a small subpopulation of MelB<sub>St</sub> is protonated at pH 7.5. Sugar-binding cooperativity is highest with Na<sup>+</sup>, followed by Li<sup>+</sup> and then H<sup>+</sup>.
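
      As a rough check on the size of that protonated subpopulation, the fraction can be estimated from the Henderson-Hasselbalch relation. This assumes a single titratable site with an apparent pKa of 6.5 that behaves ideally, which is a simplification introduced here for illustration rather than something stated in the response:

```latex
f_{\mathrm{protonated}}
  = \frac{1}{1 + 10^{\,\mathrm{pH} - \mathrm{p}K_a}}
  = \frac{1}{1 + 10^{\,7.5 - 6.5}}
  = \frac{1}{11}
  \approx 0.09
```

      Under this assumption, roughly 9% of MelB<sub>St</sub> would be protonated at pH 7.5, consistent with the description of a small subpopulation.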

      (9) MD of MelB suggests all transmembrane helices are reorientated during substrate translocation, yet substrate and cotransporter ligand binding only significantly impacts a small number of helices. Can the authors comment on the ensemble of states expected from each HDX experiment? The data presented here instead shows overall stabilisation of the transporter. This data can be compared to that of HDX on MFS sugar cation symporter XylE, where substrate binding induces a transition to OF state. There is no discussion of how this HDX data compares to previous MFS sugar transporter HDX. The manuscript could benefit from this comparison rather than a comparison to LacY. It is unlikely that there are universal mechanisms that can be inferred even from these model proteins. Highlighting differences instead between these transport systems provides broader insights into this protein class. Doi: 10.1021/jacs.2c06148 and 10.1038/s41467-018-06704-1.

      The sugar translocation free-energy landscape simulations showed that both helix bundles move relative to the membrane plane. That analysis aimed to clarify a hypothesis in the field—that the MFS transporter can use an asymmetric mode to transition between inward- and outward-facing states. In the case of MelB, we clearly demonstrated that both domains move and each helix bundle moves as a unit, so the labeling changes were identified only in some extramembrane loops and a few highly flexible helices. Thanks for the suggestion about comparing with XylE. We will include a discussion on it.

      (10) Additionally, the recent publication of SMFS data (by the authors: doi:10.1016/j.str.2022.11.011) states the following: "In the presence of either melibiose or a coupling Na<sup>+</sup>-cation, however, MelB increasingly populates the mechanically less stable state which shows a destabilized middle-loop C3." And "In the presence of both substrate and co-substrate, this mechanically less stable state of MelB is predominant.". It would benefit the authors to comment on these data in contrast to the HDX obtained here. Additionally, is the C3 loop covered, and does it show the destabilization suggested by these studies? HDX can provide a plethora of results that are missing from the current analysis on ligand allostery. The authors instead chose to reference CD and thermal denaturation methods as comparisons.

      We thank this reviewer for reading the single-molecule force spectroscopy (SMFS) study on MelB<sub>St</sub>. The C3 loop mentioned in this SMFS article is partially covered in the dataset Mel or Mel plus Na<sup>+</sup> vs. Apo, and more coverage is in the Na<sup>+</sup> vs. Apo. In either condition, no deprotection was detected. Two possible reasons the HDX data did not reflect the deprotection are: 1) the changes were too subtle and did not pass the statistical tests, and 2) the longest labeling time point was still insufficient to detect the changes; much longer labeling times should be considered in future studies.

      Reviewer #3:

      Summary:

      The melibiose permease from Salmonella enterica serovar Typhimurium (MelB<sub>St</sub>) is a member of the Major Facilitator Superfamily (MFS). It catalyzes the symport of a galactopyranoside with Na⁺, H⁺, or Li⁺, and serves as a prototype model system for investigating cation-coupled transport mechanisms. In cation-coupled symporters, a coupling cation typically moves down its electrochemical gradient to drive the uphill transport of a primary substrate; however, the precise role and molecular contribution of the cation in substrate binding and translocation remain unclear. In a prior study, the authors showed that the binding affinity for melibiose is increased in the presence of Na<sup>+</sup> by about 8-fold, but the molecular basis for the cooperative mechanism remains unclear. The objective of this study was to better understand the allosteric coupling between the Na<sup>+</sup> and melibiose binding sites. To verify the sugar-recognition specific determinants, the authors solved the outward-facing crystal structures of a uniport mutant D59C with four sugar ligands containing different numbers of monosaccharide units (α-NPG, melibiose, raffinose, or α-MG). The structure with α-NPG bound has improved resolution (2.7 Å) compared to a previously published structure and to those with other sugars. These structures show that the specificity is clearly directed toward the galactosyl moiety. However, the increased affinity for α-NPG involves its hydrophobic phenyl group, which, positioned 4 Å from the phenyl group of Tyr26, forms a strong stacking interaction. Moreover, a water molecule bound to OH-4 in the structure with α-NPG was proposed to contribute to the sugar recognition and appears on the pathway between the two specificity-determining pockets. Next, the authors analyzed by hydrogen-to-deuterium exchange coupled to mass spectrometry (HDX-MS) the changes in structural dynamics of the transporter induced by melibiose, Na<sup>+</sup>, or both. The data support the conclusion that the binding of the coupling cation at a remote location stabilizes the sugar-binding residues to switch to a higher-affinity state. Therefore, the coupling cation in this symporter was proposed to be an allosteric activator.

      Strengths:

      (1) The manuscript is generally well written.

      (2) This study builds on the authors' accumulated knowledge of the melibiose permease and integrates structural and HDX-MS analyses to better understand the communication between the sodium ion and sugar binding sites. A high sequence coverage was obtained for the HDX-MS data (86-87%), which is high for a membrane protein.

      We thank this reviewer for the positive comments.

      Weaknesses:

      (1) I am not sure that the resolution of the structure (2.7 Å) is sufficiently high to unambiguously establish the presence of a water molecule bound to OH-4 of the α-NPG sugar. In Figure 2, the density for water 1 is not obvious to me, although it is indeed plausible that water mediates the interaction between OH4/OH6 and the residues Q372 and T373.

      Thanks for your comments on the resolution. We will improve the density for the Water 1.

      (2) Site-directed mutagenesis could help strengthen the conclusions of the authors. Would the mutation(s) of Q372 and/or T373 support the water hypothesis by decreasing the affinity for sugars? Mutations of Thr 121, Arg 295, combined with functional and/or HDX-MS analyses, may also help support some of the claims of the authors regarding the allosteric communication between the two substrate-binding sites.

      The authors thank this reviewer for the thoughtful suggestions. MelB<sub>St</sub> has been subjected to Cys-scanning mutagenesis (https://doi.org/10.1016/j.jbc.2021.101090). Placing a Cys residue on the hydrogen bond-donor Q372 significantly decreased the transport initial rate, accumulation, and melibiose fermentation, with little effect on protein expression, as shown in Figure 2 of this JBC paper. Although no binding data are available, the poor initial rate of transport with a similar amount of protein expressed suggested that the binding affinity is apparently decreased, supporting the role of water-1 in the binding pocket for better binding. The T373C mutant retained most activities of the WT. We will discuss the functional characterizations of these two mutants. Thanks.

      (3) The main conclusion of the authors is that the binding of the coupling cation stabilizes those dynamic sidechains in the sugar-binding pocket, leading to a high-affinity state. This is visible when comparing panels c and a from Figure S5. However, there is both increased protection (blue, near the sugar) and decreased protection in other areas (red). The latter received less comment. Could the increased flexibility in these red regions facilitate the transition between inward- and outward-facing conformations?

      Thanks for this important question. We will discuss the deprotected data in the conformational transition between inward-facing and outward-facing states. The two regions, loop8-9 and loop1-2, are located in the gate area on both sides of the membrane and showed increased deuterium uptakes upon binding of melibiose plus Na<sup>+</sup>. They are likely involved in this process.

      The HDX changes induced by the different ligands were compared to the apo form (see Figure S5). It might be worth it for data presentation to also analyze the deuterium uptake difference by comparing the conditions sodium ion+melibiose vs melibiose alone. It would make the effect of Na<sup>+</sup> on the structural dynamics of the melibiose-bound transporter more visible. Similarly, the deuterium uptake difference between sodium ion+melibiose vs sodium ion alone could be analyzed too, in order to plot the effect of melibiose on the Na<sup>+</sup>-bound transporter.

      We will analyze the data as suggested by this reviewer.

      (4) For non-specialists, it would be beneficial to better introduce and explain the choice of using D59C for the structural analyses.

      As stated in our response to Reviewer #1 on page 3, “Asp59 is the only site that responds to the binding of all coupling cations: Na<sup>+</sup>, Li<sup>+</sup>, or H<sup>+</sup>. Notably, this mutant selectively abolishes cation binding and cotransport. However, it still maintains intact sugar binding with slightly higher affinity and preserves the conformational transition, as demonstrated by an electroneutral transport reaction, the melibiose exchange, and fermentation assays with intact cells. Therefore, the structural data derived from this mutant are significant and offer important mechanistic insights into sugar transport. We will provide additional details during the revision.”

      (5) In Figure 5a, deuterium changes are plotted as a function of peptide ID number. It is hardly informative without making it clearer which regions it corresponds to. Only one peptide is indicated (213-226), I would recommend indicating more of them in areas where deuterium changes are substantial.

      We appreciate this comment, which will make the plots more meaningful. In the previous article published in eLife (2024), we drew boxes to mark the transmembrane regions; however, this generated much confusion, such as why some helices are very short. The revised figure will label the full length of covered positions.

      (6) From prior work of the authors, melibiose binding also substantially increases the affinity of the sodium ion. Can the authors interpret this observation based on the HDX data?

      This is an intriguing mechanistic question. Based on current data, we believe that the bound melibiose physically prevents the release of Na<sup>+</sup> or Li<sup>+</sup> from the cation-binding pocket. The cation-binding pocket and surrounding regions, including the sugar-binding residue Asp124, show low HDX, supporting this idea. Since we lack a structure with both substrates bound, figuring out the details structurally is challenging. However, we have a hypothesis about the intracellular Na<sup>+</sup> release as proposed in the 2024 JBC paper (https://doi.org/10.1016/j.jbc.2024.107427). After sugar release, the rotamer change of Asp55 will help Na<sup>+</sup> exit the cation pocket to the sugar pocket, and the negative membrane potential will facilitate the further movement from MelB to the cytosol. We will discuss this during the revision.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      Summary:

      It seems as if the main point of the paper is about the new data related to ratfish, although your title describes it as extant cartilaginous fishes and you bounce around between the little skate and ratfish. So here's an opportunity for you to adjust the title to emphasize ratfish, given the fact that later you describe how this is your significant new data contribution. Either way, the organization of the paper can be adjusted so that the reader can follow along the same order for all sections so that it's very clear for comparative purposes of new data and what they mean. My opinion is that I want to read, for each subheading in the results, about the ratfish first because this is your most interesting novel data. Then I want to know any confirmation about morphology in little skate. And then I want to know about any gaps you fill with the catshark. (It is ok if you keep the order of "skate, ratfish, then shark", but I think it undersells the new data).

      The main points of the paper are 1) to define terms for chondrichthyan skeletal features in order to unify research questions in the field, and 2) add novel data on how these features might be distributed among chondrichthyan clades. However, we agree with the reviewer that many readers might be more interested in the ratfish data, so we have adjusted the order of presentation to emphasize ratfish throughout the manuscript.

      Strengths:

      The imagery and new data availability for ratfish are valuable and may help to determine new phylogenetically informative characters for understanding the evolution of cartilaginous fishes. You also allude to the fossil record.

      Thank you for the nice feedback.

      Opportunities:

      I am concerned about the statement of ratfish paedomorphism because stage 32 and 33 were not statistically significantly different from one another (figure and prior sentences). So, these ratfish TMDs overlap the range of both 32 and 33. I think you need more specimens and stages to state this definitely based on TMD. What else leads you to think these are paedomorphic? Right now they are different, but it's unclear why. You need more outgroups.

      Sorry, but we had reported that the TMD of centra from little skate did significantly increase between stage 32 and 33. Supporting our argument that ratfish had features of little skate embryos, TMD of adult ratfish centra was significantly lower than TMD of adult skate centra (Fig1).  Also, it was significantly higher than stage 33 skate centra, but it was statistically indistinguishable from that of stage 33 and juvenile stages of skate centra.  While we do agree that more samples from these and additional groups would bolster these data, we feel they are sufficiently powered to support our conclusions for this current paper.

      Your headings for the results subsection and figures are nice snapshots of your interpretations of the results and I think they would be better repurposed in your abstract, which needs more depth.

      We have included more data summarized in results sub-heading in the abstract as suggested (lines 32-37).

      Historical literature is more abundant than what you've listed. Your first sentence describes a long fascination and only goes back to 1990. But there are authors that have had this fascination for centuries and so I think you'll benefit from looking back. Especially because several of them have looked into histology and development of these fishes.

      I agree that in the past 15 years or so a lot more work has been done because it can be done using newer technologies and I don't think your list is exhaustive. You need to expand this list and history which will help with your ultimate comparative analysis without you needing to sample too many new data yourself.

      We have added additional recent and older references: Kölliker, 1860; Daniel, 1934; Wurmbach, 1932; Liem, 2001; Arratia et al., 2001.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters, illustrated in figure 7 and the body of the text.

      We address a similar comment from this reviewer in more detail below, hoping that any concerns about continuity have been addressed with inclusion of a summary of proposed characters in a new Table 1, re-writing of the Discussion, and modified Fig7 and re-written Fig7 legend.

      Generally Holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      While a little unclear exactly what was requested, we restructured the branches to indicate that holocephalans diverged earlier from the ancestors that led to elasmobranchs. Also in response to this comment, we added catshark (S. canicula) and little skate (L. erinacea) specifically to the character matrix.

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity; we have changed the wording in this revision to hopefully avoid this misunderstanding.

      Reviewer #2 (Public Review):

      General comment:

      This is a very valuable and unique comparative study. An excellent combination of scanning and histological data from three different species is presented. Obtaining the material for such a comparative study is never trivial. The study presents new data and thus provides the basis for an in-depth discussion about chondrichthyan mineralised skeletal tissues.

      Many thanks for the kind words.

      I have, however, some comments. Some information is lacking and should be added to the manuscript text. I also suggest changes in the result and the discussion section of the manuscript.

      Introduction:

      The reader gets the impression that almost no research on chondrichthyan skeletal tissues was done before 2010 ("last 15 years", L45). I suggest correcting that and also citing previous studies on chondrichthyan skeletal tissues; this includes studies from before 1900.

      We have added additional older references, as detailed above.

      Material and Methods:

      Please complete L473-492: Three different Micro-CT scanners were used for three different species? SkyScan 117 for the skate samples. Catshark different scanner, please provide full details. Chimera Synchrotron Scan? Please provide full details for all scanning protocols.

      We clarified exact scanners and settings for each micro-CT experiment in the Methods (lines 476-497).

      TMD is established in the same way in all three scanners? Actually not possible. Or, all specimens were scanned with the same scanner to establish TMD? If so please provide the protocol.

      Indeed, the same scanner was used for TMD comparisons, and we included exact details on how TMD was established and compared with internal controls in the Methods. (lines 486-488)

      Please complete L494 ff: Tissue embedding medium and embedding protocol is missing. Specimens have been decalcified, if yes how? Have specimens been sectioned non-decalcified or decalcified?

      Please complete L506 ff: Tissue embedding medium and embedding protocol is missing. Description of controls are missing.

      Methods were updated to include these details (lines 500-503).

      Results:

      L147: It is valuable and interesting to compare the degree of mineralisation in individuals from the three different species. It appears, however, not possible to provide numerical data for Tissue Mineral Density (TMD). First requirement, all specimens must be scanned with the same scanner and the same calibration values. This is not stated in the M&M section. But even if this was the case, all specimens derive from different sample locations and have been preserved differently. Type of fixation, extension of fixation time in formalin, frozen, unfrozen, conditions of sample storage, age of the samples, and many more parameters, all influence TMD values. Likewise the relative age of the animals (adult is not the same as adult) influences TMD. One must assume different sampling and storage conditions and different types of progression into adulthood. Thus, the observation of different degrees of mineralisation is very interesting but I suggest not to link this observation to numerical values.

      These are very good points, but for the following reasons we feel that they were not sufficiently relevant to our study, so the quantitative data for TMD remain scientifically valid and critical for the field moving forward.  Critically, 1) all of the samples used for TMD calculations underwent the same fixation protocols, and 2) most importantly, all samples for TMD were scanned on the same micro-CT scanner using the same calibration phantoms for each scanning session.  Finally, while the exact age of each adult was not specified, we note for Fig1 that clear statistically significant differences in TMD were observed among various skeletal elements from ratfish, shark, and skate.  Indeed, ratfish TMD was considerably lower than TMD reported for a variety of fishes and tetrapods (summarized in our paper about icefish skeletons, who actually have similar TMD to ratfish: https://doi.org/10.1111/joa.13537).

      In response, however, we added a caveat to the paper’s Methods (lines 466-469), stating that adult ratfish were frozen within 1 or 2 hours of collection from the wild, staying frozen for several years prior to thawing and immediate fixation.

      Parts of the results are mixed with discussion. Sometimes, a result chapter also needs a few references but this result chapter is full of references.

      As mentioned above, we reduced background-style writing and citations in each Results section.

      Based on different protocols, the staining characteristics of the tissue are analysed. This is very good and provides valuable additional data. The authors should inform the reader not only about the staining (positive or negative) but also about the histochemical characters of the staining. L218: "fast green positive" means what? L234: "marked by Trichrome acid fuchsin" means what? And so on, see also L237, L289, L291.

      We included more details throughout the Results upon each dye’s first description on what is generally reflected by the specific dyes of the staining protocols. (lines 178, 180, 184, 223, 227, and 243-244)

      Discussion

      Please completely remove figure 7, please adjust and severely downsize the discussion related to figure 7. It is very interesting and valuable to compare three species from three different groups of elasmobranchs. Results of this comparison also validate an interesting discussion about possible phylogenetic aspects. This is, however, not the basis for claims about the skeletal tissue organisation of all extinct and extant members of the groups to which the three species belong. The discussion refers to "selected representatives" (L364), but how representative are the selected species? Can there be an extant species that represents the entire large group, all sharks, rays or chimeras? Are the three selected species basal representatives with a generalist life style?

      These are good points, and yes, we certainly appreciate that the limited sampling in our data might lead to faulty general conclusions about these clades.  In fact, we stated this limitation clearly in the Introduction (lines 126-128), and we removed “representative” from this revision.  We also replaced general reference to chondrichthyans in the Title by listing the specific species sampled.  However, in the Discussion, we also compare our data with previously published additional species evaluated with similar assays, which confirms the trend that we are concluding.  We look forward to future papers specifically testing the hypotheses generated by our conclusions in this paper, which serves as a benchmark for identifying shared and derived features of the chondrichthyan endoskeleton.

      Please completely remove the discussion about paedomorphosis in chimeras (already in the result section). This discussion is based on a wrong idea about the definition of paedomorphosis. Paedomorphosis can occur in members of the same group. Humans have paedomorphic characters within the primates, Ambystoma mexicanum is paedomorphic within the urodeles. Paedomorphosis does not extend to members of different vertebrate branches. That elasmobranchs have a developmental stage that resembles chimera vertebra mineralisation does not define chimera vertebra centra as paedomorphic. Teleosts have a heterocercal caudal fin anlage during development; that does not mean the heterocercal fins in sturgeons or elasmobranchs are paedomorphic characters.

      We agree with the reviewer that discussion of paedomorphosis should apply to members of the same group.  In our paper, we are examining paedomorphosis in a holocephalan, relative to elasmobranch fishes in the same group (Chondrichthyes), so this is an appropriate application of paedomorphosis.  In response to this comment, we clarified that our statement of paedomorphosis in ratfish was made with respect to elasmobranchs (lines 37-39; 418-420).

      L432-435: In the times of Gadow & Abbott (1895), science had completely wrong ideas about the phylogenetic position of chondrichthyans within the gnathostomes. It is curious that Gadow & Abbott (1895) are being cited in support of the paedomorphosis claim.

      If paedomorphosis is being examined within Chondrichthyes, such as in our paper and in the Gadow and Abbott paper, then it is an appropriate reference, even if Gadow and Abbott (and many others) got the relative position of Chondrichthyes among other vertebrates incorrect.

      The SCPP part of the discussion is unrelated to the data obtained by this study. Kawasaki & Weiss (2003) describe a gene family (called SCPP) that controls Ca-binding extracellular phosphoproteins in enamel, in bone and dentine, in saliva and in milk. It evolved by gene duplication and differentiation. They date it back to a first enamel matrix protein in conodonts (Reif 2006). Conodonts, a group of enigmatic invertebrates, have mineralised structures but these structures are neither bone nor mineralised cartilage. Catfish (6 % of all vertebrate species) on the other hand, have bone but do not have SCPP genes (Lui et al. 206). Other calcium binding proteins, such as osteocalcin, were initially believed to be required for mineralisation. It turned out that osteocalcin is rather a mineralisation inhibitor; at best it regulates the arrangement of collagen fiber bundles. The osteocalcin -/- mouse has fully mineralised bone. As the function of the SCPP gene product for bone formation is unknown, there is no need to discuss SCPP genes. It would perhaps be better to finish the manuscript with a summary that focuses on the subject and the methodology of this nice study.

      We completely agree with the reviewer that many papers claim to associate the functions of SCPP genes with bone formation, or even mineralization generally.  The Science paper with the elephant shark genome made it very popular to associate SCPP genes with bone formation, but we feel that this was a false comparison (for many reasons)!  In response to the reviewer’s comments, however, we removed the SCPP discussion points, moving the previous general sentence about the genetic basis for reduced skeletal mineralization to the end of the previous paragraph (lines 435-439).  We also added another brief Discussion paragraph afterwards, ending as suggested with a summary of our proposed shared and derived chondrichthyan endoskeletal traits (lines 440-453).

      Reviewer #1 (Recommendations For The Authors):

      Further Strengths and Opportunities:

      Your headings for the results subsection and figures are nice snapshots of your interpretations of the results and I think they would be better repurposed in your abstract, which needs more depth. It's a little unusual to try and state an interpretation of results as the heading title in a results section and the figures so it feels out of place. You could also use the headings as the last statement of each section, after you've presented the results. In order I would change these results subheadings to:

      Tissue Mineral Density (TMD)

      Tissue Properties of Neural Arches

      Trabecular mineralization

      Cap zone and Body zone Mineralization Patterns

      Areolar mineralization

      Developmental Variation

      Sorry, but we feel that summary Results sub-headings are the best way to effectively communicate to readers the story that the data tell, and this style has been consistently used in our previous publications.  No changes were made.

      You allude to the fossil record and that is great. That said, historical literature is more abundant than what you've listed. Your first sentence describes a long fascination and only goes back to 1990. But there are authors that have had this fascination for centuries and so I think you'll benefit from looking back. Especially because several of them have looked into histology of these fishes. You even have one sentence citing Coates et al. 2018, Frey et al., 2019 and Ørvig 1951 to talk about the potential that fossils displayed trabecular mineralization. That feels like you are burying the lead and may have actually been part of the story for where you came up with your hypothesis in the beginning... or the next step in future research. I feel like this is really worth spending some more time on in the intro and/or the discussion.

      We’ve added older REFs as pointed out above.  Regarding fossil evidence for trabecular mineralization, no, those studies did not lead to our research question.  But after we discovered how widespread trabecular mineralization was in extant samples, we consulted these papers, which did not focus on the mineralization patterns per se, but certainly led us to emphasize how those patterns fit in the context of chondrichthyan evolution, which is how we discussed them.

      I agree that in the past 15 years or so a lot more work has been done because it can be done using newer technologies. That said, there's a lot more work by Mason Dean's lab starting in 2010 that you should take a look at related to tesserae structure... they're looking at additional taxa than what you did as well. It will be valuable for you to be able to make any sort of phylogenetic inference as part of your discussion and enhance the info you present in figure 7. Go further back in time... For example:

      de Beer, G. R. 1932. On the skeleton of the hyoid arch in rays and skates. Quarterly Journal of Microscopical Science. 75: 307-319, pls. 19-21.

      de Beer, G. R. 1937. The Development of the Vertebrate Skull. The University Press, Oxford.

      Indeed, we have read all of Mason’s work, citing 9 of his papers, and where possible, we have incorporated their data on different species into our Discussion and Fig7.  Thanks for the de Beer REFs.  While they contain histology of developing chondrichthyan elements, they appear to refer principally to gross anatomical features, so were not included in our Intro/Discussion.

      Most sections within the results read more like a discussion than a presentation of the new data, and you jump directly into using an argument of those data too early. Go back in and remove the references or save those paragraphs for the discussion section. Particularly because this journal has you skip the method section until the end, I think it's important to set up this section with a little bit more brevity and conciseness.  For instance, in the first section about tissue mineral density, change that subheading to just say tissue mineral density. Then you can go into the presentation of what you see in the ratfish, and then what you see in the little skate, and then that's it. You save the discussion about how other elasmobranchs are mineralizing their neural arches, etc. for another section.

      We dramatically reduced background-style writing and citations in each Results section (other than the first section of minor points about general features of the ratfish, compared to catshark and little skate), keeping only a few to briefly remind the general reader of the context of these skeletal features.

      I like that your first sentence in the paragraph describes why you are doing a particular method and comparison because it shows me (the reader) where you're sampling from. Something else is that maybe, as part of the first figure, rather than having just the graph for each, have a small sketch for little skate and catshark to show where you sampled from for comparative purposes. That would then relate back to clarifying other figures as well.

      Done (also adding a phylogenetic tree).

      Second instance is your section on trabecular mineralization. This has so many references in it. It does not read like results at all. It looks like a discussion. However, the trabecular mineralization is one of the most interesting aspects of this paper, and how you are describing it as a unique feature. I really just want a very clear description of what the definition of this trabecular mineralization is going to be.

      In addition to adding Table 1 to define each proposed endoskeletal character state, we have changed the structure of this section and hope it better communicates our novel trabecular mineralization results.  We also moved the topic of trabecular mineralization to the first detailed Discussion point (lines 347-363) to better emphasize this specific topic.

      Carry this reformatting through for all subsections of the results.

      As mentioned above, we significantly reduced background-style writing and citations in each Results section.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters illustrated in figure 7 and the body of the text. I think you can give the characters a number so that you can actually refer to them in each subsection of the results. They can even be numbered sequentially so that they are presented in a standard character matrix format that future researchers can add directly to their own character matrices. You could actually turn it into a separate table so it doesn't take up that entire space of the figure, because there need to be additional taxa referred to on the diagram. Namely, you don't have any outgroups in figure 7, so it's hard to describe any state specifically as ancestral or derived. Generally Holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      The character matrix is a fantastic idea, and we should have included it in the first place!  We created Table 1 summarizing the traits and terminology at the end of the Introduction, also adding the character matrix in Fig7 as suggested, including specific fossil and extant species.  For the Fig7 branching and catshark inclusion, please see above. 

      You can repurpose the figure captions as narrative body text. Use less narrative in the figure captions. These are your results actually, so move that text to the results section as a way to truncate and get to the point faster.

      By figure captions, we assume the reviewer refers to figure legends.  We like to explain figures to some degree of sufficiency in the legends, since some people do not read the main text and simply skim a manuscript’s abstract, figures, and figure legends.  That said, we did reduce the wording, as requested.

      More specific comments about semantics are listed here:

      The abstract starts negative and doesn't state a question although one is referenced. Potential revision - "Comprehensive examination of mineralized endoskeletal tissues warranted further exploration to understand the diversity of chondrichthyans... Evidence suggests for instance that trabecular structures are not common, however, this may be due to sampling (bring up fossil record.) We expand our understanding by characterizing the skate, cat shark, and ratfish... (Then add your current headings of the results section to the abstract, because those are the relevant takeaways.)"

      We re-wrote much of the abstract, hoping that the points come across more effectively.  For example, we started with “Specific character traits of mineralized endoskeletal tissues need to be clearly defined and comprehensively examined among extant chondrichthyans (elasmobranchs, such as sharks and skates, and holocephalans, such as chimaeras) to understand their evolution”.  We also stated an objective for the experiments presented in the paper: “To clarify the distribution of specific endoskeletal features among extant chondrichthyans”. 

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity, and we have changed the wording in this revision to hopefully avoid this misunderstanding.

      In the second paragraph of the TMD section, you mention the synarcual comparison. I'm not sure I follow. These are results, not methods. Tell me what you are comparing directly. The non-centrum part of the synarcual separate from the centrum? They both have both parts... did you mean the comparison of those both to the cat shark? Just be specific about which taxon, which region, and which density. No need to go into reasons why you chose those regions here.. Put into methods and discussion for interpretation.

      We hope that we have now clarified wording of that section.

      Label the spokes somehow either in caption or on figure direction. I think I see it as part of figure 4E, I, and J, but maybe I'm misinterpreting.

      Based upon histological features (e.g., regions of very low cellularity with Trichrome unstained matrix) and hypermineralization, spokes in Fig4 are labelled with * and segmented in blue.  We detailed how spokes were identified in main text (lines 241-243; 252-254) and figure legend (lines 597-603). 

      Reviewer #2 (Recommendations For The Authors):

      Other comments

      L40: remove paedomorphism

      no change; see above

      L53: down tune language, remove "severely" and "major"

      done (lines 57-59)

      L86: provide species and endoskeletal elements that are mineralized

      no change; this paragraph was written generally, because the papers cited looked at cap zones of many different skeletal elements and neural arches in many different species

      L130: remove TMD, replace by relative, descriptive, values

      no change; see above

      L135: What are "segmented vertebral neural arches and centra" ?

      changed to “neural arches and centra of segmented vertebrae” (lines 140-141)

      L166: L168 "compact" vs. "irregular". Partial mineralisation is not necessarily irregular.

      thanks for pointing out this issue; we changed wording, instead contrasting “non-continuous” and “continuous” mineralization patterns (lines 171-174)

      L192: "several endoskeletal regions". Provide all regions

      all regions provided (lines 198-199)

      L269: "has never been carefully characterized in chimeras". Carefully means what? Here, also only one chimera is analysed, not several species.

      sentence removed

      302: Can't believe there is no better citation for elasmobranch vertebral centra development than Gadow and Abott (1895)

      added Arriata and Kolliker REFs here (lines 293-295)

      L318 ff: remove discussion from result chapter

      references to paedomorphism were removed from this Results section

      L342: refer to the species studied, not to the entire group.

      sorry, the line numbering for the reviewer and our original manuscript have been a little off for some reason, and we were unclear exactly to which line of text this comment referred.  Generally in this revision, however, we have tried to restrict our direct analyses to the species analyzed, but in the Discussion we do extrapolate a bit from our data when considering relevant published papers of other species.

      346: "selected representative". Selection criteria are missing

      “selected representative” removed

      L348: down tune, remove "critical"

      Done

      L351: down tune, remove "critical"

      done

      L 364: "Since stem chondrichthyans did not typically mineralize their centra". Means there are fossil stem chondrichthyans with full mineralised centra?

      Re-worded to “Stem chondrichthyans did not appear to mineralize their centra” (line 379)

      L379: down tune and change to: "we propose the term "non-tesseral trabecular mineralization. Possibly a plesiomorphic (ancestral) character of chondrichthyans"

      no change; sorry, but we feel this character state needs to be emphasized as we wrote in this paper, so that its evolutionary relationship to other chondrichthyan endoskeletal features, such as tesserae, can be clarified.

      L407: suggests so far palaeontologist have not been "careful" enough?

      apologies; sentence re-worded, emphasizing that synchrotron imaging might increase details of these descriptions (lines 406-408)

      414: down tune, remove "we propose". Replace by "possibly" or "it can be discussed if"

      sentence re-worded and “we propose” removed (lines 412-415)

      L420: remove paragraph

      no action; see above

      L436: remove paragraph

      no action; see above

      L450: perhaps add summary of the discussion. A summary that focuses on the subject and the methodology of this nice study.

      yes, in response to the reviewer’s comment, we finished the discussion with a summary of the current study.  (lines 440-453)

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1: 

      “I am sorry to dwell on the point of naming the newly identified families of adhesion GPCRs in choanoflagellates. I commented: "Can the authors suggest another scheme (mind to avoid the subfamily I-IX or the alternative ADGRA-G,L,V subfamily schemes of metazoan aGPCRs) and adapt their numbering throughout the text and all figures/supplementary figures/supplementary files." Now the authors have changed the Roman numeral numbering (previously used by the adhesion GPCR field to denominate metazoan receptor families) to the other option that I explicitly said should be obsolete, the numbering by capital letters (which is in use since its introduction in 2015 in Hamann et al., Pharmacol Rev, 2015). The authors write: "Phylogenetic analysis of the 7TM domains of choanoflagellates uncovered at least 19 subfamilies of aGPCRs (subfamilies A-S ...". I am thus afraid this has not addressed my point at all. For example, in the revised numbering scheme for Choanoflagellates aGPCR subfamilies of the authors the now used "A" descriptor, which are predicted to contain a HYR domain, can be mistaken for ADGRA homologs (abbreviated as "A" receptors, previously termed subfamily III aGPCRs) of metazoan aGPCRs, which contain HRM and LRR domains. Likewise, choanoflagellate "E" receptors are predicted to harbour LRR repeats, but metazoan ADGRE (abbreviated as "E" too) are characterised by their EGF domains. This clearly underlines the need to devise a numbering scheme for the newly described choanoflagellate aGPCR homologs so they cannot be confused with the receptors from other kingdoms, for which identical naming conventions exist. Please change this, e.g. by numbering/denominating the choanoflagellate subfamilies by greek letters (or your pick of any other ordering system that does not lend itself to be mistaken with the previous and existing aGPCR classifications) and change the manuscript and figures accordingly.”

      We have now re-labeled the choanoflagellate aGPCR subfamilies, previously numbered from A to S, using Greek alphabetical enumeration (from α to τ). Changes have been made throughout the main text, in Figure 5, and in Supplementary Figures  S6 and S7.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      In this manuscript, Singh, Wu and colleagues explore functional links between septins and the exocyst complex. The exocyst is a conserved octameric complex that mediates the tethering of secretory vesicles for exocytosis in eukaryotes. In fission yeast cells, the exocyst is necessary for cell division, where it localizes mostly at the rim of the division plane, but septins, which localize in a similar manner, are non-essential. The main findings of the work are that septins are required for the specific localization of the exocyst to the rim of the division plane, and the likely consequent localization of the glucanase Eng1 at this same location, where it is known to promote cell separation. In the absence of septins, the exocyst still localizes to the division plane but is not restricted to the rim. They also show some defects in the localization of secretory vesicles and glucan synthase cargo. They further propose that interactions between septins and exocysts are direct, as shown through Alphafold2 predictions (of unclear strength) and clean coIP experiments. 

      Strengths: 

      The septin, exocyst and Eng1 localization data are well supported, showing that the septin rim recruits the exocyst and (likely consequently) the Eng1 glucanase at this location. One major finding of the manuscript is that of a physical interaction between septins and exocyst subunits. Indeed, many of the coIPs supporting this discovery are very clear. 

      Weaknesses: 

      I am less convinced by the strength of the physical interaction of septins with the exocyst complex. Notably, one important open question is whether septins interact with the intact exocyst complex, as claimed in the text, or whether the interactions occur only with individual subunits. The two-hybrid and coIP data only show weak interactions with individual subunits, and some coIPs (for instance Sec3 and Exo70 with Spn1 and Spn4) are negative, suggesting that the exocyst complex does not remain intact in these experiments.

      Given the known structure of the full exocyst complex and septin filaments (at least in S. cerevisiae), the Alphafold2 predicted structure could be used to probe whether the proposed interaction sites are compatible with full complex formation.  

      We thank the reviewer for these important and insightful comments. We agree that our current data, particularly the data from yeast two-hybrid and co-immunoprecipitation (coIP) assays, primarily reveal interactions between individual septin and exocyst subunits, and do not conclusively demonstrate binding of septins to the fully assembled exocyst complex. We recognize this as a key limitation and have revised the manuscript text accordingly to clarify this point.

      We also appreciate the reviewer’s suggestion to use structural prediction to further assess the plausibility of these interactions. We have now used the structure of the full Saccharomyces cerevisiae exocyst complex (at 4.4 Å resolution) published by the Guo group (Mei et al., 2018) to examine the septin-exocyst interaction interfaces, assuming that the S. pombe exocyst has a similar structure. We checked all the interacting residues on the exocyst complex and septins from our AlphaFold modeling to determine whether the predicted interactions are structurally compatible. Our analysis reveals that the majority of subunit interactions are sterically feasible, while a few would likely require partial disassembly or flexible conformations. These new insights have been added to the revised Results and Discussion sections (Figure Supplements S4 and S5, and Videos 4-7).

      While we cannot fully resolve whether septins engage with the whole exocyst complex versus selected subunits, our combined data support a model that septins scaffold or spatially regulate the exocyst localization at the division site, potentially through dynamic and multivalent interactions. We now explicitly state this more cautious interpretation in the revised manuscript.

      Mei, K., Li, Y., Wang, S., Shao, G., Wang, J., Ding, Y., Luo, G., Yue, P., Liu, J.-J., Wang, X., Dong, M.-Q., Wang, H.-W. and Guo, W., 2018. Cryo-EM structure of the exocyst complex. Nature Structural & Molecular Biology, 25(2), pp.139-146.

      The effect of spn1∆ on Eng1 localization is very clear, but the effect on secretory vesicles (Ypt3, Syb1) and glucan synthase Bgs1 is less convincing. The effect is small, and it is not clear how the cells are matched for the stage of cytokinesis. 

      For the localizations and quantifications of Eng1, Ypt3, Syb1, and Bgs1 shown in Figures 6 and 7, cells with a closed septum (at or after the end of contractile-ring constriction) were quantified or highlighted. To quantify their fluorescence intensity at the division site by line scan, we used a line width of 3 pixels. For Syb1 (Figure 6D), we quantified cells at the end of ring constriction (when Rlc1-tdTomato had constricted to a dot) in the middle focal plane. Identical lines were drawn in the Rlc1 and Syb1 channels, and the center of the line scan was defined as the pixel with the brightest Rlc1 value. For Bgs1 (Figure 7A), we quantified cells in which the Rlc1 signal had disappeared from the division site. The line was drawn in the Bgs1 channel in the middle focal plane, and the center of the line scan was defined as the pixel with the brightest Bgs1 value. In both cases, all data were aligned by the center and plotted. These details were added to the Materials and Methods.
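
      The alignment step described above can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the fixed half-width window, and the assumption that each line scan has already been reduced to a one-dimensional intensity profile (for example, averaged across the 3-pixel line width) are all hypothetical choices made for illustration.

```python
def align_by_peak(reference, signal, half_width):
    """Return a window of `signal` centered on the brightest pixel of `reference`.

    `reference` and `signal` are intensity profiles sampled along the same line
    (e.g. Rlc1 and Syb1). Positions that fall outside the profile are None.
    For a single-channel case (e.g. Bgs1), pass the same profile twice.
    """
    # Index of the brightest reference pixel (first one if there are ties).
    center = max(range(len(reference)), key=lambda i: reference[i])
    window = []
    for offset in range(-half_width, half_width + 1):
        i = center + offset
        window.append(signal[i] if 0 <= i < len(signal) else None)
    return window


def mean_profile(windows):
    """Average aligned windows position by position, skipping missing pixels."""
    profile = []
    for values in zip(*windows):
        present = [v for v in values if v is not None]
        profile.append(sum(present) / len(present) if present else None)
    return profile


# Two hypothetical cells: the Rlc1 peak sits at a different pixel in each.
rlc1_a, syb1_a = [1, 2, 9, 2, 1], [5, 6, 3, 6, 5]
rlc1_b, syb1_b = [1, 9, 2, 1, 1], [7, 1, 7, 4, 4]
windows = [align_by_peak(rlc1_a, syb1_a, 1), align_by_peak(rlc1_b, syb1_b, 1)]
print(mean_profile(windows))
```

      Centering each profile on the reference peak before averaging keeps cell-to-cell differences in where the line was drawn from blurring the mean profile.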

      Reviewer #2 (Public Review): 

      Summary: 

      This interesting study implicates the direct interaction between two multi-subunit complexes, known as the exocyst and septin complexes, in the function of both complexes during cytokinesis in fission yeast. While previous work from several labs had implicated roles for the exocyst and septin complexes in cytokinesis and cell separation, this study describes the importance of protein:protein interaction between these complexes in mediating the functions of these complexes in cytokinesis. Previous studies in neurons had suggested interactions between septins and exocyst complexes occur but the functional importance of such interactions was not known. Moreover, in baker's yeast where both of these complexes have been extensively studied - no evidence of such an interaction has been uncovered despite numerous studies which should have detected it. Therefore while exocyst:septin interactions appear to be conserved in several systems, it appears likely that budding yeast are the exception--having lost this conserved interaction. 

      Strengths: 

      The strengths of this work include the rigorous analysis of the interaction using multiple methods including Co-IP of tagged but endogenously expressed proteins, 2 hybrid interaction, and Alphafold Multimer. Careful quantitative analysis of the effects of loss of function in each complex and the effects on localization and dynamics of each complex was also a strength. Taken together this work convincingly describes that these two complexes do interact and that this interaction plays an important role in post Golgi vesicle targeting during cytokinesis. 

      Weaknesses: 

      The authors used Alphafold Multimer to predict (largely successfully) which subunits were most likely to be involved in direct interactions between the complexes. It would be very interesting to compare this to a parallel analysis on the budding yeast septin and exocyst complexes where it is quite clear that detectable interactions between the exocyst and septins (using the same methods) do not exist. Presumably the resulting pLDDT scores will be significantly lower. These are in silico experiments and should not be difficult to carry out. 

      We thank the reviewer for this insightful suggestion. To assess the specificity of the predicted interactions between septins and the exocyst complex in S. pombe, we performed a comparative AlphaFold2 analysis using some of the homologous subunits from Saccharomyces cerevisiae. We modeled two interactions, Cdc10-Sec5 and Cdc10-Sec15 (Cdc10 is the Spn2 homolog), using the same pipeline and parameters at the time when we did the modeling for S. pombe. We did not find interactions between them using the criteria we applied to the fission yeast proteins in this study. These results support the notion that the predicted septin–exocyst interactions in S. pombe are not generalizable to budding yeast. Unfortunately, we did not test the other combinations at that time, and the AlphaFold2 platform is not available to us now (it returned system error messages when we tried recently). We thank the reviewer again for this helpful suggestion, which should strengthen the evolutionary interpretation of the septin-exocyst interactions once it can be carried out systematically.

      Reviewer #3 (Public Review): 

      Septins in several systems are thought to guide the location of exocytosis, and they have been found to interact with the exocyst vesicle-tethering complex in some cells. However, it is not known whether such interactions are direct or indirect. Moreover, septin-exocyst physical associations were not detected in several other systems, including yeasts, making it unclear whether such interactions reflect a conserved septin-exocytosis link or whether they may be missed if they depend on septin polymerization or association into higher-order structures. Singh et al. set out to define whether and how septins influence the exocyst during S. pombe cytokinesis. Based on three lines of evidence, the authors conclude that septins directly bind to exocyst subunits to regulate localization of the exocyst and vesicle secretion during cytokinesis. The conclusions are consistent with the data presented, but some interpretations need to be clarified and extended: 

      (1) The first line of evidence examines septin and exocyst localization during cytokinesis in wild-type and septin-mutant or exocyst-mutant yeast. Quantitative imaging convincingly shows that the detailed localization of the exocyst at the division site is perturbed in septin mutants, and that this is accompanied by modest accumulation of vesicles and vesicle cargos. Whether that is sufficient to explain the increased thickness of the division septum in septin mutants remains unclear.

      The modest accumulation of vesicles and vesicle cargos at the division site is one of the reasons for the increased thickness of the division septum in septin mutants. It is more likely that the misplaced exocyst can still tether vesicles along the division plane (less likely at the rim) without septins. Due to the lack of the glucanase Eng1 at the rim of the division plane in septin mutants, daughter-cell separation is delayed and then cells continue to thicken the septum. We have added these points to the Discussion.

      (2) The second line of evidence involves a comprehensive Alphafold2 analysis of potential pair-wise interactions between septin and exocyst subunits. This identifies several putative interactions in silico, but it is unclear whether the identified interaction surfaces would be available in the full septin or exocyst complexes.  

      We thank the reviewer for raising this important point. We fully agree that a key limitation of pairwise AlphaFold predictions is that they do not account for the higher-order structural context of multimeric protein complexes, such as septin hetero-oligomers or the assembled exocyst complex. As a result, some of the predicted interfaces could indeed be conformationally restricted in the native state.

      To address this concern, we predicted the S. pombe exocyst and septin structures using AlphaFold3 and mapped the predicted contact residues onto these structures. Most predicted interfaces (86% for the exocyst and 86-96% for septins) appear to be located on accessible surfaces in the assembled complexes (Figure Supplements S4 and S5, and Videos 4-7), suggesting that these interactions are sterically plausible. We have added this important caveat to the text of the revised manuscript, highlighting the interface accessibility within the assembled complexes. We appreciate the reviewer’s insight, which helped us strengthen the interpretation and limitations of the AlphaFold-based analysis.
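
      As a rough illustration of the kind of bookkeeping behind an accessibility percentage, the sketch below computes the fraction of predicted interface residues that are not buried in the assembled complex. It is an assumption-laden example rather than the procedure used in the manuscript: the function name is hypothetical, and the list of buried residues is assumed to come from some separate structural analysis that is not specified here.

```python
def accessible_fraction(interface_residues, buried_residues):
    """Fraction of predicted interface residues that are not buried.

    `interface_residues`: residue numbers predicted to contact the partner.
    `buried_residues`: residue numbers judged inaccessible in the assembled
    complex (how this is judged is outside the scope of this sketch).
    """
    interface = set(interface_residues)
    if not interface:
        raise ValueError("no interface residues supplied")
    exposed = interface - set(buried_residues)
    return len(exposed) / len(interface)


# Hypothetical example: one of four interface residues is buried.
print(accessible_fraction([10, 11, 12, 13], [12, 99]))
```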

      (3) The third line of evidence uses co-immunoprecipitation and yeast two hybrid assays to show that several physical interactions predicted by Alphafold2 can be detected, leading the authors to conclude that they have identified direct interactions. However, both methods leave open the possibility that the interactions are indirect and mediated by other proteins in the fission yeast extract (co-IP) or budding yeast cell (two-hybrid). 

      We thank the reviewer for this important clarification. We agree that co-immunoprecipitation (co-IP) and yeast two-hybrid (Y2H) assays cannot conclusively distinguish between direct and indirect interactions. As the reviewer points out, co-IPs may reflect associations mediated by bridging proteins within the fission yeast extract, and Y2H readouts can be influenced by fusion context or endogenous host proteins. We have now revised the relevant statements in the Results and Discussion sections to clarify that the observed associations are consistent with the direct interactions predicted by AlphaFold2, but cannot alone establish direct binding. We have also tempered our terminology, replacing phrases such as “direct interaction” with “physical association consistent with direct binding” where appropriate.

      (4) Based on prior studies it would be expected that the large majority of both septins and exocyst subunits are present in cells and extracts as stoichiometric complexes. Thus, one would expect any septin-exocyst interaction to yield associations detectable with multiple subunits, yet co-IPs were not detected in some combinations. It is therefore unclear whether the interactions reflect associations between fully-formed functional complexes or perhaps between transient folding intermediates. 

      We thank the reviewer for this thoughtful observation. We agree that both septins and exocyst subunits are generally understood to exist in cells as stable, stoichiometric complexes, and that interactions between fully assembled complexes might be expected to yield co-immunoprecipitation signals involving multiple subunits from each complex. However, it was also found that >50% of septins Spn1 and Spn4 are in the cytoplasm even during cytokinesis when the septin double rings are formed (Table 1 of Wu and Pollard, Science 2005, PMID: 16224022). Thus, it is possible that there are pools of free septin and exocyst subunits in the cytoplasm, which were detected in our Co-IP assays. 

      In our experiments, we observed selective co-IP signals between certain septin and exocyst subunits, while other combinations did not yield detectable interactions. We believe these findings could reflect several other possibilities besides the possible interactions among the free subunits in the cytoplasm:

      (1) Some interactions may only be strong enough between specific subunits at exposed interfaces under the Co-IP conditions, rather than through whole complex–complex interactions;

      (2) The detergent and/or salt conditions used in our co-IPs may disrupt labile complex interfaces or partially dissociate multimeric assemblies.

      To address this concern, we now include in the Discussion a paragraph highlighting the possibility that some of the observed interactions may not reflect binding between fully assembled, functional complexes. Notably, most detected interaction pairs are consistent with the AlphaFold predictions, which suggest that specific subunit interfaces may be responsible for mediating contact. While we cannot fully resolve whether septins engage with the whole exocyst complex versus selected subunits, our combined data support a model in which septins scaffold or spatially regulate exocyst localization at the division site, potentially through dynamic and multivalent interactions. We now explicitly state this more cautious interpretation in the revised manuscript. Future biochemical studies using native complex purifications, cross-linking mass spectrometry, in vitro reconstitution with fully assembled septin and exocyst complexes, or in vivo FRET assays will be essential to clarify whether the interactions we observe occur between intact assemblies or intermediate forms.

      Reviewer #1 (Recommendations for the Authors): 

      A major finding from the manuscript is the description of physical interaction of septin subunits with exocyst subunits. The analysis starts from Alphafold2 predictions, shown in Figures 3 and S3. However, some of the most useful metrics of Alphafold, the PAE plot and the pTM and ipTM values, are not provided. It is thus very difficult to estimate the value of the predicted structures (which are also obscured by all side chains). The power of a predicted structure is that it suggests binding interfaces, which is not explored here. At the very least, it would not be difficult to examine whether the proposed binding interfaces are free in the septin filaments and octameric exocyst complex. 

      Please also see response to reviewer #1 (Public Review).

      We thank the reviewer for these very helpful suggestions. We agree that inclusion of AlphaFold2 model confidence metrics—specifically the Predicted Aligned Error (PAE) plots, as well as pTM and ipTM values—is essential for evaluating the reliability of the predicted septin–exocyst interfaces.

      In the revised manuscript, we have now included the PAE plots (Figure 3 and Supplementary S3) and summarized the pTM scores for each predicted septin–exocyst subunit pair. We also provide a short description of these metrics in the figure legend to help guide interpretation. The older AlphaFold2 version (alphafold2advanced) that we used does not report ipTM scores, so these are not included. However, according to our methodology, we only counted interacting residues with pLDDT scores >50%, so we expect that the resulting ipTM scores would not be very low.
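
      The pLDDT filter mentioned above could be sketched as follows. This is only an illustration under stated assumptions: the function name, the data layout (residue number mapped to pLDDT), and the choice to require that both residues of a contact pair pass the cutoff are hypothetical and are not taken from the manuscript.

```python
def confident_contacts(contacts, plddt_a, plddt_b, cutoff=50.0):
    """Keep residue pairs in which both residues exceed the pLDDT cutoff.

    `contacts`: (residue_in_A, residue_in_B) pairs from a predicted model.
    `plddt_a`, `plddt_b`: dicts mapping residue number to pLDDT (0-100).
    Requiring both partners to pass is an assumption of this sketch.
    """
    return [
        (i, j)
        for i, j in contacts
        if plddt_a[i] > cutoff and plddt_b[j] > cutoff
    ]


# Hypothetical septin (A) and exocyst (B) residues with per-residue pLDDT.
contacts = [(1, 5), (2, 6), (3, 7)]
plddt_a = {1: 80.0, 2: 45.0, 3: 90.0}
plddt_b = {5: 70.0, 6: 88.0, 7: 50.0}
print(confident_contacts(contacts, plddt_a, plddt_b))
```

      Note that a per-residue filter of this kind constrains local model confidence only; it does not replace an interface-level score such as ipTM.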

      In addition, we have updated Figures 3 and S3 to show simplified ribbon diagrams of the interface regions, with side chains hidden by default and selectively displayed only at predicted interaction hotspots. This improves structural clarity and makes the interface regions easier to interpret. We mentioned in the Discussion that the preliminary studies show that the predicted interacting interfaces of Sec15 and Sec5 with septin subunits are accessible for interaction in the whole exocyst complex. The new Figure Supplement S4 and S5 and Videos 4-7 now show the interface residues of both the exocyst and septins that are involved in the interactions.

      Two further points on the interaction: 

      The 2H interaction data is not very convincing. The insets showing beta-gal assays do not look very different from the negative control (compare for instance in panel 4E the Sec15BD alone, last column, with the Sec15-BD in combination with Spn4-AD, third column: roughly same color), which suggests it is mostly driven by autoactivation of Sec15-BD. Providing growth information in addition to beta-gal may be helpful. 

      We appreciate the reviewer’s close evaluation of the yeast two-hybrid (Y2H) assay data, and we agree that the signal observed for the Spn4–Sec15 combination is indeed weak. Unfortunately, we did not perform growth assays. However, we would like to clarify that this is consistent with the nature of the interactions that we are investigating. The interactions between individual septin and exocyst subunits are weak and/or transient, as supported by the weak interactions detected in the Co-IP experiments. Given that the exocyst only tethers/docks vesicles on the plasma membrane for tens of seconds before vesicle fusion, the multivalent interactions between septins and the exocyst should be very dynamic and not too strong. 

      As evidenced by our Co-IP experiments and the multivalent interactions predicted by AlphaFold2, the interaction between Spn4 and Sec15 is detectable but weak, suggesting that this may be a low-affinity or transient interaction. Given that Y2H assays have known limitations in detecting such low-affinity interactions (especially those that depend on conformational context or are not optimal in the yeast nucleus), it is perhaps not surprising that the X-gal color development is subtle. These limitations of the Y2H system have been well documented (e.g., Braun et al., 2009; Vidal & Fields, 2014), particularly for interactions with affinities in the micromolar range or those requiring conformational specificity. Therefore, the weak signal observed is in line with expectations for a low-affinity, transient interaction such as that between Spn4 and Sec15.

      Vidal, M. and Fields, S., 2014. The yeast two-hybrid assay: still finding connections after 25 years. Nature methods, 11(12), pp.1203-1206.

      Braun, P., Tasan, M., Dreze, M., Barrios-Rodiles, M., Lemmens, I., Yu, H., Sahalie, J.M., Murray, R.R., Roncari, L., De Smet, A.S. and Venkatesan, K., 2009. An experimentally derived confidence score for binary protein-protein interactions. Nature methods, 6(1), pp.91-97.

      In the coIP experiments, I am confused by the presence of tubulin signal in some of the IPs. For instance, in Fig 4B, but not 4D, where the same Sec15-GFP is immunoprecipitated. There is also a signal in 4C but not 4A. This needs to be clarified. 

      The presence of tubulin in some immunoprecipitates is not unexpected, particularly in experiments involving cytoskeleton-associated proteins such as septins and exocyst subunits. The occasional presence of tubulin in our co-IP samples is consistent with well-documented reports showing tubulin as a frequent non-specific co-purifying protein, particularly under native lysis conditions used to preserve large complexes (Vega and Hsu, 2003; Gavin et al., 2006; Mellacheruvu et al., 2013; Hein et al., 2015). The CRAPome database and quantitative interactomics studies highlight tubulin as one of the most common background proteins in affinity-based workflows. Importantly, tubulin was used as a loading control but not as a marker for interaction in our study, and its variable presence does not reflect a specific interaction with Sec15-GFP or other bait proteins, and we have clarified this point in the revised figure legend.

      Gavin, A.C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L.J., Bastuck, S., Dümpelfeld, B. and Edelmann, A., 2006. Proteome survey reveals modularity of the yeast cell machinery. Nature, 440(7084), pp.631-636.

      Mellacheruvu, D., Wright, Z., Couzens, A.L., Lambert, J.P., St-Denis, N.A., Li, T., Miteva, Y.V., Hauri, S., Sardiu, M.E., Low, T.Y. and Halim, V.A., 2013. The CRAPome: a contaminant repository for affinity purification–mass spectrometry data. Nature methods, 10(8), pp.730-736.

      Hein, M.Y., Hubner, N.C., Poser, I., Cox, J., Nagaraj, N., Toyoda, Y., Gak, I.A., Weisswange, I., Mansfeld, J., Buchholz, F. and Hyman, A.A., 2015. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell, 163(3), pp.712-723.

      Vega, I.E., Hsu, S.C. 2003. The septin protein Nedd5 associates with both the exocyst complex and microtubules and disruption of its GTPase activity promotes aberrant neurite sprouting in PC12 cells. Neuroreport, 14, pp.31-37.

      Regarding the localization of Ypt3 and Syb1 in WT and spn1∆ in Figure 6C-D and Bgs1 in Figure 7A, it would help to add a contractile ring marker to be able to match the timing of cytokinesis between WT and mutants and ensure that cells of same stage are compared (and add some quantification for Ypt3). In fact, in Figure 7A, next to the cells being pointed at, there are very similar localizations of Bgs1 in WT and spn1∆ at the rim of the ingressing septum, which makes me wonder how the quantified cells were chosen. 

      For localizations and quantifications of Eng1, Ypt3, Syb1, and Bgs1 shown in Figures 6 and 7, cells with a closed septum (at or after the end of contractile-ring constriction) were quantified or highlighted. To quantify their fluorescence intensity at the division site using line scans, the line width used was 3 pixels. For Syb1 (Figure 6D), we quantified cells at the end of ring constriction (when Rlc1-tdTomato had constricted to a dot) in the middle focal plane. The exact same lines were drawn in both the Rlc1 and Syb1 channels. The center of the line scan was defined as the pixel with the brightest Rlc1 value. All data were aligned by the center and plotted. For Bgs1 (Figure 7A), we quantified cells in which the Rlc1 signal had disappeared from the division site. The line was drawn in the Bgs1 channel in the middle focal plane. The center of the line scan was defined as the pixel with the brightest Bgs1 value. All data were aligned by the center and plotted. These details were added to the Materials and Methods.
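
      The alignment step described above can be illustrated with a minimal sketch. It assumes each line scan has already been exported as a list of intensities (for example, after averaging across the 3-pixel line width), and the function names `align_by_peak` and `mean_profile` and the `half_width` parameter are hypothetical, introduced only for illustration; this is not the authors' analysis code.

```python
from statistics import mean

def align_by_peak(profile, reference=None, half_width=5):
    """Return a window of `profile` centred on the brightest pixel.

    If `reference` is given (e.g. the Rlc1 channel drawn along the same
    line), its brightest pixel defines the centre; otherwise the profile's
    own brightest pixel is used (as described for Bgs1).
    Positions falling outside the scan are filled with None.
    """
    ref = profile if reference is None else reference
    center = max(range(len(ref)), key=lambda i: ref[i])
    window = []
    for offset in range(-half_width, half_width + 1):
        i = center + offset
        window.append(profile[i] if 0 <= i < len(profile) else None)
    return window

def mean_profile(aligned):
    """Average aligned windows position by position, skipping missing pixels."""
    result = []
    for column in zip(*aligned):
        values = [v for v in column if v is not None]
        result.append(mean(values) if values else None)
    return result

# Hypothetical intensities: a Syb1 scan aligned on the Rlc1 peak.
rlc1 = [0, 1, 9, 1, 0, 0]
syb1 = [1, 2, 5, 2, 1, 0]
aligned_syb1 = align_by_peak(syb1, reference=rlc1, half_width=2)
```

      Centring every scan on the same landmark before averaging keeps cell-to-cell differences in where the line was drawn from blurring the averaged profile.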

      Finally, the manuscript would benefit from some figure reorganization/compaction. Unless work on the binding interfaces is added, Figure 3 and S3 could be removed and summarized by providing the pTM and ipTM values of the predicted interactions. Figure 5 could be combined with Figure 2, as it is essentially a repeat with additional exocyst subunits. 

      Because analysis of the binding interfaces has been added, we have kept the original Figures 3 and S3. The experiments in Figure 5 could not be performed before the interaction tests between septins and the exocyst. Thus, to aid the flow of the story, we have kept Figures 2 and 5 separate.

      Minor comments: 

      The last sentence of the first paragraph of the results does not make much sense at this point of the paper. After the first paragraph, there is no evidence that colocalization would be required for proper function.  

      We agree that the sentence in question may have overstated the functional implications of colocalization too early in the Results section, before presenting supporting evidence. Our intention was to introduce the hypothesis that spatial proximity between septins and exocyst subunits may be relevant for their coordination during cytokinesis, which we examine in later figures. We have revised the sentence to more accurately reflect the observational nature of the data at this stage in the manuscript as below:

      "These observations suggest the spatial proximity between septins and the exocyst during certain stage of cytokinesis, raising the possibility of their functional coordination, which we would further investigate below."

      What is the indicated n in Figure 6B? Number of cells? 

      Yes, the n in Figure 6B refers to the number of electron microscopy thin sections quantified in the analysis. We have now updated the figure legend to explicitly state this for clarity.

      The causal inference made between the alteration of Exocyst localization in septin mutants and the thicker septum is possible, but by no means certain. It should be phrased more cautiously. 

      We agree that our original phrasing may have overstated the causal relationship between altered exocyst localization in septin mutants and septum thickening. Our data support a correlation between these phenotypes, but additional experiments would be required to establish direct causality.

      To reflect this, we have revised the relevant sentence in the Discussion to read:

      “The modest accumulation of vesicles and vesicle cargos at the division site is one of the reasons for the increased thickness of the division septum in septin mutants. It is more likely that the misplaced exocyst can still tether vesicles along the division plane without septins. Due to the lack of the glucanase Eng1 at the rim of the division plane in septin mutants, daughter-cell separation is delayed and then cells continue to thicken the septum.”

      Reviewer #2 (Recommendations for the Authors): 

      (1) In the display of the AlphaFold Model for the interactions (Figure 3 and Supplemental Figure 3) it is difficult to identify which subunits are where. Residue numbers and subunits should be labeled and only side chains important for the interactions should be present in the model. 

      We appreciate this valuable suggestion. We agree that clearer visual labeling is essential for interpreting the predicted interactions and have revised Figures 3 and S3 accordingly to improve readability and emphasize key structural features.

      Specifically, we have:

      • Labeled each subunit with its name and color-coded consistently across panels.

      •  Annotated key interface residues with residue numbers directly in the figure.

      • Removed non-interacting side chains to declutter the model and highlight only those involved in predicted interactions as well as expanded the figure legend for explanation.

      (2) In Table 1 the column label "Genetic Interaction at 25C" is confusing when synthetic growth defects are shown with a "plus". Rather this column could be labeled "Growth of double mutants at 25C" and then designate the relative growth rate observed at 25C as in Table 2. Designating a negative effect on growth with a plus is confusing. 

      Thanks for the thoughtful suggestions. We have made the suggested changes by deleting the last column so that Tables 1 and 2 are consistent.

      (3) In Figure 4, why is tubulin being co-immunoprecipitated in two of the four anti-GFP IPs? Are the IPs dirty and if so why does it vary between the four experiments? If they are dirty can the non-specific tubulin be removed by additional washes with IP buffer or conversely is it necessary to do minimal washes in order to detect the exocyst-septin interaction by coIP? A comment on this would be helpful. 

      The presence of tubulin in some immunoprecipitates is not unexpected, particularly in experiments involving cytoskeleton-associated proteins such as septins and exocyst subunits. The occasional presence of tubulin in our co-IP samples is consistent with well-documented reports showing tubulin as a frequent non-specific co-purifying protein, particularly under native lysis conditions used to preserve large complexes (Vega and Hsu, 2003; Gavin et al., 2006; Mellacheruvu et al., 2013; Hein et al., 2015). The CRAPome database and quantitative interactomics studies highlight tubulin as one of the most common background proteins in affinity-based workflows. Importantly, tubulin was used as a loading control but not as a marker for interaction in our study, and its variable presence does not reflect a specific interaction with Sec15-GFP or other bait proteins, and we have clarified this point in the revised figure legend.

      Gavin, A.C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L.J., Bastuck, S., Dümpelfeld, B. and Edelmann, A., 2006. Proteome survey reveals modularity of the yeast cell machinery. Nature, 440(7084), pp.631-636.

      Mellacheruvu, D., Wright, Z., Couzens, A.L., Lambert, J.P., St-Denis, N.A., Li, T., Miteva, Y.V., Hauri, S., Sardiu, M.E., Low, T.Y. and Halim, V.A., 2013. The CRAPome: a contaminant repository for affinity purification–mass spectrometry data. Nature methods, 10(8), pp.730-736.

      Hein, M.Y., Hubner, N.C., Poser, I., Cox, J., Nagaraj, N., Toyoda, Y., Gak, I.A., Weisswange, I., Mansfeld, J., Buchholz, F. and Hyman, A.A., 2015. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell, 163(3), pp.712-723.

      Vega, I.E., Hsu, S.C. 2003. The septin protein Nedd5 associates with both the exocyst complex and microtubules and disruption of its GTPase activity promotes aberrant neurite sprouting in PC12 cells. Neuroreport, 14, pp.31-37. 

      In response to the second part of the reviewer’s comment, we washed the pulldown product five times, each time with 1 ml of IP buffer at 4 ºC. We used this standard protocol for all the Co-IP experiments to detect the interactions between different septin and exocyst subunits. We are therefore not sure whether, or how, additional washes or more stringent buffer conditions would interfere with detection of the interactions.

      Reviewer #3 (Recommendations for the Authors): 

      In addition to the issues noted in the public review, there were some confusing findings and references to previous literature that merit further consideration or discussion: 

      • The current gold standard for validating Alphafold predictions involves making targeted mutants suggested by the structural predictions. The absence of any such validation weakens the conclusions significantly. 

      We agree that targeted mutagenesis based on AlphaFold2-predicted interaction interfaces represents a powerful approach to experimentally validate the in silico models. While we did not pursue structure-guided mutagenesis in this study, our goal was to identify putative interactions between septin and exocyst subunits as a foundation for future functional work. Our current conclusions are intentionally limited to proposing putative interfaces, supported by co-immunoprecipitation and genetic interaction data.

      We recognize that direct validation of specific contact residues would significantly strengthen the model. Accordingly, we have revised the Discussion to explicitly state this limitation and to note that structure-based mutagenesis will be an important next step to test the functional relevance of predicted interactions. We have added the following statement:

      “Future studies are needed to refine the residues involved in the interactions because the predicted interacting residues from AlphaFold are too numerous. However, it is encouraging that most of the predicted interacting residues are clustered in several surface patches. Experimental validation through targeted mutagenesis is an important next step.”

      • Much of the writing appears to imply that differences in mutant phenotypes indicate differences in septin (or exocyst) subunit behaviors/functions. However, my reading of the work in budding yeast is that such differences reflect the partial functionality that can be conferred by aberrant partial septin complexes that assemble and may polymerize in mutants lacking different subunits. In this view, which is supported by data showing that essentially all septins are in stoichiometric octameric complexes in cells, the wild-type functions are all mediated by the full complex. Similarly, the separate exocyst subunit localizations based on tagged Sec3 (Finger et al) were not supported by later work from the Brennwald lab with untagged Sec3, and the idea that different exocyst subunits may function separately from the full complex has very limited support in yeast. I would suggest that the text be edited to better reflect the literature, or that different views be better justified. 

      Thanks for the suggestions. We have revised the text accordingly.

      • The comprehensive set of Alphafold2 predictions is a major strength of the paper, but it is unclear to this reader whether the multiple predicted interactions truly reflect multivalent multimode interactions or whether many (most?) predictions would not be consistent with interactions between full complexes and may not indicate physiological interactions. Better discussion of these issues is needed to interpret the findings. 

      We appreciate the reviewer’s suggestion to use structural prediction to further assess interaction plausibility. We have now used the structure of the full Saccharomyces cerevisiae exocyst complex (4.4 Å resolution) published by the Guo group to examine the interfaces of the septin–exocyst interactions, assuming that the S. pombe exocyst has a similar structure. We mapped predicted contact residues onto the predicted structure. Most predicted interfaces (86% for the exocyst and 86-96% for septins) appear to be located on accessible surfaces in the assembled complexes (Figure supplements S4 and S5, Videos 4-7), suggesting that these interactions are sterically plausible. We have added this important caveat to the text of the revised manuscript, highlighting the interface accessibility within the assembled complexes. We appreciate the reviewer’s insight, which helped us strengthen the interpretation and clarify the limitations of the AlphaFold-based analysis.

      • Some but not all co-IP blots appear to show tubulin (negative control) coming down with the GFP pull-downs. Why is that, and what does it imply for the reliability of the co-IP protocol? 

      The presence of tubulin in some immunoprecipitates is not unexpected, particularly in experiments involving cytoskeleton-associated proteins such as septins and exocyst subunits. The occasional presence of tubulin in our co-IP samples is consistent with well-documented reports showing tubulin as a frequent non-specific co-purifying protein, particularly under native lysis conditions used to preserve large complexes (Vega and Hsu, 2003; Gavin et al., 2006; Mellacheruvu et al., 2013; Hein et al., 2015). The CRAPome database and quantitative interactomics studies highlight tubulin as one of the most common background proteins in affinity-based workflows. Importantly, tubulin was used as a loading control but not a marker for interaction in our study, and its variable presence does not reflect a specific interaction with Sec15-GFP or other bait proteins, and we have clarified this point in the revised figure legend.

      Gavin, A.C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L.J., Bastuck, S., Dümpelfeld, B. and Edelmann, A., 2006. Proteome survey reveals modularity of the yeast cell machinery. Nature, 440(7084), pp.631-636.

      Mellacheruvu, D., Wright, Z., Couzens, A.L., Lambert, J.P., St-Denis, N.A., Li, T., Miteva, Y.V., Hauri, S., Sardiu, M.E., Low, T.Y. and Halim, V.A., 2013. The CRAPome: a contaminant repository for affinity purification–mass spectrometry data. Nature methods, 10(8), pp.730-736.

      Hein, M.Y., Hubner, N.C., Poser, I., Cox, J., Nagaraj, N., Toyoda, Y., Gak, I.A., Weisswange, I., Mansfeld, J., Buchholz, F. and Hyman, A.A., 2015. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell, 163(3), pp.712-723.

      Vega, I.E., Hsu, S.C. 2003. The septin protein Nedd5 associates with both the exocyst complex and microtubules and disruption of its GTPase activity promotes aberrant neurite sprouting in PC12 cells. Neuroreport, 14, pp.31-37.

      • Why were two different protocols used for different yeast-two-hybrid analyses? 

      The purpose of using two protocols was to test which protocol is more reliable and sensitive.

      • The different genetic interactions between septin and exocyst mutants when combined with TRAPP-II mutants merits further discussion: might the difference reflect relocation of exocyst from rim to center in septin mutants versus inactivation of exocyst in exocyst mutants? 

      We appreciate this insightful comment and agree that this distinction is likely meaningful. The reviewer correctly notes that septin mutants may not abolish exocyst function but rather cause its spatial mislocalization: from the rim to the center of the division site, whereas the exocyst mutants likely result in partial or complete loss of vesicle tethering activity at the plasma membrane.

      To address this important nuance, we have expanded the Discussion as follows:

      “The genetic interactions between mutations in the exocyst and septins when combined with TRAPP-II mutants may reflect fundamentally different consequences for compromising the exocyst function (Tables 1 and 2). In septin mutants, the exocyst complex still localizes to the division site but is mispositioned from the rim to the center of the division plane. This mislocalization allows partial retention of exocyst function, leading to very mild synthetic or additive defects when combined with compromised TRAPP-II trafficking and tethering. In contrast, in exocyst subunit mutants, the exocyst becomes partial or non-functional, resulting in a more severe loss of exocyst activity. These differing consequences could explain the qualitative differences in genetic interactions observed with TRAPP-II mutants (Tables 1 and 2). Thus, septins and the exocyst also work in different genetic pathways for certain functions in fission yeast cytokinesis.”

      • The vesicle accumulation in septin mutants was quite modest. Does that imply that most vesicles are still fusing in the septum? Further discussion would be beneficial to understand what the authors think this means. 

      We thank the reviewer for this important point. We agree that the modest vesicle accumulation observed in septin mutants suggests that a significant proportion of vesicles continue to successfully fuse at the division site, even in the absence of fully functional septin structures.

      We now discuss this in greater detail in the revised manuscript:

      “The relatively modest vesicle accumulation in septin mutants suggests that septins are not absolutely required for vesicle tethering or fusion per se at the division site. Instead, septins primarily function to spatially organize the targeting sites of exocyst-directed vesicles by stabilizing the localization of the exocyst at the rim of the cleavage furrow. In septin mutants, mislocalization of the exocyst reduces the spatial precision of membrane insertion but still permits vesicle tethering and fusion, albeit in a less controlled manner. Thus, septins likely play a modulatory rather than essential role in exocytic vesicle delivery during cytokinesis. This interpretation aligns with our localization and genetic interaction data, which indicates that septins act as scaffolds to optimize secretion geometry, rather than as core components of the fusion machinery.”

      • It was unclear to this reader why relocation of some exocyst complexes from the rim to the center of the septal region would lead to dramatic thickening of the septum. Further discussion would be beneficial to understand what the authors think this means. 

      The modest accumulation of vesicles and vesicle cargos at the division site is one of the reasons for the increased thickness of the division septum in septin mutants. It is more likely that the misplaced exocyst can still tether vesicles along the division plane without septins. Because of the lack of glucanase Eng1 at the rim of the division plane in septin mutants, daughter-cell separation is delayed and then cells continue to thicken the septum. We have added these points to the Discussion.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The authors make a bold claim that a combination of repetitive transcranial magnetic stimulation (intermittent theta burst-iTBS) and transcranial alternating current stimulation (gamma tACS) causes slight improvements in memory in a face/name/profession task.

      Strengths:

      The idea of stimulating the human brain non-invasively is very attractive because, if it worked, it could lead to a host of interesting applications. The current study aims to evaluate one such exciting application.

      Weaknesses:

      (1) The title refers to the "precuneus-hippocampus" network. A clear definition of what is meant by this terminology is lacking. More importantly, mechanistic evidence that the precuneus and the hippocampus are involved in the potential effects of stimulation remains unconvincing.

      Thank you for the observation. We believe that the evidence collected supports our claims regarding stimulation of the precuneus and the involvement of the hippocampus. In particular, given the existing evidence on TMS methodology and non-invasive precuneus stimulation (see Koch et al., Brain, 2022; Koch et al., Alzheimer's research & therapy, 2025), the biophysical model of the E-field that we computed (see Biophysical modeling and E-field calculation section in the supplementary information), and the individual identification of the precuneus through MRI (see iTBS+γtACS neuromodulation protocol and MRI data acquisition in the main text), we can reasonably assume that the individually identified PC was stimulated.

      As we acknowledged in the Limitations section, we cannot entirely rule out the possibility that our results might also reflect stimulation of more superficial parietal regions adjacent to the precuneus. Nor do we provide direct evidence of microscopic changes in the precuneus following stimulation. However, the results we provide in terms of changes in precuneus oscillatory activity and precuneus-hippocampi connectivity support both our claim of precuneus stimulation and the involvement of the hippocampi in the stimulation effects.

      Despite this consideration, we agree that a clear definition of what is meant by the term “precuneus-hippocampus network” is lacking. Moreover, since our data and previous evidence support the notion of PC stimulation, while this study does not provide direct evidence of stimulation of the hippocampi, but only of the effect of the neuromodulation protocol on their connection with the precuneus, we have softened the claim in the title. We have removed the mention of the precuneus-hippocampus network, so that the modified title is as follows: “Dual transcranial electromagnetic stimulation of the precuneus boosts human long-term memory.”

      (2) The question of the extent to which the stimulation approach and the stimulation parameters used in these experiments causes specific and functionally relevant neural effects remains open. Invasive recordings that could address this question remain out of the scope of this non-invasive study. The authors conducted scalp EEG experiments in an attempt to address this question using non-invasive methods. However, the results shown in Fig. 3 are unclear. The results are inconsistently reported in units of microvolts squared in some panels (3A, 3B) and in units of microvolts in other panels (3C). Also, there is insufficient consideration of potential contamination by signal components reflecting eye movements, other muscle artifacts, or another volume-conducted signal reflecting aggregate activity inside the brain.

      As you correctly noted, Figure 3 presents results obtained from the TMS–EEG recordings. However, there is no inconsistency regarding the measurement units, as we are referring to two distinct indices: one in the frequency domain (oscillatory power shown in Figures 3A and 3B, expressed in microvolts squared, μV²) and one in the time domain (the TMS-evoked potential shown in Figure 3C, expressed in microvolts, μV).
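
      The distinction between the two units can be illustrated with a minimal sketch, which is not the TMS–EEG pipeline used in the study. It assumes a synthetic 40 Hz sine wave of 2 μV amplitude sampled at 200 Hz; the function name `power_spectrum` and all signal parameters are hypothetical. The time-domain samples are in μV, whereas the spectral power obtained from squared Fourier magnitudes is in μV².

```python
import cmath
import math

def power_spectrum(signal_uv, sampling_rate_hz):
    """One-sided power spectrum (in uV^2) of a signal given in uV.

    Uses a plain discrete Fourier transform so that the sketch needs only
    the standard library. Returns a dict mapping frequency (Hz) to power.
    """
    n = len(signal_uv)
    spectrum = {}
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal_uv)
        ) / n
        # Double the non-DC, non-Nyquist bins to fold in negative frequencies.
        scale = 1 if k == 0 or (n % 2 == 0 and k == n // 2) else 2
        spectrum[k * sampling_rate_hz / n] = scale * abs(coeff) ** 2
    return spectrum

# Synthetic 1-second signal: 2 uV sine at 40 Hz, sampled at 200 Hz.
fs = 200
signal = [2.0 * math.sin(2 * math.pi * 40 * t / fs) for t in range(fs)]

spectrum = power_spectrum(signal, fs)
gamma_power_uv2 = spectrum[40.0]               # frequency domain, uV^2
peak_amplitude_uv = max(abs(x) for x in signal)  # time domain, uV
```

      For a sine of amplitude A, the power in its frequency bin is A²/2, so the 2 μV sine yields 2 μV² at 40 Hz while its time-domain samples remain bounded by 2 μV.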

      Regarding the concern about artifacts, this is an important issue on which our group has strong expertise, having published well-established, highly cited procedures on how to record and clean TMS-EEG signals (e.g., Casula et al., Clinical Neurophysiology, 2017; Rocchi et al., Brain Stimulation, 2021). In the current study, we adopted a well-established and rigorous approach for both data acquisition and preprocessing. This ensured that the recorded TMS–EEG signals were not contaminated by physiological or electrical artifacts.

      As regards the recording procedure, all participants were instructed to fixate on a black cross to minimize eye movements. To avoid auditory-related components caused by the TMS click, we adopted an ad-hoc procedure optimized for TMS-EEG recordings (Rocchi et al., Brain Stimulation, 2021). First, participants were given earphones that continuously played an ad-hoc masking noise composed of white noise mixed with specific time-varying frequencies of the TMS click (Rocchi et al., Brain Stimulation, 2021). The masking noise volume was adjusted to ensure that participants could not detect the TMS click, or as much as tolerated (always below 90 dB). To further reduce the impact of the TMS click on the EEG signal, we placed ear defenders (SNR=30) on top of the earphones. Please see TMS–EEG data acquisition section in the main text.

      As regards the offline cleaning process, we applied Independent Component Analysis (INFOMAX-ICA) to the EEG data to identify and remove components associated with muscle activity, eye movements, blinking, and residual TMS-related artifacts, in line with the most recent guidelines on TMS–EEG preprocessing (Hernandez-Pavon et al., Brain Stimulation, 2023). Specifically, for TMS-related muscle artefacts, we strictly followed the criteria based on their scalp topography, spectral content, timing, and amplitude, which we published in a paper focused on this topic (Casula et al., Clinical Neurophysiology, 2017). We have added this detail to the TMS–EEG preprocessing and analysis section in the supplementary information (lines 119-120).

      (3) Figure 3 indicates "Precuneus oscillatory activity ...", but evidence that the activity presented reflects precuneus activity is lacking. The maps shown at the bottom of Figure 3C suggest that the EEG signals recorded with scalp EEG reflect activity generated across a wide spatial range, with a peak encompassing at least tens of centimeters. Thus, evidence that effects specifically reflect precuneus activity, as the paper's title and text throughout the manuscript suggest, is lacking.

      We believe there may have been a misunderstanding. As indicated in the figure caption, panels A and B represent oscillatory activity, whereas panel C displays the TMS-evoked potentials (TEPs). Therefore, the topographical maps mentioned (i.e., those in panel C) did not refer to oscillatory activity, but to differences in TEP amplitude. Specifically, the topographies shown in Figure 3C illustrate statistically significant differences in TEP amplitudes between post-stimulation time points (T1—immediately after stimulation, and T2—20 minutes after stimulation) and the pre-stimulation baseline (T0).

      In this figure, we focused our analysis on a cluster of electrodes overlying the individually identified precuneus, capturing EEG responses to single TMS pulses delivered to that target. This approach, widely used in previous literature (e.g., Koch et al., NeuroImage, 2018; Casula et al., Annals of Neurology, 2022; Koch et al., Brain, 2022; Maiella et al., Clinical Neurophysiology, 2024; Koch et al., Alzheimer’s Research & Therapy, 2025), supports the interpretation that the observed responses reflect precuneus-related activity. Furthermore, the spatially widespread change you mention proved to be statistically significant only when conducting the TMS-EEG over the precuneus (i.e., administering the TMS single pulse over the precuneus) and not when performing it over the left parietal cortex. We modified the discussion section in the main text to make this clearer (lines 196-199).

      “Moreover, we observed specific cortical changes in the posteromedial parietal areas, as evidenced by the whole-brain analysis conducted on TMS-EEG data when performed over the precuneus and the absence of effect when TMS-EEG was performed on the lateral posterior parietal cortex used as a control condition.”

      That said, we do not claim that the observed effects specifically reflect precuneus activity; indeed, we think the effect of the stimulation is broader, as discussed in the Discussion section. Rather, in line with the literature (Koch et al., Neuroimage 2018; Koch et al., Brain, 2022; Koch et al., Alzheimer's research & therapy, 2025), we maintain that the observed effects are a consequence of stimulating the precuneus with the dual stimulation.

      (4) The paper as currently presented (e.g., Figure 3) also lacks rigorous evidence of relevant oscillatory activity. Prior to filtering EEG signals in a particular frequency band, clear evidence of oscillations in the frequency band of interest should be shown (e.g., demonstration of a clear peak that emerges naturally in the frequency range of interest when spectral analysis is applied to "raw" signals). The authors claim that gamma oscillations change because of the stimulation, but a clear peak in the gamma range prior to stimulation is not apparent in the data as currently presented. Thus, the extent to which spectral measurements during stimulation reflect physiological gamma oscillations remains unclear.

      If we understand correctly, your concern relates to the lack of a clear gamma peak before neuromodulation, which may suggest uncertainty about the observed changes in gamma oscillatory activity. Is that correct?

      First, it is important to underline that the natural frequency typically observed in the precuneus falls within the beta range, not the gamma range (see Rosanova et al., Journal of Neuroscience, 2009; Casula et al., Annals of Neurology, 2022). This explains why a prominent gamma peak is not expected at baseline (T0).

      In contrast, our neuromodulatory protocol was specifically aimed at boosting gamma oscillatory activity, given its well-established role in learning and memory processes (Griffiths & Jensen, Trends in Neurosciences, 2023). Thus, to assess the effect of the neuromodulatory protocol, we compared the oscillatory activity before (T0) and after stimulation (T1 and T2), which showed a clear increase in the gamma band. This effect is visible in the raw oscillatory power plot and is most clearly represented in Figure 3B, where the gamma band emerged as the only frequency range showing significant changes across time points.

      (5) Concerns remain regarding the rigor of statistical analyses in the revised manuscript (see also point 8 below). Figure 3B shows an undefined statistical test with p<0.05. The statistical test that was used is not explained. Also, a description of how corrections for multiple comparisons were made is missing. Figures 3A and 3C are not accompanied by statistics, making the results difficult to interpret. For Figure 4C, a claim was made based on a significant p-value for one statistical test and a non-significant p-value in another test. This is a common statistical mistake (see Figure 1 and accompanying discussion in Makin and Orban de Xivry (2019) Science Forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife 8:e48175).

      All statistical tests are described in the Statistical Analysis section of the main text. Specifically, to assess cortical oscillation changes in Experiment 3, we conducted repeated-measures ANOVAs with stimulation condition (iTBS+γtACS vs. iTBS+sham-tACS) and time (ΔT1 = T1–T0; ΔT2 = T2–T0) as within-subject factors, for each frequency band. To further explore the effects of stimulation at each time point, we performed paired t-tests with Bonferroni correction for multiple comparisons. A one-tailed hypothesis was adopted, based on our a priori prediction of gamma-band increase derived from previous work (Maiella et al., 2022).
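
      The post-hoc step described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and the example values are hypothetical, and only the paired t statistic and the Bonferroni adjustment are shown. The one-tailed p-value itself would be read from a t distribution with n - 1 degrees of freedom using a statistics package.

```python
import math
import statistics


def paired_t_statistic(condition_a, condition_b):
    """Return the paired-samples t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical per-participant gamma-power changes (delta T1 = T1 - T0),
# made up purely for illustration.
delta_real = [0.8, 1.1, 0.4, 0.9, 1.3]
delta_sham = [0.1, 0.3, 0.2, -0.1, 0.4]
t_value, dof = paired_t_statistic(delta_real, delta_sham)

# Hypothetical uncorrected p-values for the two time points (delta T1, delta T2).
adjusted = bonferroni([0.012, 0.030])
```

      With two comparisons, each uncorrected p-value is doubled before being compared against the 0.05 threshold.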

      Please note that Figures 3A and 3C are purely descriptive and are therefore not accompanied by statistical tests. Figure 3A shows the full spectral profile across frequencies and conditions, while statistical significance for these data is reported in Figure 3B. Similarly, the upper part of Figure 3C displays the TMS-evoked potential (TEP) in the precuneus, while the statistical comparison of TEP amplitudes across time points is shown in the lower part of Figure 3C.

      Regarding Figure 4C and the article you cited, are you referring to the error described as “Interpreting comparisons between two effects without directly comparing them”? If we understand correctly, this refers to the mistake of inferring an effect by observing that a significant result occurs in one condition or group, while the corresponding result in another condition or group is not significant, without directly testing the difference between them.

      In the case of Experiment 4, which investigates fMRI effects and is illustrated in Figure 4, we employed a general linear model that explicitly modeled both conditions and time points, allowing for a direct statistical comparison. Therefore, the connectivity effect reported does not fall into the category of the error you mentioned.
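
      To make the distinction concrete, a direct comparison tests the difference between the two changes rather than each change separately. The sketch below computes that per-participant contrast; it is an illustration under simplifying assumptions (one pre and one post value per condition), not the general linear model used in the study, and all names are hypothetical.

```python
import statistics


def interaction_contrast(real_pre, real_post, sham_pre, sham_post):
    """Per-participant difference of differences: (real change) - (sham change)."""
    return [
        (r_post - r_pre) - (s_post - s_pre)
        for r_pre, r_post, s_pre, s_post in zip(real_pre, real_post, sham_pre, sham_post)
    ]


def mean_interaction(real_pre, real_post, sham_pre, sham_post):
    """Average contrast across participants; this is the quantity to test against zero."""
    return statistics.mean(
        interaction_contrast(real_pre, real_post, sham_pre, sham_post)
    )
```

      A single test on this contrast replaces the flawed reasoning of "significant in one condition, not significant in the other".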

      Importantly, Figure 4C does not depict the effect of the neuromodulatory protocol itself. Rather, its purpose is to show that, within the real stimulation condition, there is a correlation between the observed effect and the integrity of the bilateral Middle Longitudinal Fasciculus. No conclusions or assumptions were made based on the absence of a significant correlation in the sham condition. However, since it was an exploratory analysis, we decided to soften our claims regarding the neural mechanism in the Discussion section of the main text (lines 241-246).

      (6) In the second question posed in the original review, I highlighted that it was unclear how such stimulation would produce memory enhancement. The authors replied that, in the absence of mechanisms, there are many other studies that suffer from the same problem. This raises the question of placebo effects. The paper does not sufficiently address or discuss the possibility that any potential stimulation effects may reflect placebo effects.

      We agree with the reviewer on the potential role of a placebo effect in our study. For this reason, our experimental design included several stimulation conditions, among them a placebo condition (sham-iTBS+sham-tACS), which did not produce any effect.

      (7) The third major concern in the original review was the lack of evidence for a mechanism that is specific to the precuneus. Evidence for specific involvement of the precuneus remains lacking in the revised manuscript. The authors state: "the non-invasive stimulation protocol was applied to an individually identified precuneus for each participant". However, the meaning of this statement is unclear. Specifically, it is unclear how the authors know that they are specifically targeting the precuneus. Without directly recording from the precuneus and directly demonstrating effects, which is outside of the scope of the study, specific involvement of the precuneus seems speculative. Also, it does not seem as though a figure was included in the paper to show how the stimulation protocol specifically targets the precuneus. In their response to the original reviews, the authors state that posterior medial parietal areas are the only regions that show significant differences following the stimulation, but they did not cite a specific figure, or statistics reported in the text, that show this. In any event, posterior medial parietal areas encompass a wide area of the brain, so this would still not provide evidence for an effect specifically involving the precuneus.

      We respectfully disagree with the claim that targeting the precuneus in our study is speculative. The statement that “without directly recording from the precuneus and directly demonstrating effects, which is outside the scope of the study, specific involvement of the precuneus seems speculative” would, by that logic, implicitly call into question a large body of cognitive neuroscience research employing non-invasive techniques such as EEG and fMRI.

      Our methodological approach—combining MRI-guided stimulation, biophysical modeling, and TMS–EEG—is well established and widely used for targeting and studying the role of specific cortical regions, including the precuneus (e.g., Wang et al., Science, 2014; Koch et al., NeuroImage, 2018; Casula et al., Annals of Neurology, 2022, 2023; Koch et al., Brain, 2022; Maiella et al., Clinical Neurophysiology, 2024; Koch et al., Alzheimer’s Research & Therapy, 2025).

      In line with previously published protocols (Santarnecchi et al., Human Brain Mapping, 2018; Özdemir et al., PNAS, 2020; Mantovani et al., Journal of Psychiatric Research, 2021), we identified individual targets (i.e., the precuneus) for each participant based on structural and resting-state functional MRI data (see MRI Data Acquisition and Preprocessing section in the main text). This target was then accurately localized using MRI-guided stereotaxic neuronavigation, ensuring reproducible and anatomically precise stimulation across subjects.

      Finally, concerning the last comment about the lack of figures/statistics showing how the stimulation protocol targets the precuneus and the specificity of the effect observed, we would like to draw attention to:

      Figure 3 in the main text, where we show the results of the TMS-EEG over the posterior medial parietal areas;

      Figure S1 in the supplementary information, which shows, by means of an e-field simulation, how the stimulation protocol targets the brain;

      the “Precuneus iTBS+γtACS increases gamma oscillatory activity” section of the main-text Results, where we report the results of the statistical analysis of the TMS-EEG conducted over the precuneus and the left posterior parietal cortex, used as a control condition to test for the specificity of the neuromodulation protocol.

      (8) Regarding chance levels, it is unfortunate that the authors cannot quantify what chance levels are in the immediate and delayed recall conditions. This makes interpretation of the results challenging. In the immediate and delayed conditions, the authors state that the chance level is 33%. It would be useful to mark this in the figures. If I understand correctly, chance is 33% in Fig. 2A. If this is the case and if I am interpreting the figure correctly:

      Gray bars for the sham condition appear to be below chance (~20-25%). Why is this condition associated with an accuracy level that is lower than chance?

      Cyan bars and red bars do not appear to be significantly different from chance (i.e., 33%), with red slightly higher than cyan. What statistic was performed to obtain the level of significance indicated in the figure? The highest average value for the red condition appears to be around 35%. More details are needed to fully explain this figure and to support the claims associated with this figure.

      The immediate and delayed recall conditions you mention correspond to a free recall task. In this case, the notion of a fixed "chance level" is not as straightforward as it would be in recognition or forced-choice paradigms, which is why we did not quantify it at first. We explain this in detail below.

      Unlike multiple-choice tasks, where participants select the answer from a limited set of alternatives and the probability of a correct response by chance can be precisely quantified (e.g., 33% in a 3-alternative forced choice), free recall involves the spontaneous retrieval of items from memory without external cues or predefined options. As such, the response range in free recall is essentially unconstrained, encompassing the entire vocabulary of the participant.

      Because of this open-ended nature, the probability of correctly recalling a studied item purely by chance is exceedingly low and can be approximated as zero. Moreover, in our task, participants had to correctly recollect both name and occupation, which further enlarges the space of possible answers.

      This assumption is further supported by the fact that random guesses in free recall are unlikely to match any of the studied items, given the vast number of possible alternatives. As a result, performance above zero can be reasonably interpreted as reflecting genuine memory retrieval, rather than random guessing.
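
      The contrast between the two paradigms can be put in rough numbers. The sketch below is a simplified model, not part of the study: it assumes a single uniformly random guess, treats name and occupation as guessed independently, and uses hypothetical pool sizes.

```python
from fractions import Fraction


def forced_choice_chance(n_alternatives):
    """Chance of a correct response when picking one of n equally likely options."""
    return Fraction(1, n_alternatives)


def free_recall_pair_chance(n_names, n_occupations):
    """Chance that one random name/occupation guess matches a specific studied pair.

    Assumes the guess is drawn uniformly and independently from each pool.
    """
    return Fraction(1, n_names * n_occupations)


# A 3-alternative forced choice gives the familiar 1/3 (about 33%).
three_afc = forced_choice_chance(3)

# Hypothetical pools of 1,000 plausible names and 500 plausible occupations.
free_recall = free_recall_pair_chance(1000, 500)
```

      Under these assumptions the free-recall figure is 1 in 500,000, several orders of magnitude below the forced-choice level, which is why it is treated as effectively zero.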

      As regards statistics, we performed repeated-measures ANOVAs with stimulation condition as a within-subject factor (i.e., iTBS+γtACS; iTBS+sham-tACS; sham-iTBS+sham-tACS) for each dependent variable (see the Statistical Analysis section in the main text).

      (9) In the revised version of the paper, the authors did not address concerns associated with the block design (please see question 4d in the original review).

      We are sorry for the misunderstanding. We did not address your concerns related to block design because they do not apply to our study. As reported in the paper you mentioned in the original review, a block design involves data collection performed in response to different stimuli of a given class presented in succession. This does not correspond to our experimental design, since both TMS-EEG and fMRI were conducted in the resting state (i.e., without the presentation of stimuli) on different days according to the different randomized stimulation conditions.

      In sum, this study presents an admirable aspirational goal, the notion that a non-invasive stimulation protocol could modulate activity in specific brain regions to enhance memory. However, the evidence presented at the behavioral level and at the mechanistic level (e.g. the putative involvement of specific brain regions) remains unconvincing.

      We hope our response will be carefully considered, fostering a constructive exchange and leading to a reassessment of your evaluation.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Borghi and colleagues provides evidence that the combination of intermittent theta burst TMS stimulation and gamma transcranial alternating current stimulation (γtACS) targeting the precuneus increases long-term associative memory in healthy subjects compared to iTBS alone and sham conditions. Using a rich dataset of TMS-EEG and resting-state functional connectivity (rs-FC) maps and structural MRI data, the authors also provide evidence that dual stimulation increased gamma oscillations and functional connectivity between the precuneus and hippocampus. Enhanced memory performance was linked to increased gamma oscillatory activity and connectivity through white matter tracts.

      Strengths:

      The combination of personalized repetitive TMS (iTBS) and gamma tACS is a novel approach to targeting the precuneus, and thereby, connected memory-related regions to enhance long-term associative memory. The authors leverage an existing neural mechanism engaged in memory binding, theta-gamma coupling, by applying TMS at theta burst patterns and tACS at gamma frequencies to enhance gamma oscillations. The authors conducted a thorough study that suggests that simultaneous iTBS and gamma tACS could be a powerful approach for enhancing long-term associative memory. The paper was well-written, clear, and concise.

      Comments on Revision:

      I thank the authors for their thoughtful responses to my first review and their inclusion of more detailed methodological discussion of their rationale for the stimulation protocol conditions and timing. Regarding the apparent difference in connectivity at baseline between conditions, the explanation that this is due to intrinsic dynamics, state, or noise implies the baseline is reflecting transient changes in dynamics rather than a true or stable baseline. Based on this, it looks like iTBS solely is significantly greater than the baseline before the iTBS and γtACS condition but maybe not that much lower than post-stimulation period for iTBS and γtACS. A longer baseline period should be used to ensure transient states are not driving baseline levels such that these endogenous fluctuations would average out. This also raises questions about whether the effect of iTBS and γtACS or iTBS alone are dependent on the intrinsic state at the time when stimulation begins. Their additional clarification of memory scoring is helpful but also reveals that the effect of dual iTBS+γtACS specifically on the association between faces and names is just significant. This modest increase in associative memory should be taken into consideration when interpreting these findings.

      We thank the reviewer for the feedback. We fully agree that considering baseline dynamics is critical when assessing the neurophysiological and connectivity effects of stimulation protocols.

      In Experiments 3 and 4, baseline measurements were specifically included in our design to account for the possibility that intrinsic dynamics, state, or noise could influence the observed effects of neuromodulation. Indeed, if we had compared only post-stimulation connectivity between the real and sham conditions, the effects might have appeared larger. The inclusion of baseline measurements allows us to contextualize and better isolate the neuromodulatory impact by controlling for such endogenous fluctuations. Importantly, the fMRI connectivity measurements, including the baseline, are derived from 10-minute BOLD signal acquisitions, which help mitigate the influence of transient fluctuations and provide a reasonably stable estimate of intrinsic connectivity.

      Moreover, regarding the possibility that stimulation effects may depend on the intrinsic state at stimulation onset, we hypothesize that gamma-frequency entrainment induced by tACS could reduce the variability of intrinsic dynamics, promoting a more stable neural state that is favorable for the induction of long-term plasticity.

      As regards the memory scoring, we would like to clarify that the significant improvement observed in the dual iTBS+γtACS condition does not pertain solely to the face–name association. Rather, it concerns the more demanding task of recalling the association between face, name, and occupation. While we agree that the observed effect could be considered modest, it is worth noting that it follows from only 3 minutes of stimulation.

      Reviewer #3 (Public review):

      Summary:

      Borghi and colleagues present results from 4 experiments aimed at investigating the effects of dual γtACS and iTBS stimulation of the precuneus on behavioral and neural markers of memory formation. In their first experiment (n = 20), they find that a 3-minute offline (i.e., prior to task completion) stimulation that combines both techniques leads to superior memory recall performance in an associative memory task immediately after learning associations between pictures of faces, names, and occupation, as well as after a 15-minute delay, compared to iTBS alone (+ tACS sham) or no stimulation (sham for both iTBS and tACS). Performance in a second task probing short-term memory was unaffected by the stimulation condition. In a second experiment (n = 10), they show that these effects persist over 24 hours and up to a full week after initial stimulation. A third (n = 14) and fourth (n = 16) experiment were conducted to investigate neural effects of the stimulation protocol. The authors report that, once again, only combined iTBS and γtACS increases gamma oscillatory activity and neural excitability (as measured by concurrent TMS-EEG) specific to the stimulated area at the precuneus compared to a control region, as well as precuneus-hippocampus functional connectivity (measured by resting state MRI), which seemed to be associated with structural white matter integrity of the bilateral middle longitudinal fasciculus (measured by DTI).

      Strengths:

      Combining non-invasive brain stimulation techniques is a novel, potentially very powerful method to maximize the effects of these kinds of interventions that are usually well-tolerated and thus accepted by patients and healthy participants. It is also very impressive that the stimulation-induced improvements in memory performance resulted from a short (3 min) intervention protocol. If the effects reported here turn out to be as clinically meaningful and generalizable across populations as implied, this approach could represent a promising avenue for treatment of impaired memory functions in many conditions.

      Methodologically, this study is expertly done! I don't see any serious issues with the technical setup in any of the experiments. It is also very commendable that the authors conceptually replicated the behavioral effects of experiment 1 in experiment 2 and then conducted two additional experiments to probe the neural mechanisms associated with these effects. This certainly increases the value of the study and the confidence in the results considerably.

      The authors used a within-subject approach in their experiments, which increases statistical power and allows for stronger inferences about the tested effects. They also individualized stimulation locations and intensities, which should further optimize the signal-to-noise ratio.

      Weaknesses:

      I think one of the major weaknesses of this study is the overall low sample size in all of the experiments (between n = 10 and n = 20). This is, as I mentioned when discussing the strengths of the study, partly mitigated by the within-subject design and individualized stimulation parameters. The authors mention that they performed a power analysis but this analysis seemed to be based on electrophysiological readouts similar to those obtained in experiment 3. It is thus unclear whether the other experiments were sufficiently powered to reliably detect the behavioral effects of interest. In the revised manuscript, the authors provide post-hoc sensitivity analyses that help contextualize the strength of the findings.

      While the authors went to great lengths trying to probe the neural changes likely associated with the memory improvement after stimulation, it is impossible from their data to causally relate the findings from experiments 3 and 4 to the behavioral effects in experiments 1 and 2. This is acknowledged by the authors and there are good methodological reasons for why TMS-EEG and fMRI had to be collected in separate experiments, but readers should keep in mind that this limits inferences about how exactly dual iTBS and γtACS of the precuneus modulate learning and memory.

      We thank the reviewer for the feedback.

      Reviewer #1 (Recommendations for the authors):

      I suggest:

      (1) Removing all mechanistic claims about the precuneus and hippocampus.

      We have softened our claims about the precuneus-hippocampus network.

      (2) Repeating and focusing on the behavioral experiments with a much larger number of images and stronger statistical power to try to demonstrate a compelling behavioral correlate of the proposed stimulation protocol.

      We clarified the misunderstanding regarding the chance level of the behavioral experiments raised by the reviewer.

      Reviewer #2 (Recommendations for the authors):

      Use longer baseline to establish stable gamma level for comparisons in Figure 3

      If we understand correctly, you propose lengthening the baseline used to establish the gamma oscillatory activity shown in Figure 3 (the results of Experiment 3). Is that right? The figure shows a baseline of -100 to 0 ms, which we use for purely graphical reasons, since no activity is usually observable before the TMS pulse. However, to establish the level of gamma, we used a longer baseline-correction window ranging from -700 ms to -300 ms (i.e., 400 ms). We added this important information to the cortical oscillation section of the supplementary information (lines 134-135).
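
      A baseline correction over the -700 ms to -300 ms window can be sketched as below. The response does not say whether the correction is subtractive, divisive, or expressed in decibels, so the subtractive form used here is an assumption, as are the function name and the sample values.

```python
import statistics


def baseline_correct(times_ms, power, start_ms=-700, end_ms=-300):
    """Subtract the mean pre-stimulus power from every sample.

    times_ms and power are parallel sequences; the baseline is the mean of all
    samples whose time falls inside [start_ms, end_ms] relative to the TMS pulse.
    """
    baseline = [p for t, p in zip(times_ms, power) if start_ms <= t <= end_ms]
    if not baseline:
        raise ValueError("no samples fall inside the baseline window")
    reference = statistics.mean(baseline)
    return [p - reference for p in power]


# Hypothetical gamma-power samples around a TMS pulse delivered at 0 ms.
times = [-700, -500, -300, 0, 100]
gamma_power = [1.0, 2.0, 3.0, 5.0, 6.0]
corrected = baseline_correct(times, gamma_power)
```

      Ending the window at -300 ms keeps the reference well clear of the pulse itself, while the -100 to 0 ms interval shown in the figure serves only as a display range.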

      Reviewer #3 (Recommendations for the authors):

      I think that the authors did a great job responding to the concerns raised by the reviewers. All of my own comments have been satisfactorily addressed. I will update my public review to be more concise, so that it only includes the overall assessment of the manuscript, including the strengths and weaknesses, but without the requests for clarification. Strengths and weaknesses remain largely the same, as the authors did not conduct additional experiments.

      Thank you.

    1. Author response:

      Reviewing Editor Comments:

      The following are some consolidated review remarks after discussions amongst all three reviewers:

      The reviewers feel the evidence level could be raised from 'convincing' to 'compelling' if the following key (and partially shared) suggestions by the reviewers are followed adequately:

      (1) Expand labeling options for nMAGIC, which is currently just a BFP marker. This would increase the utility of the method. A far-red marker would be very helpful. Could the authors just do this for one chromosome arm and make the reagent available for others to generate other chromosome arms?

      This is a great suggestion. We will make an nMAGIC vector containing a far-red fluorescent marker and generate a 40D2 version of this nMAGIC gRNA-maker to demonstrate its utility. This vector will be available for others to make additional nMAGIC gRNA-markers.

      (2) Verify that destabilized GAL80 is potent enough to suppress GAL4. Repeat Figure 1C-E with tub-GAL80-DE-SV40.

      We will use a tub-GAL80-DE-SV40 gRNA-marker to test suppression of pxn-Gal4.

      (3) Concern about the health of the induced mitotic clones. This is an important consideration, but the reviewers were not sure what the necessary experiments would be. To gauge twin-spot clone sizes? Please address.

      We will assess the health of induced mitotic clones in wing imaginal discs. We will do this by generating twin spots with an nMAGIC gRNA-marker in wing discs and comparing the sizes of the two cell populations (BFP<sup>+/+</sup> and BFP<sup>-/-</sup>) in twin spots.

      (4) Include a schematic of the MAGIC method as Figure 1 or add it to Figure 1. Many may not be familiar with the method, so to promote its adoption, the authors should clearly introduce the MAGIC method in this paper (and not rely on readers to go to previous publications). For this paper to become a MAGIC reference paper, it should be self-contained.

      We will add a diagram of the MAGIC method in the revised manuscript.

      (5) Determine the utility of using a hs-Cas9 line for temporal induction of MAGIC clones. This is a traditional method for mitotic clone induction (with hsFLP/FRTs), and its use with the MAGIC system (especially pMAGIC) could also make it more attractive, especially to label small populations of neurons born at known times. To this point, the authors could generate pMAGIC clones using hs-Cas9 for commonly used adult target neurons, such as projection neurons, central complex neurons, or mushroom body neurons. The method to label small numbers of these adult neurons is well worked out with known GAL4 lines, and demonstrating that pMAGIC could have similar results would capture the attention of many not familiar with the pMAGIC method.

      We thank the reviewers for this suggestion. We will test hs-Cas9 in inducing pMAGIC clones in one of the neuronal populations in the adult brain, as suggested by the reviewers.

      In addition, we will address all other minor concerns of the reviewers.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Plasmodium vivax can persist in the liver of infected individuals in the form of dormant hypnozoites, which cause malaria relapses and are resistant to most current antimalarial drugs. This highlights the need to develop new drugs active against hypnozoites that could be used for radical cure. Here, the authors capitalize on an in vitro culture system based on primary human hepatocytes infected with P. vivax sporozoites to screen libraries of repurposed molecules and compounds acting on epigenetic pathways. They identified a number of hits, including hydrazinophthalazine analogs. They propose that some of these compounds may act on epigenetic pathways potentially involved in parasite quiescence. To provide some support to this hypothesis, they document DNA methylation of parasite DNA based on 5-methylcytosine immunostaining, mass spectrometry, and bisulfite sequencing.

      Strengths:

      -The drug screen itself represents a huge amount of work and, given the complexity of the experimental model, is a tour de force.

      -The screening was performed in two different laboratories, with a third laboratory being involved in the confirmation of some of the hits, providing strong support that the results were reproducible.

      -The screening of repurposing libraries is highly relevant to accelerate the development of new radical cure strategies.

      We thank the reviewer for pointing out the strengths of our report.

      Weaknesses:

      The manuscript is composed of two main parts, the drug screening itself and the description of DNA methylation in Plasmodium pre-erythrocytic stages. Unfortunately, these two parts are loosely connected. First, there is no evidence that the identified hits kill hypnozoites via epigenetic mechanisms. The hit compounds almost all act on schizonts in addition to hypnozoites, therefore it is unlikely that they target quiescence-specific pathways. At least one compound, colforsin, seems to selectively act on hypnozoites, but this observation still requires confirmation. Second, while the description of DNA methylation is per se interesting, its role in quiescence is not directly addressed here. Again, this is clearly not a specific feature of hypnozoites as it is also observed in P. vivax and P. cynomolgi hepatic schizonts and in P. falciparum blood stages. Therefore, the link between DNA methylation and hypnozoite formation is unclear. In addition, DNA methylation in sporozoites may not reflect epigenetic regulation occurring in the subsequent liver stages.

      We agree our report lacks direct evidence that hydrazinophthalazines are interacting with parasite epigenetic mechanisms. We spent significant resources attempting several novel approaches to establish a direct connection, but technological advances are needed to enable such studies, which we mention in the introduction and discussion. We disagree that schizonticidal activity automatically excludes the possibility a hypnozonticidal hit is acting on quiescence-specific pathways because both hypnozoites and schizonts are under epigenetic control and these pathways are likely performing different functions in different stages. Also important is the use of the word ‘specific’ as this term could be used to indicate parasite versus host (a drug that clears a parasite infection with a safety margin), parasite-directed effect versus host-directed effect (a drug acting via an agonistic or antagonistic effect on parasite or host pathway(s), but leading to parasite death in either case), hypnozoite versus schizont, or P. vivax versus other Plasmodium species. We were careful to indicate the usage of ‘specific’ throughout the text. Given the almost-nonexistent hit rate when screening diverse small molecule libraries against P. vivax hypnozoites, and the remarkable increase in hits when screening epigenetic inhibitors as described in this report, our data suggest epigenetic pathways are important to the regulation of hypnozoite dormancy in addition to regulation of other parasite stages, but those effects are outside the scope of this report.

      -The mode of action of the hit compounds remains unknown. In particular, it is not clear whether the drugs act on the parasite or on the host cell. Merely counting host cell nuclei to evaluate the toxicity of the compounds is probably acceptable for the screen but may not be sufficient to rule out an effect on the host cell. A more thorough characterization of the toxicity of the selected hit compounds is required.

      We agree, and mention in the results and discussion, that the effect could be mediated through host pathways. This is not unlike the 8-aminoquinolines, which are activated by host cytochromes and kill via ROS, which is a nonspecific mechanism (that is, the compound is not directly interacting with a parasite target) leading to a parasite-specific effect (the parasite cannot tolerate the ROS produced, but the host can). During screening, hits with direct effects on the target organism are generally more desirable, so hits are counterscreened for general cytotoxicity. In this report, we show an effect on the parasite in direct comparison to the effect on host primary hepatocytes in the P. vivax assay itself, and follow up on hits with general counterscreens in two mammalian cell lines using CellTiter Glo, which does not rely on nuclei counts. Some compounds did show general cytotoxic effects, but with selectivity (more potency) against P. vivax liver stages, while other hits like the hydrazinophthalazines did not show an effect against primary hepatocytes and show only weak toxicity against mammalian cells at the highest dose tested. Further studies are needed to determine if the effect is indeed host- or parasite-directed and, if hydrazinophthalazines are to be developed into marketed antimalarials, extensive safety testing would be part of the development process.

      -There is no convincing explanation for the differences observed between P. vivax and P. cynomolgi. The authors question the relevance of the simian model but the discrepancy could also be due to the P. vivax in vitro platform they used.

      Fully characterizing the chemo-sensitivity of P. vivax and P. cynomolgi liver stages is outside the scope of this report. Rather, we report tool compounds which could be used in future studies to further characterize these sister species. We also make the point that P. cynomolgi is the gold standard for in vivo antirelapse activity, but it is still a model species, not a target species, and so few experimental hypnozonticidal compounds have been reported that the predictive value of P. cynomolgi is not fully understood. We found that several of our hits were species-specific using our in vitro platforms, thus future studies are needed to ensure this predictive value.

      -Many experiments were performed only once, not only during the screen (where most compounds were apparently tested in a single well) but also in other experiments. The quality of the data would be increased with more replication.

      Due to their size, compound library screens are typically performed once, with confirmation in dose-response assays, which were repeated several times. The rhesus PK study was performed once on three animals, which is typical. All other studies were performed at least twice and most were performed three times or more. We provide a data table showing readers the source material for all replication as well as other source data tables showing the raw data for dose-response and other assays.

      -While the extended assay (12 days versus 8 days) represents an improvement of the screen, the relevance of adding inhibitors of core cytochrome activity is less clear, as under these conditions the culture system deviates from physiological conditions.

      We agree that cytochrome inhibitors render the platform less physiologically relevant, but the goal of screening is to detect hits which could be improved upon using medicinal chemistry, including metabolic stability. Metabolic stability is better assessed using standard assays such as liver microsomes, thus our goal was to characterize the effects of test compounds on the parasite without the confounding effect of hepatic metabolism.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, inhibitors of the P. vivax liver stages are identified from the Repurposing, Focused Rescue, and Accelerated Medchem (ReFRAME) library as well as a 773-member collection of epigenetic inhibitors. This study led to the discovery that epigenetics pathway inhibitors are selectively active against P. vivax and P. cynomolgi hypnozoites. Several inhibitors of histone post-translational modifications were found among the hits and genomic DNA methylation mapping revealed the modification on most genes. Experiments were completed to show that the level of methylation upstream of the gene (promoter or first exon) may impact gene expression. With the limited number of small molecules that act against hypnozoites, this work is critically important for future drug leads. Additionally, the authors gleaned biological insights from their molecules to advance the current understanding of essential molecular processes during this elusive parasite stage.

      Strengths:

      -This is a tremendously impactful study that assesses molecules for the ability to inhibit Plasmodium hypnozoites. The comparison of various species is especially relevant for probing biological processes and advancing drug leads.

      -The SI is wonderfully organized and includes relevant data/details. These results will inspire numerous studies beyond the current work.

      We thank the reviewer for pointing out the strengths of our report.

      Reviewer #3 (Public Review):

      Although this work represents a massive screening effort to find new drugs targeting P. vivax hypnozoites, the authors should balance their statement that they identified targetable epigenetic pathways in hypnozoites.

      -They should emphasize the potential role of the host cell in the presentation of the results and the discussion, as it is known that other pathogens modify the epigenome of the host cell (e.g. Toxoplasma, HIV) to prevent cell division. Also, hydrazinophthalazines target multiple pathways (notably modulation of calcium flux) and have been shown to inhibit DNA-methyl transferase 1, which is lacking in Plasmodium.

      -In a drug repurposing approach, the parasite target might also be different than the human target.

      -The authors state that host-cell apoptotic pathways are downregulated in P. vivax infected cells (p. 5 line 162). Maybe the HDAC inhibitors and DNA-methyltransferase inhibitors are reactivating these pathways, leading to parasite death, rather than targeting parasites directly.

      We agree caution must be taken as we did not directly confirm the mechanism of our hits. Many follow up studies will be needed to do so. We do point out in the discussion that the mechanism of hits could be host-directed. We agree with the notion that some of these hits could be affecting parasitized host cell pathways, which lead to death of the parasitized cell, with the parasite being collateral damage, yet such a mechanism could lead to a safe and effective novel antimalarial.

      It would make the interpretation of the results easier if the authors used EC50 in µM rather than pEC50 in tables and main text. It is easy to calculate when it is a single-digit number but more complicated with multiple digits.

      We apologize for the atypical presentation of potency data. However, there is growing concern in drug discovery about applying standard deviation to potency data: standard deviation is a linear calculation whereas potency is a log-scale effect, making the math incompatible. We understand that thousands of papers are published every year using this mathematically incorrect method, which makes our presentation of these data less familiar. However, we define pEC50 where it is used in the text and table legends, and hope to increase its use in the broader scientific community.
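
      As an illustration of the point about log-scale potency, the following is a minimal sketch, not the authors' analysis code. It assumes the usual definition pEC50 = -log10(EC50 in molar); the replicate values and function names are hypothetical. It converts between pEC50 and EC50 in µM, and shows that summarizing on the log scale yields a geometric mean with an asymmetric interval, whereas averaging raw EC50 values is pulled toward the least potent replicate.

```python
import math
import statistics


def pec50_to_ec50_um(pec50: float) -> float:
    """Convert pEC50 (-log10 of EC50 in molar) to EC50 in micromolar."""
    return 10 ** (-pec50) * 1e6


def ec50_um_to_pec50(ec50_um: float) -> float:
    """Convert EC50 in micromolar to pEC50."""
    return -math.log10(ec50_um * 1e-6)


# Hypothetical replicate pEC50 values, for illustration only.
replicates = [6.0, 6.5, 7.0]

# Mean and SD computed on the log scale, where potency is roughly symmetric.
mean_pec50 = statistics.mean(replicates)
sd_pec50 = statistics.stdev(replicates)

# Back-transforming gives a geometric mean and an asymmetric +/- 1 SD interval.
geo_mean_ec50_um = pec50_to_ec50_um(mean_pec50)
interval_um = (
    pec50_to_ec50_um(mean_pec50 + sd_pec50),  # more potent bound
    pec50_to_ec50_um(mean_pec50 - sd_pec50),  # less potent bound
)

# Arithmetic mean of the raw EC50 values, shown for comparison.
arith_mean_ec50_um = statistics.mean([pec50_to_ec50_um(p) for p in replicates])

print(f"pEC50 = {mean_pec50:.2f} +/- {sd_pec50:.2f}")
print(f"geometric mean EC50 = {geo_mean_ec50_um:.3f} uM, "
      f"interval {interval_um[0]:.3f} to {interval_um[1]:.3f} uM")
print(f"arithmetic mean EC50 = {arith_mean_ec50_um:.3f} uM")
```

      For a reader who prefers µM values, the conversion is a single call to the hypothetical helper `pec50_to_ec50_um`.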

      The authors mention hypnozoite-specific effects, but in most cases compounds are as potent on hypnozoites as on schizonts. They should rather use "liver stage specific" to refer to increased activity against hypnozoites and schizonts compared to the host cell. The same comment applies to line 351 when referring to MMV019721. Following the same idea, it is a bit far-fetched to call MMV019721 "specific" when the highest concentration tested for cytotoxicity is less than twice the EC50 obtained against hypnozoites and schizonts.

      We have reviewed and revised statements in the manuscript to ensure the effect we are describing is accurate in terms of parasite versus parasite form.

      Page 5 lines 187-189, the authors state "...hydrazinophthalazines were inactive when tested against P. berghei liver schizonts and P. falciparum asexual blood stages, suggesting that hypnozoite quiescence may be biologically distinct from developing schizonts". The data provided in Figure 1B show that these hydrazinophthalazines are as potent in P. vivax schizonts as in P. vivax hypnozoites, so the distinct activity seems to be Plasmodium species specific and/or host-cell specific (primary human hepatocytes rather than cell lines for P. berghei) rather than hypnozoite vs schizont specific.

      We agree the effect of hydrazinophthalazines could be more species specific than stage specific, but the context of our comment has to do with current methods in antimalarial discovery and development. Given the biological uniqueness of the various Plasmodium species and stages, any hypnozonticidal hit may or may not have pan-species or pan-stage activity; our goal was to characterize this. Regardless of the mechanism, we found it interesting that the hydrazinophthalazines kill P. vivax hypnozoites, but not P. cynomolgi hypnozoites nor other species and stages used in antimalarial drug development. This result makes the point that hypnozoite-focused assays may be required to detect and develop hypnozonticidal hits, regardless of what other species or stages they may or may not act on.

      Why choose to focus on cadralazine if abandoned due to side effects? Also, why test the pharmacokinetics in monkeys? As it was a marketed drug, were no data available in humans?

      Cadralazine was found to be more potent than hydralazine and PK data were available from humans, thus dose prediction calculations showed an efficacious dose was more achievable with cadralazine than hydralazine. Side effects are often dependent on dose and regimen, which are very likely to be much different for treating malaria versus hypertension. Thus, the potential side effects of cadralazine if it were to be used as an antimalarial are simply unknown and are not disqualifying at this step. The PK study was done in Rhesus macaques so we could calculate the dose needed to achieve coverage of EC90 during a planned follow up in a Rhesus-P. cynomolgi relapse model. However, this planned in vivo efficacy study was not justified once we concurrently discovered cadralazine was inactive on P. cynomolgi in vitro.

      In the counterscreen mentioned on page 6, the authors should mention that the activity of poziotinib in P. berghei and P. cynomolgi is equivalent to cell toxicity, so likely not due to parasite specificity.

      Poziotinib shows activity against mammalian cell lines but not against the primary hepatocyte cultures supporting dose-response assays against P. vivax liver forms, which do not replicate. Thus, poziotinib appears selective in the liver stage assay but also may have a much more potent effect in continuously replicating cell lines.

      To improve the clarity and flow of the manuscript, could the authors make a summary table/figure for all the data obtained for poziotinib and hydrazinophthalazines in the different assays (8-day vs 12-day) and laboratory settings, rather than separate tables in main and supplementary figures? Maybe also reorder the results section, notably moving the 12-day assay before the DNA methylation part.

      We apologize for the large amount of data presented but believe we are presenting it in the clearest way possible. All raw data is available if readers wish to re-analyze or re-organize our findings.

      The isobologram plot shows an additive effect rather than a synergistic effect between cadralazine and 5-azacytidine, please modify the paragraph title accordingly. Please put the same axis scale for both fractional EC50 in the isobologram graph (Figure 2A).

      The isobologram shows the effect approaching synergy at some combinations. The isobologram was rendered using standard methods. The raw data is available if readers wish to re-analyze it.
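
      For readers who wish to re-analyze the combination data, the additive-versus-synergistic question can be framed numerically. The sketch below is illustrative only and is not the authors' method: it assumes Loewe additivity, under which the fractional EC50s of the two drugs sum to 1 for a purely additive pair. The cutoffs of 0.5 and 2.0 are one commonly used convention and are an assumption here, as are all input values and function names.

```python
def fractional_ec50(ec50_in_combination: float, ec50_alone: float) -> float:
    """EC50 of a drug in the presence of its partner, divided by its EC50 alone."""
    return ec50_in_combination / ec50_alone


def sum_fractional_ec50(a_combo: float, a_alone: float,
                        b_combo: float, b_alone: float) -> float:
    """Sum of fractional EC50s for one fixed-ratio combination of drugs A and B."""
    return fractional_ec50(a_combo, a_alone) + fractional_ec50(b_combo, b_alone)


def classify(sum_fic: float,
             synergy_cutoff: float = 0.5,      # assumed convention
             antagonism_cutoff: float = 2.0    # assumed convention
             ) -> str:
    """Label a combination from its summed fractional EC50."""
    if sum_fic <= synergy_cutoff:
        return "synergistic"
    if sum_fic >= antagonism_cutoff:
        return "antagonistic"
    return "additive"


# Hypothetical EC50 values in micromolar, for illustration only.
point_on_diagonal = sum_fractional_ec50(0.5, 1.0, 2.0, 4.0)
point_below_diagonal = sum_fractional_ec50(0.2, 1.0, 0.8, 4.0)

print(point_on_diagonal, classify(point_on_diagonal))
print(point_below_diagonal, classify(point_below_diagonal))
```

      Points that fall on the diagonal of an isobologram correspond to a sum of 1; points below it correspond to sums below 1, and how far below counts as synergy depends on the convention chosen.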

      Concerning the immunofluorescence detection of 5mC and 5hmC, the authors should be careful with their conclusions. The Hoechst signal of the parasites is indistinguishable because of the high signal given by the hepatocyte nuclei. The signal obtained with the anti-5hmC in hepatocyte nuclei is higher than with the anti-5mC, thus if a low signal is obtained in hypnozoites and schizonts, it might be difficult to dissociate from the background. In blood stages (Figure S18), the best to obtain a good signal is to lyse the red blood cell using saponin, before fixation and HCl treatment.

      We spent many hours using high resolution imaging of hundreds of parasites trying to detect clear 5hmC signal in both hypnozoites and schizonts but never saw a clearly positive signal. Indeed, the host signal can be confounding, thus we felt the most clear and unbiased way to quantify and present these data was using HCI. We appreciate the suggestion to lyse cells first for detecting in the blood stage.

      To conclude that 5mC marks are the predominate DNA methylation mark in both P. falciparum and P. vivax, authors should also mention that they compare different stages of the life cycle, that might have different methylation levels.

      We do mention at the start of this section our reasoning that quantifying marks in sporozoites was technically achievable, but not in a mixed culture of parasites and hepatocytes. We agree they could have different marks at these different stages.

      Also, the authors conclude that "[...] 5mC is present at low level in P. vivax and P. cynomolgi sporozoites and could control liver stage development and hypnozoite quiescence". Based on the data shown here, nothing, except the presence of 5mC marks, supports that DNA methylation could be implicated in liver stage development or hypnozoite quiescence.

      We clearly show sporozoite and liver stage DNA is methylated, which implicates this fundamental cell function exists in P. vivax liver stages, and that compounds with characterized activity against DNMT are active on liver stages. We acknowledge we were unable to show a direct effect and use the qualifier ‘could’ for this very reason.

      How many DNA-methyltransferase inhibitors were present in the epigenetic library? Out of those, none were identified as hits; maybe the effect of hydrazinophthalazines is not linked to DNMT inhibition but to another target pathway of these molecules, like calcium transport?

      We supply the complete list of inhibitors in the epigenetic library as a supplemental file; the library contained 773 compounds. Hydrazinophthalazines were not included in the library, but several other DNA methyltransferase inhibitors were inactive. It is possible that hydrazinophthalazine activity is linked to other mechanisms, but the inactivity of other DNMT inhibitors does not preclude the possibility that hydrazinophthalazines are acting through DNMT.

      The authors state (line 344): "These results corroborate our hypothesis that epigenetic pathways regulate hypnozoites". This conclusion should be changed to "[...] that epigenetic pathways are involved in P. vivax liver stage survival" because:

      -The epigenetic inhibitors described here are as active on hypnozoites as on liver schizonts.

      -Again, we cannot rule out that the host cell plays a role in this effect and that the compound may not act directly on the parasite.

      The same comment applies to the quote in lines 394 to 396. There is no proof in the results presented here that DNA methylation plays any role in the anti-plasmodial activity of hydrazinophthalazines observed in the assay.

      We maintain that we use words throughout the text that express uncertainty about the mechanisms involved. It is important to point out that, prior to this paper, the number of hypnozonticidal hits was incredibly low and this field is just emerging. The fundamental role of epigenetic mechanisms is regulation of gene expression. Finding several hypnozonticidal hits when screening epigenetic libraries implies epigenetic pathways are important for hypnozoite survival. We intentionally do not specify exact mechanisms or if they are host or parasite pathways. Host-parasite interactions in the liver stage are incredibly difficult to resolve and are outside the scope of this report. Furthermore, this statement is not exclusive to schizonts, but since screens of diversity sets against schizonts result in a much higher hit rate, the focus of this comment is unearthing rare hypnozonticidal hits.

    1. Author response:

      We thank the editors and reviewers for the time and effort they have invested in evaluating our manuscript. We appreciate the constructive feedback, which highlights both the strengths of the work and areas for improvement. We will carefully consider all comments and, in the coming months, revise the manuscript to incorporate additional data, address the concerns regarding limited referencing, and provide further clarification on the points raised.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to investigate the cellular mechanisms underlying place field formation (PFF) in hippocampal CA1 pyramidal cells by performing in vivo two-photon calcium imaging in head-restrained mice navigating a virtual environment. Specifically, they sought to determine whether BTSP-like (behavioral time scale synaptic plasticity) events, characterized by large calcium transients, are the primary mechanism driving PFFs or if other mechanisms also play a significant role. Through their extensive imaging dataset, the authors found that while BTSP-like events are prevalent, a substantial fraction of new place fields are formed via non-BTSP-like mechanisms. They further observed that large calcium transients, often associated with BTSP-like events, are not sufficient to induce new place fields, indicating the presence of additional regulatory factors (possibly local dendritic spikes).

      Strengths

      The study makes use of a robust and extensive dataset collected from 163 imaging sessions across 45 mice, providing a comprehensive examination of CA1 place-cell activity during navigation in both familiar and novel virtual environments. The use of two-photon calcium imaging allows the authors to observe the detailed dynamics of neuronal activity and calcium transients, offering insights into the differences between BTSP-like and non-BTSP-like PFF events. The study's ability to distinguish between these two mechanisms and analyze their prevalence under different conditions is a key strength, as it provides a nuanced understanding of how place fields are formed and maintained. The paper supports the idea that BTSP is not the only driving force behind PFF, and other mechanisms are likely sufficient to drive PFF, and BTSP events may also be insufficient to drive PFF in some cases. The longer-than-usual virtual track used in the experiment allowed place cells to express multiple place fields, adding a valuable dimension to the dataset that is typically lacking in similar studies. Additionally, the authors took a conservative approach in classifying PFF events, ensuring that their findings were not confounded by noise or ambiguous activity.

      Weaknesses

      Despite the impressive dataset, there are several methodological and interpretational concerns that limit the impact of the findings. Firstly, the virtual environment appears to be poorly enriched, relying mainly on wall patterns for visual cues, which raises questions about the generalizability of the results to more enriched environments. Prior studies have shown that environmental enrichment can significantly influence spatial coding, and it would be important to determine how a more immersive VR environment might alter the observed PFF dynamics. Secondly, the study relies on deconvolution methods in some cases to infer spiking activity from calcium signals without in vivo ground truth validation. This introduces potential inaccuracies, as deconvolution is an estimate rather than a direct measure of spiking, and any conclusions drawn from these inferred signals should be interpreted with caution. Thirdly, the figures would benefit from clearer statistical annotations and visual enhancements. For example, several plots lack indicators of statistical significance, making it difficult for readers to assess the robustness of the findings. Furthermore, the use of bar plots without displaying underlying data distributions obscures variability, which could be better visualized with violin plots or individual data points. The manuscript would also benefit from a more explicit breakdown of the proportion of place fields categorized as BTSP-like versus non-BTSP-like, along with clearer references to figures throughout the results section. Lastly, the authors' interpretation of their data, particularly regarding the sufficiency of large calcium transients for PFF induction, needs to be more cautious. Without direct confirmation that these transients correspond to actual BTSP events (including associated complex spikes and calcium plateau potentials), concluding that BTSP is not necessary or sufficient for PFF formation is speculative.

      Reviewer #2 (Public review):

      Summary:

      The authors of this manuscript aim to investigate the formation of place fields (PFs) in hippocampal CA1 pyramidal cells. They focus on the role of behavioral time scale synaptic plasticity (BTSP), a mechanism proposed to be crucial for the formation of new PFs. Using in vivo two-photon calcium imaging in head-restrained mice navigating virtual environments, they employ a classification method based on calcium activity to categorize place field formation as BTSP-like or non-BTSP-like, and investigate the properties of each category.

      Strengths:

      A new method to use calcium imaging to separate BTSP and non-BTSP place field formation. This work offers new methods and factual evidence for other researchers in the field.

      The method enabled the authors to reveal that while many PFs are formed by BTSP-like events, a significant number of PFs emerge with calcium dynamics that do not match BTSP characteristics, suggesting a diversity of mechanisms underlying PF formation. The characteristics of place fields under the first two categories are comprehensively described, including aspects such as formation timing, quantity, and width.

      Weaknesses:

      There are some issues about data and statistics that need to be addressed before these research findings can be considered as rigorous conclusions.

      While the authors mentioned 3 features of PF generated by BTSP during calcium imaging in the Introduction, the classification method used features 1 and 2. The confirmation by feature 3 in its current form is important but not strong enough.

      Some key data are missing, such as the excluded PFs, the BTSP/non-BTSP counts for each animal, etc.

      Impact:

      This work is likely to provide the field with a new method to classify BTSP and non-BTSP place field formation using calcium imaging.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Sumegi et al. use calcium imaging in head-fixed mice to test whether new place fields tend to emerge due to events that resemble behavioral time scale plasticity (BTSP) or other mechanisms. An impressive dataset was amassed (163 sessions from 45 mice with 500-1000 neurons per sample) to study the spontaneous emergence of new place fields in area CA1 that had the signature of BTSP. The authors observed that place fields could emerge due to BTSP and non-BTSP-like mechanisms. Interestingly, when non-BTSP mechanisms seemed to generate a place field, this tended to occur on a trial with a spontaneous reset in neural coding (a remapping event). Novelty seemed to upregulate non-BTSP events relative to BTSP events. Finally, large calcium transients (presumed plateau potentials) were not sufficient to generate a place field.

      Strengths:

      I found this manuscript to be exceptionally well-written, well-powered, and timely given the outstanding debate and confusion surrounding whether all place fields must arise from BTSP events. Working at the same institute, Albert Lee (e.g. Epsztein et al., 2011 - which should be cited) and Jeff Magee (e.g. Bittner et al., 2017) showed contradictory results for how place fields arise. These accounts have not fully been put toe-to-toe and reconciled in the literature. This manuscript addresses this gap and shows that both accounts are correct - place fields can emerge due to a pre-existing map and due to BTSP.

      We thank the Reviewer for his/her appreciation of the importance of our study. We have included the additional reference.

      Weaknesses:

      I find only three significant areas for improvement in the present study:

      First, can it be concluded that non-BTSP events occur exclusively due to a global remapping event, as stated in the manuscript "these PFF surges included a high fraction of both non-BTSP- and BTSP-like PFF events, and were associated with global remapping of the CA1 representation"? Global remapping has a precise definition that involves quantifying the stability of all place fields recorded. Without a color scale bar in Figure 3D (which should be added), we cannot know whether the overall representations were independent before and after the spontaneous reset. It would be good to know if some neurons are able to maintain place coding (more often than expected by chance), suggestive of a partial-remapping phenomenon.

      We have performed the analysis suggested by the Reviewer and determined what fraction of CA1PCs retained its original tuning property after the representation switch. We found that the remapping was essentially global, as only a small fraction (5.4%) of CA1PCs retained their pre-switch tuning curve after the switch. This is now described in the Results.

      We now state in the figure legend for the former Figure 3D (now Figure 3F) that the color scale applies to all subpanels.

      We would like to note that we do not conclude that non-BTSP events occur exclusively during global remapping – we have found a sizable fraction of PFF by non-BTSP mechanism also in the familiar environment with no signs of change in the population representation. We agree nonetheless that PFF is dominated by BTSP under these conditions, whereas the contribution of non-BTSP is larger during global remapping events.

      Second, BTSP has a flip side that involves the weakening of existing place fields when a novel field emerges. Was this observed in the present study? Presumably place fields can disappear due to this bidirectional BTSP or due to global remapping. For a full comparison of the two phenomena, the disappearance of place fields must also be assessed.

      In this study we focused on the birth of new PFs – yet, PFs not only form but also disappear constantly. The factors driving PF weakening are even less explored and understood than those driving PF birth. In fact, we observed (as illustrated by several examples in our MS) that many PFs weaken or disappear completely during the course of an imaging session. These effects are sometimes accompanied by a new PFF event elsewhere (e.g. Figure 2 – figure supplement 2E bottom), whereas in other cases they are not (e.g. Figure 5A, middle). Similarly, some BTSP events seem to coincide with the disappearance of another PF, but others do not (e.g. Figure 2A bottom, first PF along the track; Figure 3 – figure supplement 1A left, first PF). The picture is further complicated in the case of global remapping events (i.e. representation switches, Figure 3 – figure supplement 2B) that, by definition, include both new PFF and PF disappearance. We feel that exploration of the complex mechanisms at play in PF disappearance is outside the scope of the current study, but could be the subject of an interesting future investigation.

      Finally, it would be good to know if place fields differ according to how they are born. For example, are there differences in reliability, width, peak rate, out-of-field firing, etc for those that arise due to BTSP vs non-BTSP.

      We have analyzed several properties of the PFs and found no significant difference in either their width (BTSP: 46.4 ± 24.4 cm; non-BTSP: 50.4 ± 32.5 cm, p = 0.28) or peak rates (BTSP: 19.0 ± 14.7 a.u./s; non-BTSP: 21.4 ± 16.8 a.u./s, p = 0.27) or the out-of-field firing rates (BTSP: 0.64 ± 0.68 a.u./s; non-BTSP: 0.83 ± 1.25 a.u./s, p = 0.09, all unpaired t-test). We have included these data into the Results section.

      Reviewer #1 (Recommendations for the authors):

      Consider adding additional visual cues or environmental elements to the virtual reality (VR) setup to create a more enriched and immersive environment. Collect data from a couple of mice in the enriched environment and compare the PFF dynamics to the original environment. This would help determine whether the findings on PFF dynamics hold in a setting where spatial coding may be more robust. Including floor cues, distal visual markers, or varying textures might provide a more comprehensive understanding of the factors influencing BTSP-like and non-BTSP-like events.

      We thank the Reviewer for her/his suggestion of analyzing data obtained from a more enriched VR environment compared to the one we used in our study. We have now included data obtained in a profoundly different VR environment, which did not have sparse dominant visual landmarks, but the entire wall was covered with a rich pattern with different shapes of different colors. Our data from 11 imaging sessions from 4 mice revealed BTSP- and non-BTSP-like PFF events with approximately the same ratio as that found in our regular maze. These results are described in the Results section and are presented in a new supplementary figure (Figure 2 – figure supplement 2).

      Wherever deconvolved spikes were used for analysis, provide a comparison of results obtained directly from the GCaMP ΔF/F signals versus those derived from the deconvolved spiking data. This could illustrate any differences and help readers understand the limitations and reliability of the inference method.

      We have adopted a currently widely accepted method in the field to infer spikes from fluorescent traces using the Suite2p software package. All of our analyses were then performed on the inferred spikes. To address the concerns of the Reviewer, we analyzed the relationship between the peak [Ca<sup>2+</sup>] transients and inferred spike activity (new Figure 3 – figure supplement 1C-E). Our results clearly demonstrate a robust, highly significant correlation between these measures at the level of individual cells (new Figure 3 – figure supplement 1D) and the Spearman correlation coefficients show a distribution that is very different from random distributions (new Figure 3 – figure supplement 1E). From these, we conclude that using directly the fluorescent data would have resulted in largely similar PF detection and identification.
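
      The comparison described above (a per-cell Spearman correlation set against a random distribution) can be sketched as follows. This is a minimal, standard-library illustration and not the authors' pipeline: the paired values are hypothetical, and building the random distribution by shuffling one variable is an assumption about how such a null might be constructed.

```python
import random
import statistics


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def shuffled_null(x, y, n_shuffles=1000, seed=0):
    """Spearman coefficients after randomly permuting y (assumed null model)."""
    rng = random.Random(seed)
    y_shuffled = list(y)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(y_shuffled)
        null.append(spearman(x, y_shuffled))
    return null


# Hypothetical per-event values for a single cell, for illustration only.
peak_transient = [0.2, 0.5, 0.9, 1.4, 2.0, 2.7, 3.1, 4.0]
inferred_spikes = [1.0, 2.0, 2.5, 5.0, 6.0, 9.0, 8.5, 12.0]

rho = spearman(peak_transient, inferred_spikes)
null = shuffled_null(peak_transient, inferred_spikes)
# Two-sided shuffle p-value with the standard +1 correction.
p_shuffle = (sum(abs(r) >= abs(rho) for r in null) + 1) / (len(null) + 1)

print(f"rho = {rho:.3f}, shuffle p = {p_shuffle:.4f}")
```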

      Improve the visual clarity of figures by enlarging key elements such as arrows that indicate BTSP-like events. Consider using colors that stand out more clearly to guide readers' attention. Include annotations of statistical significance directly on the figures (e.g., adding NS or * indicators) to make it clear which comparisons are statistically significant. This will help readers quickly interpret the data without needing to refer back to the text.

      Based on the suggestion of the Reviewer, we have enlarged the arrows. We have also indicated statistical results on the figures. Because some of the results of factorial ANOVA tests are difficult to indicate comprehensively on our plots, we kept the description of the statistical results in the legends as well. We hope that these alterations will make data interpretation easier.

      Replace or supplement bar plots with violin plots or scatter plots that show the distribution of individual data points. This change would offer a clearer picture of data variability and underlying trends, aiding readers in assessing the robustness of the results.

      We have changed the plots and now present all data points.

      Add more detailed quantification in the results section, specifying the total number of newly formed place fields, the proportion that are categorized as BTSP-like versus non-BTSP-like, and how many events did not fit these categories. Explicitly state what fraction of the total recorded place field formations are represented by the 59 non-BTSP-like events mentioned, as this is currently difficult to discern.

      The numbers of BTSP- and non-BTSP-like PFF events are given in the MS. As described in the Methods, after identifying BTSP- and non-BTSP-like PFF events using the shift and gain criteria, we manually checked each of these ROIs and the spatial footprint of every new PFF event for these cells, and excluded ROIs with non-soma-like shapes and activities with spurious footprints suggesting contamination, creating a ‘cleaned’ dataset. We did not perform such visual inspection and manual curation of the spatial footprints of ROIs that belong to the two additional categories (no gain with shift, gain without shift, 872 events). Since these classes are also overestimated without curation, we cannot provide a precise fraction of the BTSP- and non-BTSP-like PFF events from the total recorded PFF population. However, assuming that factors leading to exclusion affect all groups equally, we can provide their fractions by comparing the numbers of newly born PFs in all categories before the visual inspections. In the normal maze, we found 806 candidate BTSP-like (52%) and 164 non-BTSP-like (10%) PFFs, and an additional 593 PFs (38%) could not be included in these two groups [40 PFs (3%) with formation lap gain and backward shift but significant backward drift; 238 PFs (15%) with formation lap gain but without backward shift; 315 PFs (20%) with no formation lap gain but with backward shift]. These data have been included in the Methods.
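
      The percentages quoted above for the normal maze follow directly from the pre-curation counts. The short check below uses only the numbers stated in the response; the dictionary keys are shorthand labels chosen here for illustration.

```python
# Pre-curation counts of newly formed place fields in the normal maze,
# as stated in the response above.
counts = {
    "BTSP-like": 806,
    "non-BTSP-like": 164,
    "gain and shift, but significant backward drift": 40,
    "gain without shift": 238,
    "shift without gain": 315,
}

total = sum(counts.values())

# Events that fall outside the BTSP-like and non-BTSP-like groups.
unclassified = total - counts["BTSP-like"] - counts["non-BTSP-like"]

# Percentages rounded to whole numbers, as reported in the text.
percent = {name: round(100 * n / total) for name, n in counts.items()}
percent_unclassified = round(100 * unclassified / total)

print(f"total candidate events: {total}")
for name, n in counts.items():
    print(f"{name}: {n} ({percent[name]}%)")
print(f"not in either main group: {unclassified} ({percent_unclassified}%)")
```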

      Ensure that all statements describing specific findings are consistently linked to the appropriate figures and panels. There are instances in the text where results are discussed without clear references, which can make it challenging for readers to verify the data. For example, the section on population remapping in a novel environment should point directly to the relevant figure panels to guide readers.

      We regret that our text was not linked properly to the appropriate figures. We corrected this during the revision.

      Given that BTSP-like events are inferred rather than directly confirmed, it would be prudent to frame conclusions about their sufficiency in more tentative terms, acknowledging the limitations of the current data. Consider adding a discussion of potential future experiments that could confirm whether these large transients truly represent BTSP events, including evidence for complex spikes or calcium plateau potentials.

      The Reviewer is correct that we do not have direct evidence that all large somatic Ca<sup>2+</sup> events represent dendritic plateau potentials. Now we discuss this and other limitations in the MS (Discussion section).

      Reviewer #2 (Recommendations for the authors):

      The authors have outlined three characteristics of place fields (PFs) generated by behavioral time scale synaptic plasticity (BTSP) during calcium imaging in the Introduction section, as follows: 'First, the prolonged CSB results in large [Ca<sup>2+</sup>] transient during the initial PFF event, typically followed by weaker Ca<sup>2+</sup> signals on consecutive traversals through the PF. Second, due to the long and asymmetric temporal kernel of the plasticity (favoring potentiation of inputs active 1-2 seconds before the CSB) a substantial backward shift in the spatial position of the PF center can be observed on linear tracks after the formation lap. Third, the width of the new PF is generally proportional to the running speed of the animal during the PFF event.' Figure 3B, which displays the third feature of classified BTSP and non-BTSP data, serves as an important confirmation of the classification results using the first two features. Even though the Spearman correlation indicated a significant difference, the raw data distributions of BTSP and non-BTSP appear similar, suggesting that a bootstrap distribution and more stringent confirmation are needed for the result to be convincing.

      As described in the MS, because of the difference in the number of events in the two groups, we randomly subsampled the BTSP-like events to the sample size of the non-BTSP-like PFF events 10000 times and performed regression analysis. This bootstrapping revealed that both the r and p values of the fit to the non-BTSP data fell outside the 95% confidence interval of the bootstrapped BTSP values, indicating that the difference between the groups was robust.
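
      The subsampling procedure described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis code: it uses Spearman's r alone (no p-values), draws subsamples without replacement, takes a percentile interval, and runs on synthetic speed and width values. All function and variable names are hypothetical.

```python
import random

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's r as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def subsampled_r_interval(x, y, n_sub, n_iter=10000, alpha=0.05, seed=0):
    """Percentile interval of Spearman's r over random subsamples of size n_sub."""
    rng = random.Random(seed)
    idx = range(len(x))
    rs = []
    for _ in range(n_iter):
        pick = rng.sample(idx, n_sub)  # assumption: sampling without replacement
        rs.append(spearman([x[i] for i in pick], [y[i] for i in pick]))
    rs.sort()
    lo = rs[int((alpha / 2) * n_iter)]
    hi = rs[int((1 - alpha / 2) * n_iter) - 1]
    return lo, hi

# Synthetic stand-in data (NOT the recorded data): 310 "BTSP-like" events whose
# width scales with speed, and 59 "non-BTSP-like" events whose width does not.
rng = random.Random(1)
btsp_speed = [rng.uniform(5, 40) for _ in range(310)]
btsp_width = [20 + 1.0 * s + rng.gauss(0, 5) for s in btsp_speed]
non_speed = [rng.uniform(5, 40) for _ in range(59)]
non_width = [40 + rng.gauss(0, 5) for _ in non_speed]

r_non = spearman(non_speed, non_width)
# 2000 iterations keep the demo quick; the response reports 10000.
lo, hi = subsampled_r_interval(btsp_speed, btsp_width,
                               n_sub=len(non_speed), n_iter=2000)
print(f"non-BTSP r = {r_non:.2f}; subsampled BTSP interval = [{lo:.2f}, {hi:.2f}]")
print("non-BTSP r outside interval:", not (lo <= r_non <= hi))
```

      Matching the subsample size to the smaller group keeps the sampling variability of r comparable between groups, so a non-BTSP value outside the interval cannot be attributed to the difference in sample size alone.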

      In further analysis during the revision, we found that the PF width variance explained by distance from landmarks is substantially larger than the variance explained by the running speed during the formation lap. We performed a cross-validated analysis by these two factors (Figure 3D), which highlights that speed explains some of the PF width variance of BTSP-like PFFs, but none of the non-BTSP PFFs.

      The proportions of the three types should be provided. page 6: ' Using a conservative approach, we categorized a new PF to be formed by a BTSP-like mechanism if it had both positive gain and negative shift values (Figure 2A; n = 310 new PFs), whereas new PFs exhibiting neither positive gain nor negative shift were considered as non-BTSP-like events (Figure 2B; n = 59). All other newly formed PFs (no-gain with backward shift and gain without backward shift) were excluded from further analysis.' The number of excluded newly formed PFs should be disclosed, as well as the distribution ratio of these three types in each animal.

      The numbers of BTSP- and non-BTSP-like PFF events are given in the MS. As described in the Methods, after identifying BTSP- and non-BTSP-like PFF events using the shift and gain criteria, we manually checked each of these ROIs and the spatial footprint of every new PFF event for these cells, and excluded ROIs with non-soma-like shapes or spurious activities, creating a ‘cleaned’ dataset. We did not perform such visual inspection and manual curation of the spatial footprints of ROIs belonging to the two additional categories (no gain with shift, gain without shift; 872 events). Since these classes are also overestimated without curation, we cannot provide a precise fraction of the BTSP- and non-BTSP-like PFF events from the total recorded PFF population. However, assuming that the factors leading to exclusion affect all groups equally, we can provide their fractions by comparing the numbers of newly born PFs in all categories before the visual inspection. In the normal maze, we found 806 candidate BTSP-like (52%) and 164 non-BTSP-like (10%) PFFs, and an additional 593 PFs (38%) could not be included in these two groups [40 PFs (3%) with formation lap gain and backward shift but significant backward drift; 238 PFs (15%) with formation lap gain but without backward shift; 315 PFs (20%) with no formation lap gain but with backward shift]. These data have been included in the Methods.

      Figure 2C, while showing an overall decrease in amplitude from the formation lap to the next lap, could benefit from a pairwise analysis of the corresponding formation lap and the following lap of each session to provide more convincing and detailed results.

      We now present all data with connected lines across consecutive laps to illustrate the changes in each ROI. Our statistical analysis included the pairwise comparison of amplitudes.

      The experiment's time range is broad (11-99 days); it is worth investigating whether different training intervals might influence the results.

      Based on the suggestion of the Reviewer, we have analyzed the elapsed time and the number of sessions from the first training to the recording, and we demonstrate that there is no correlation of these parameters with the number of new PFFs. These data are now presented in Figure 2 – figure supplement 1C.

      It is unclear whether the formation of place fields also generates characteristic features of dendritic properties.

      It is not clear to us which ‘characteristic features of dendritic properties’ generated by PFF the Reviewer refers to. Since we did not image dendrites of individual CA1PCs, we have no information about dendritic properties of the neurons.

      It may be necessary to add a clearer figure to illustrate the correlation between width and speed following the downsampling of non-BTSP-like events (refer to Figure 3B).

      We have performed extensive additional analysis on the relationship of PF width with various behavioral factors, including the speed of the animal in the formation lap. Inspection of the PF width distributions along the track revealed a close association of PF width with the distance of the animal from the nearest visual landmark in the corridor, so that PFs close to landmarks were narrower than PFs between landmarks. We found that the PF width variance explained by distance from landmarks is substantially larger than the variance explained by the running speed during the formation lap. Nevertheless, there is a clear difference between BTSP-like and non-BTSP-like PFFs: running speed explains some variance in the case of BTSP-like PFFs, but none for non-BTSP-like PFFs.

      We have included these findings into the Results section and created two new panels in Figure 3 (C, D) and Figure 3 – figure supplement 1 (A, B).
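
      To illustrate what comparing cross-validated explained variance between two predictors can look like, here is a generic sketch. The analysis behind Figure 3D is not specified in this response, so the one-predictor linear fit, the 5-fold split, and the synthetic data below are all assumptions, and every name is hypothetical.

```python
import random

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

def cv_r2(x, y, k=5, seed=0):
    """k-fold cross-validated R^2 of a one-predictor linear fit.

    Held-out errors are compared against predicting the training-set mean,
    so the value can be negative when the predictor does not generalize.
    """
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    ss_res = 0.0
    ss_tot = 0.0
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        train_mean = sum(y[i] for i in train) / len(train)
        for i in fold:
            ss_res += (y[i] - (slope * x[i] + intercept)) ** 2
            ss_tot += (y[i] - train_mean) ** 2
    return 1 - ss_res / ss_tot

# Synthetic stand-in data (NOT the recorded data): PF width depends strongly
# on distance from the nearest landmark and only weakly on running speed.
rng = random.Random(2)
n_events = 300
landmark_dist = [rng.uniform(0, 10) for _ in range(n_events)]
speed = [rng.uniform(10, 30) for _ in range(n_events)]
width = [10 + 2.0 * d + 0.1 * s + rng.gauss(0, 1)
         for d, s in zip(landmark_dist, speed)]

r2_dist = cv_r2(landmark_dist, width)
r2_speed = cv_r2(speed, width)
print(f"cross-validated R^2, landmark distance: {r2_dist:.2f}")
print(f"cross-validated R^2, running speed:     {r2_speed:.2f}")
```

      Scoring on held-out events rather than on the fitted data prevents a predictor from appearing to explain variance through overfitting, which matters when comparing groups of very different size.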

      It is recommended that statistical results be labeled in the figures with n.s. or stars for better readability.

      Based on the suggestion of the Reviewer, we have indicated statistical results on the figures. Because some of the results of factorial ANOVA tests are difficult to indicate comprehensively on our plots, we kept the description of the statistical results in the legends as well. We hope that these alterations will make data interpretation easier.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The question of how central nervous system (CNS) lamination defects affect functional integrity is an interesting topic, though it remains a subject of debate. The authors focused on the retina, which is a relatively simple yet well-laminated tissue, to investigate the impact of afadin, a key component of adherens junctions, on retinal structure and function. Their findings show that the loss of afadin leads to significant disruptions in outer retinal lamination, affecting the morphology and localization of photoreceptors and their synapses, as illustrated by high-quality images. Despite these severe changes, the study found that some functions of the retinal circuits, such as the ability to process light stimuli, could still be partially preserved. This research offers new insights into the relationship between retinal lamination and neural circuit function, suggesting that altered retinal morphology does not completely eliminate the capacity for visual information processing.

      Strengths:

      The retina serves as an excellent model for investigating lamination defects and functional integrity due to its relatively simple yet well-organized structure, along with the ease of analyzing visual function. The images depicting outer retinal lamination, as well as the morphology and localization of photoreceptors and their synapses, are clear and well-described. The paper is logically organized, progressing from structural defects to functional analysis. Additionally, the manuscript includes a comprehensive discussion of the findings and their implications.

      Weaknesses:

      While this work presents a wealth of descriptive data, it lacks quantification, which would help readers fully understand the findings and compare results with those from other studies. Furthermore, the molecular mechanisms underlying the defects caused by afadin deletion were not explored, leaving the role of afadin and its intracellular signaling pathways in retinal cells unclear. Finally, the study relied solely on electrophysiological recordings to demonstrate RGC function, which may not be robust enough to support the conclusions. Incorporating additional experiments, such as visual behavior tests, would strengthen the overall conclusions. 

      We would like to thank the reviewer for the thoughtful and valuable comments that helped us to further improve the manuscript. We have revised the manuscript to address the following three points in response to the reviewer's comments.

      While this work presents a wealth of descriptive data, it lacks quantification, which would help readers fully understand the findings and compare results with those from other studies.

      In response, we quantified the position of each retinal cell type and measured retinal thickness in the cHet and cKO mice at 1M, as presented in Figures 2F–M. To reflect these additions, we have included explanatory text in the revised manuscript (see lines 507–533).

      Furthermore, the molecular mechanisms underlying the defects caused by afadin deletion were not explored, leaving the role of afadin and its intracellular signaling pathways in retinal cells unclear.

      As AJ components, such as catenin and cadherin, are known to be associated with several signaling pathways, including Notch and Wnt signals (PMID: 37255594), we speculated that these pathways might be disrupted in the afadin cKO retina. Since these pathways are involved in cell proliferation, we examined the number of progenitor cells in the afadin cKO retina at developmental stages P1, P3, and P5 (new Figure S6C, see lines 868-870). No significant differences were observed at any of these stages. We also quantified the number of each retinal cell type at P14 when differentiation is complete. In the cKO retina, the number of BCs significantly increased, whereas the number of photoreceptors significantly reduced (new Figure S4C, see lines 620-622). To our knowledge, activation or inactivation of any AJ-associated signaling pathway does not reproduce the cell fate alterations observed in the afadin cKO retina. These findings suggest that the above pathways related to AJ may be unchanged in the cKO retina. However, we cannot exclude the possibility that multiple signaling pathways may be affected simultaneously or other pathways affected in the cKO retina.

      Finally, the study relied solely on electrophysiological recordings to demonstrate RGC function, which may not be robust enough to support the conclusions. Incorporating additional experiments, such as visual behavior tests, would strengthen the overall conclusions.

      We appreciate the reviewer’s insightful suggestion. To more robustly evaluate visual function in the cKO mice, we performed optomotor response (OMR) and visual cliff tests using cHet, cKO, and optic nerve crush (ONC) mice with Aki Hashio, Yuki Emori, and Mao Hiratsuka. We added their names as co-authors to the new manuscript. In the OMR test, cKO mice exhibited fewer responses to visual stimuli than cHet mice but significantly more than ONC mice. Furthermore, although no significant difference was detected between cKO and ONC mice in the visual cliff test, some cKO mice displayed cautious behavior suggestive of depth perception. These results indicate that cKO mice retain partial visual function, which is consistent with the MEA analysis. We have included these data as the new Figure 8 and incorporated the findings into the revised manuscript in the Introduction (lines 130-131 and 133-134), Methods (lines 378-406), Results (lines 775-816), and Discussion sections (lines 1026-1035).

      Reviewer #2 (Public review):

      Summary:

      Ueno et al. described substantial changes in the afadin knockout retina. These changes include decreased numbers of rods and cones, an increased number of bipolar cells, and disrupted somatic and synaptic organization of the outer limiting membrane, outer nuclear layer, and outer plexiform layer. In contrast, the number and organization of amacrine cells and retinal ganglion cells remain relatively intact. They also observed changes in ERG responses and RGC receptive fields and functions using MEA recordings.

      Strengths:

      The morphological characterization of retinal cell types and laminations is detailed and relatively comprehensive.

      Weaknesses:

      (1) The major weakness of this study, perhaps, is that its findings are predominantly descriptive and lack any mechanistic explanation. As afadin is a key component of adherens junctions, its role in mediating retinal lamination has been reported previously (see PMCID: PMC6284407). Thus, a more detailed dissection of afadin's role in processes such as progenitor generation, cell migration, or the formation of retinal lamination would provide greater insight into the defects caused by knocking out afadin.

      Thank you for the valuable comments. We agree with the reviewer's point that the findings are predominantly descriptive and lack a mechanistic explanation. However, we would like to clarify that the study cited in the comment (PMCID: PMC6284407) analyzed the role of afadin in dendritic stratification of direction-selective RGCs within the IPL, where “lamination” refers to the layering of RGC dendrites in the IPL. Here, we analyzed the function of afadin in the laminar construction of the overall retina.

      In response to the reviewer’s comment, we have added new analyses addressing retinal lamination, as well as the number and spatial distribution of progenitor cells, during development in the cKO retina. These new results are shown in Figures 4E, 9C–F, S5A–C, and S6C of the revised manuscript, and corresponding explanations added in the revised text (lines 643–662 and 855–870).

      (2) The authors observed striking changes in the numbers of rods, cones, and BCs, but not in ACs or RGCs. The causes of these distinct changes in specific cell classes remain unclear. Detailed characterizations, such as the expression of afadin in early developing retina, tracing cell numbers across various early developmental time points, and staining of apoptotic markers in developing retinal cells, could help to distinguish between defects in cell generation and survival, providing a better understanding of the underlying causes of these phenotypes.

      Thank you for the insightful comment. Following the reviewer’s suggestion, we quantified the number of retinal cell types at P14, when cell differentiation is complete (new Figure S4C). At P14, the numbers of photoreceptors and BCs were significantly reduced in the cKO retina, while Müller glia, which were significantly reduced at 1M, showed no difference. We further examined the number of rods and BCs at P1, P3, and P5 (new Figures S4E, F). No significant differences were detected at P1 or P3; however, at P5, rod marker expression was significantly decreased, while the number of BCs was significantly increased. These results suggest that the defects in cell fate determination of BCs and rods begin to emerge between P3 and P5, a period during which rods and BCs actively differentiate. We speculate that cells originally destined to become rods may instead differentiate into BCs in the cKO retina. In addition, we found a significant increase in apoptotic cells at P1, P3, P5, and P14 (new Figure S6B). Furthermore, Müller glia and rod photoreceptors showed significantly greater reduction at 1M compared to P14, suggesting that the reduction in Müller glia observed at 1M may be due to post-differentiation cell death. These results are presented in Figures S4C, S4E–F, and S6B, and described in the revised manuscript (lines 620-635 and 827-838).

      (3) Although the total number of ACs or RGCs remains unchanged, their localizations are somewhat altered (Figures 2E and 4E). Again, the cause of the altered somatic localization in ACs and RGCs is unclear.

      Thank you for the valuable question. In response to the reviewer’s comment, we analyzed the position of RGCs and ACs in the developing cKO retina. In the cKO retina at P1, retinal cells were organized into distinct multicellular compartments with clear boundaries, and acellular regions extending to the outer retinal surface were observed at these boundaries. These acellular regions contained dendritic processes of RGCs and ACs, which are components of the IPL, indicating that elements of the IPL extended vertically across the retina. As development progressed, the compartment boundaries gradually shifted toward the inner retina. At P14, the IPL was mainly located on the inner retina, as in the normal retina. However, some IPL structures remained in the outer retina and may correspond to the acellular patches. We have included the above data in the revised manuscript as Figures S5A and S5B and revised the manuscript to include this point (lines 643-660).

      (4) One conclusion that the authors emphasise is that the function of RGCs remains detectable despite a severely disrupted outer plexiform layer. However, the organization of the inner plexiform layer remains largely intact, and the axonal innervation of BCs remains unchanged. This could explain the functional integrity of RGCs. In addition, the resolution of detecting RGCs by MEA is low, as they only detected 5 clusters in heterozygous animals. This represents an incomplete clustering of RGC functional types and does not provide a full picture of how functional RGC types are altered in the afadin knockout.

      We appreciate the reviewer’s insightful comments. Although our clustering of RGC subtypes in afadin cHet retinas resulted in only five clusters, the key finding of our study is the preservation of RGC receptive fields in afadin cKO retinas, despite severe photoreceptor loss (reduced to about one-third of normal) and disruption of photoreceptor-bipolar cell synapses in the OPL. This suggests that even with severe damage to the OPL, the primary photoreceptor-bipolar-RGC pathway can still function as long as the IPL remains intact. Moreover, the presence of rod-driven responses in RGCs indicates that the AII amacrine cell-mediated rod pathway may also continue to function. We agree that our functional clustering in afadin cHet retinas was incomplete. However, we speculate that the absence of RGCs with fast temporal responses in afadin cKO retinas may be due not simply to the loss of specific RGC subtypes but to disrupted synaptic connections between photoreceptors and fast-responding BCs. Furthermore, the structural abnormalities in retinal lamination in afadin cKO retinas may alter RGC response properties, making strict functional classification less meaningful. We would like to emphasize the finding that disruption of the retinal lamination in afadin cKO retinas leads to the absence of RGCs with fast temporal response properties, rather than focusing solely on the classification of RGC subtypes.

      Minor Comments:

      (1) Line 56-67: "Overall, these findings provide the first evidence that retinal circuit function can be partially preserved even when there are significant disruptions in retinal lamination and photoreceptor synapses" There is existing evidence showing substantial adaption in retinal function when retinal lamination or photoreceptor synapses are disrupted, such as PMCID: PMC10133175.

      Thank you for your comment. We agree that the original sentence was ambiguous in its wording, and we have revised it to clarify our intended meaning (lines 48-50):

      "Overall, these findings provide the first evidence that retinal circuit function can be partially preserved even when there are significant disruptions in both retinal lamination and photoreceptor synapses."

      The paper you mentioned is crucial for discussing and considering the results of our study. We have cited this study and added the following sentence to the Discussion section of the revised manuscript (lines 910-915):

      “Furthermore, RFs of RGCs are also detected in several mouse models of retinitis pigmentosa, in which rod photoreceptors are degenerated and surviving cone photoreceptors lose their OS discs and pedicles, instead forming abnormal processes resembling synaptic dendrites (Barhoum et al., 2008; Ellis et al., 2023; Scalabrino et al., 2022).”

      (2) Line 114-115: "we focused on afadin, which is a scaffolding protein for nectin and has no ortholog in mice." The term "Ortholog" is misused here, as the mouse has an afadin gene. Should the intended meaning be that afadin has no other isoforms in mouse?

      Thank you for pointing this out. We had written "ortholog" where "paralog" was intended, and we have revised the sentence accordingly (line 108).

      Recommendations for the authors:

      (1) The introduction to afadin is insufficient. Please provide more background information about this protein.

      Following the reviewer’s recommendations, we expanded the Introduction in the revised manuscript to provide a more detailed background on afadin, as follows (lines 108-119):

      “Afadin regulates the localization of nectin, which initiates cell–cell adhesion and promotes AJ formation by recruiting the cadherin–catenin complex (Ohama et al., 2018; Takai and Nakanishi, 2003). In addition, afadin interacts with various cell adhesion and signaling molecules, as well as the actin cytoskeleton, and contributes to the accumulation of β-catenin, αE-catenin, and E-cadherin at AJs (Sakakibara et al., 2018; Sato et al., 2006). Afadin KO mice exhibit severe disruption of AJs in the ectoderm, along with other developmental defects, leading to embryonic lethality (Ikeda et al., 1999; Zhadanov et al., 1999). Conditional deletion of afadin in RGCs leads to disruption of dendrites in ON-OFF direction-selective RGCs (Duan et al., 2018). However, the effect of afadin loss on retinal lamination, circuit formation, and function is poorly understood.”

      (2) In Figure 1A (Bottom), regarding the peptide+ image, what does the green signal represent?

      The green signal observed in the peptide+ image represents background and non-specific staining. We have added a sentence stating this to the legend of Figure 1A in the revised manuscript (lines 1067-1068).

      (3) In the RESULTS section on page 17, the statement "Nectin-1, unlike nectin-2 and nectin-3, was partially co-localized with afadin at the OPL and IPL, in addition to the OLM" suggests that nectin-2 is also expressed at the IPL, as shown in Figure S1A. Providing high-power images, similar to those in Figure S1B, could help readers clearly recognize the staining signals.

      Following your suggestion, we added higher-magnification images of Nectin-2 signals in the IPL to Figure S1A and included the following clarification in the Figure legend (lines 1356-1358):

      “Nectin-2 and nectin-3 were localized in the OLM. The Nectin-2 signal in the IPL was insufficient for reliable assessment of its localization and colocalization.”

      (4) Figure S2A requires an uncropped scan of the membrane after Western blotting to demonstrate that there are no non-specific bands when using this afadin antibody, which was also utilized for IHC.

      We revised the new Figure S2C to include the uncropped membrane scan. Faint non-specific bands were observed in the Western blot, consistent with detecting non-specific signals in immunostaining using the anti-afadin antibody pre-absorbed with its antigen peptide.

      (5) IHC staining is necessary to demonstrate the knockout of afadin in retinal cells, as the paper does not show Cre expression in the retinal cells of the Dkk3-Cre mouse line. This would also help verify the specificity of the afadin antibody.

      In the cKO retina, the laminar structure was disrupted, and the background signal was generally high, making it difficult to reliably assess whether afadin expression was lost using immunostaining with the anti-afadin antibody. Therefore, in addition to the Western blot analysis already presented, we evaluated Cre activity in the Dkk3-Cre mouse line by crossing it with the R26-H2B-EGFP reporter line. Cre-mediated recombination was observed in all retinal cells at P0 and 1M. We have added these results to a revised Figure S2A and B and included explanatory text in the revised manuscript (lines 455–458).

      (6) Why is the outer nuclear layer (ONL) severely impaired in the cKO mice when afadin is not expressed in this layer? Additionally, given that afadin is highly expressed in the inner plexiform layer (IPL), why does the cKO not affect its structure?

      We speculate that the AJ defect in the outer retina during development may cause severe disruption of the ONL in afadin cKO mice. As shown in new Figure 9, ectopic AJs and aberrant position of mitotic cells were observed in the P0 cKO retina. These defects caused abnormal cell migration and position, resulting in the ONL disruption. On the other hand, in the IPL, afadin and other cell adhesion molecules may function redundantly, and thus, the IPL structure would be kept intact in the afadin cKO retina. We have added this interpretation to the Discussion section of the revised manuscript (lines 998–1005).

      (7) In the RESULTS section on page 20, the authors state, "We further investigated adherens junctions (AJs) in the cKO retina by immunostaining with OLM adherens junction markers β-catenin, N-cadherin, and nectin-1. We found that these signals were dispersed in the cKO retina (Figure S2C)." It appears that β-catenin, N-cadherin, and nectin-1 can still be detected in the cKO retina.

      We agree with the reviewer that β-catenin, N-cadherin, and nectin-1 can still be detected in the cKO retina. We used the term 'dispersed' to indicate that the signal was “scattered” rather than “disappeared”. To avoid confusion, we have revised the wording in the revised manuscript (line 499).

      (8) In Figure 3, please indicate where the zoomed-in images were captured from the low-power images. Additionally, point out the locations of zoomed-in images in other figures as well.

      Following the reviewer’s suggestion, we updated Figures 2D, 3A-C, 4A, S2D, S3A, S3D, S3E, and S5D. The related Figure legends have also been revised.

      (9) The authors should include individual data points in all statistical graphics to provide a clearer presentation of the data.

      As suggested by the reviewers, we have revised all statistical graphs to display individual data points. Furthermore, the statistical analysis of synapse counts in Figures 3E, 3F, and S3C has been changed to linear mixed models (LMM) or generalized LMM to account for the variability in the number of synapses within individual mice.

      (10) In the RESULTS section on page 23, the statement "These data indicate that the rosette-like structure in the cKO may be an ectopic IPL, termed 'acellular patches'". What is the mechanism that may cause the rosette-like structure to translocate from the IPL to the outer region of the retina?

      Thank you for raising a valuable question. To clarify the mechanism of acellular patch formation in the cKO mice, we analyzed the position of RGCs and ACs in the developing cKO retina. In the cKO retina at P1, retinal cells were organized into distinct multicellular compartments with clear boundaries, and acellular regions extending to the outer retinal surface were observed at these boundaries. These acellular regions contained dendritic processes of RGCs and ACs, which are components of the IPL, indicating that elements of the IPL extended vertically across the retina. As development progressed, the compartment boundaries gradually shifted toward the inner retina. At P14, the IPL was mainly located on the inner retina, as in the normal retina. However, some IPL structures remained in the outer retina and may correspond to the acellular patches. We have included these findings in the revised manuscript as Figures S5A and S5B and added the corresponding description to the text (lines 643–665).

      (11) Is the blood vessel structure normal in the cKO retina? Could this impact the survival of retinal cells?

      Thank you for your valuable comment. We performed immunostaining with an anti-CD31 antibody, a marker for blood vessels, as shown in the new Figure S2G. No apparent differences were observed in the cKO retina. We have added the following description to the revised manuscript (lines 539–543):

      “It has been reported that defects in the distal processes of Müller glia are associated with abnormal retinal vasculature (Shen et al., 2012). Thus, we immunostained the cKO retina with anti-CD31, a blood vessel marker, but no apparent vascular abnormalities were detected (Figure S2G).”

      (12) In the RESULTS section on pages 26-29, there is a lot of statistical information included in parentheses. It would be more concise to place this information in the figure legends, if possible.

      Following the reviewer's suggestion, we have moved the statistical information from the main text (pages 26–29) to the corresponding Figure legends.

      (13) In the RESULTS section on page 28, the authors state, "On the other hand, the inner retina was apparently normal, and both the inner nuclear layer (INL) and IPL could be recognized." However, in Fig 7A, it appears that the INL is mixed with the ONL and cannot be clearly identified.

      We agree with the reviewer that the INL is mixed with the ONL and cannot be clearly identified. Accordingly, we have revised the description in the text (lines 740–742) as follows:

      “On the other hand, the inner retina was apparently normal, and both the IPL and the proximal part of the INL could be recognized.”

      (14) It is mentioned in the manuscript that "The receptive field (RF) area in the cKO retinas was significantly smaller than that in the cHet retinas." Is there an impairment in the dendritic fields of RGCs in the cKO retina that could lead to a smaller RF?

      Thank you for asking an interesting question. The dendritic field reflects the region where presynaptic cells can form synaptic contacts, whereas the receptive field is dynamically shaped by spatiotemporal excitatory and inhibitory inputs, gap junctions, and membrane properties of the dendrites. Consequently, the size of the dendritic field does not necessarily correspond to that of the receptive field. Moreover, the disruption of the retinal lamination in the afadin cKO retina may alter the morphology of RGC dendritic fields—even when RNA expression levels are identical—which makes it difficult to exactly compare the morphology of the same RGC subtype between afadin cHet and afadin cKO retinas. Additionally, due to the presence of over 40 RGC subtypes and the rosette-like structures in the afadin cKO retina, it is challenging to trace the complete dendritic arborization of individual RGCs. For these reasons, we rather hesitate to compare the dendritic field size and the receptive field size.

      (15) Figure 7H was not cited in the corresponding section of the main text.

      Thank you for pointing it out. We have added a citation of Figure 7H in the revised manuscript (line 759).

      (16) In Figure 8C, is there a difference in the number of pHH3+ mitotic cells between the cHet and cKO mice?

      We quantified the number of pHH3-positive cells in the cKO retina at P0, as shown in the new Figure 9B. The number of mitotic cells was significantly increased in the cKO retina (see lines 853–855). In contrast, the number of BrdU-labeled progenitor cells at P1, P3, and P5 was not significantly different between cHet and cKO retinas, as presented in the new Figure S6C. These results suggest that although the total number of progenitor cells remains unchanged in cKO retinas, the M phase may be prolonged.

      (17) The results related to Figure 8 should be moved to a location before Figure 5, as Figure 8 is also related to the lamination defects.

      In the original manuscript, Figures 2–7 presented the phenotypes observed in the cKO retina, while Figure 8 addressed the possible cause of the lamination defects. Since the revised Figure 8 presents behavioral tests evaluating visual function, the phenotypic analyses are presented in the revised Figures 2–8. In response to the reviewers’ comments, we further analyzed the distribution of mitotic and progenitor cells during development and included these results as revised Figure 9.

      (18) In the DISCUSSION section on page 32, the authors state, "A few photoreceptor-bipolar cell-retinal ganglion cell (BC-RGC) pathways (vertical pathways of the retina) are inferred to be maintained in the cKO retina." The authors could verify this using retrograde transsynaptic tracing with a pseudorabies virus injected into the superior colliculus.

      Thank you for your interesting suggestion. This is an important point, and the recommended experiment is excellent. We attempted this analysis; however, in normal mice, the virus injected into the superior colliculus successfully labeled RGCs but failed to reach BCs and photoreceptors. Nevertheless, we consider that the RGC firing evoked by light stimulation shows that the photoreceptor-bipolar cell-retinal ganglion cell (BC-RGC) pathways are functional.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      In this study, López-Jiménez and colleagues demonstrated the utility of using high-content microscopy in dissecting host and bacterial determinants that play a role in the establishment of infection using Shigella flexneri as a model. The manuscript nicely identifies that infection with Shigella results in a block to DNA replication and protein synthesis. At the same time, the host responds, in part, via the entrapment of Shigella in septin cages.

      Strengths:

      The main strength of this manuscript is its technical aspects. They nicely demonstrate how an automated microscopy pipeline coupled with artificial intelligence can be used to gain new insights regarding elements of bacterial pathogenesis, using Shigella flexneri as a model system. Using this pipeline enabled the investigators to enhance the field's general understanding regarding the role of septin cages in responding to invading Shigella. This platform should be of interest to those who study a variety of intracellular microbial pathogens.

      Another strength of the manuscript is the demonstration - using cell biology-based approaches- that infection with Shigella blocks DNA replication and protein synthesis. These observations nicely dovetail with the prior findings of other groups. Nevertheless, their clever click-chemistry-based approaches provide visual evidence of these phenomena and should interest many.

      We thank the Reviewer for their enthusiasm on technical aspects of this paper, regarding both the automated microscopy pipeline coupled with artificial intelligence and the click-chemistry based approaches to dissect DNA replication and protein synthesis by microscopy.

      Weaknesses:

      There are two main weaknesses of this work. First, the studies are limited to findings obtained using a single immortalized cell line. It is appreciated that HeLa cells serve as an excellent model for studying aspects of Shigella pathogenesis and host responses. However, it would be nice to see that similar observations are observed with an epithelial cell line of intestinal, preferably colonic origin, and eventually, with a non-immortalized cell line, although it is appreciated that the latter studies are beyond the scope of this work.

      The immortalized cell line HeLa is widely regarded as a paradigm to study infection by Shigella and other intracellular pathogens. However, we agree that future studies beyond the scope of this work should include other cell lines (eg. epithelial cells of colonic origin, macrophages, primary cells). 

      The other weakness is that the studies are minimally mechanistic. For example, the investigators have data to suggest that infection with Shigella leads to an arrest in DNA replication and protein synthesis; however, no follow-up studies have been conducted to determine how these host cell processes are disabled. Interestingly, Zhang and colleagues recently identified that the Shigella OspC effectors target eukaryotic translation initiation factor 3 to block host cell translation (PMID: 38368608). This paper should be discussed and cited in the discussion.

      We appreciate the Reviewer’s concern about the lack of follow up work on observations of host DNA and protein synthesis arrest upon Shigella infection, which will be the focus of future studies. We acknowledge the recent work of Zhang et al. (Cell Reports, 2024) considering their similar results on protein translation arrest, and this reference has been more fully discussed in the revised version of the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Septin caging has emerged as one of the innate immune responses of eukaryotic cells to infections by intracellular bacteria. This fascinating assembly of eukaryotic proteins into complex structures restricts bacteria motility within the cytoplasm of host cells, thereby facilitating recognition by cytosolic sensors and components of the autophagy machinery. Given the different types of septin caging that have been described thus far, a single-cell, unbiased approach to quantify and characterise septin recruitment at bacteria is important to fully grasp the role and function of caging. Thus, the authors have developed an automated image analysis pipeline allowing bacterial segmentation and classification of septin cages that will be very useful in the future, applied to study the role of host and bacterial factors, compare different bacterial strains, or even compare infections by clinical isolates.

      Strengths:

      The authors developed a solid pipeline that has been thoroughly validated. When tested on infected cells, automated analysis corroborated previous observations and allowed the unbiased quantification of the different types of septin cages as well as the correlation between caging and bacterial metabolic activity. This approach will prove an essential asset in the further characterisation of septin cages for future studies.

      We thank the Reviewer for their positive comments, and for highlighting the strength of our imaging and analysis pipeline to analyse Shigella-septin interactions.

      Weaknesses:

      As the main aim of the manuscript is to describe the newly developed analysis pipeline, the results illustrated in the manuscript are essentially descriptive. The developed pipeline seems exceptionally efficient in recognising septin cages in infected cells but its application for a broader purpose or field of study remains limited.

      The main objective of this manuscript is the development of imaging and analysis tools to study Shigella infection, and in particular, Shigella interactions with the septin cytoskeleton. In future work we will provide more mechanistic insight with novel experiments and broader applicability, using different cell lines (in agreement with Reviewer 1), mutants or clinical isolates of Shigella and different bacteria species (eg. Listeria, Salmonella, mycobacteria).

      Reviewer #3 (Public Review):

      Summary:

      The manuscript uses high-content imaging and advanced image-analysis tools to monitor the infection of epithelial cells by Shigella. They perform some analysis on the state of the cells (through measurements of DNA and protein synthesis), and then they focus on differential recruitment of Sept7 to the bacteria. They link this recruitment with the activity of the bacterial T3SS, which is a very interesting discovery. Overall, I found numerous exciting elements in this manuscript, and I have a couple of reservations. Please see below for more details on my reservations. Nevertheless, I think that these issues can be addressed by the authors, and doing so will help to make it a convincing and interesting piece for the community working on intracellular pathogens. The authors should also carefully re-edit their manuscript to avoid overselling their data (see below for issues I see there). I would consider taking out the first figure and starting with Figure 3 (Figure 2 could be re-organized in the later parts)- that could help to make the flow of the manuscript better.

      Strengths:

      The high-content analysis including the innovative analytical workflows are very promising and could be used by a large number of scientists working on intracellular bacteria. The finding that Septins (through SEPT7) are differentially regulated through actively secreting bacteria is very exciting and can steer novel research directions.

      We thank the Reviewer for their constructive feedback and excitement for our results, including our findings on T3SS activity and Shigella-septin interactions. In accordance with the Reviewer’s comments, we avoid overselling our data in the revised version of the manuscript.

      Weaknesses:

      The manuscript makes a connection between two research lines (1: Shigella infection and DNA/protein synthesis, 2: regulation of septins around invading Shigella) that are not fully developed - this makes it sometimes difficult to understand the take-home messages of the authors.

      We agree that the manuscript is mostly technical and therefore some of our experimental observations would benefit from follow up mechanistic studies in the future. We highlight our vision for broader applicability in response to weaknesses raised by Reviewer 2.

      It is not clear whether the analysis that was done on projected images actually reflects the phenotypes of the original 3D data. This issue needs to be carefully addressed.

      We agree with the Reviewer that characterizing 3D data using 2D projected images has limitations.

      We observe an increase in cell and nuclear surface that does not strictly imply a change in volume. This is why we measure Hoechst intensity in the nucleus using SUM-projection (as it can be used as a proxy of DNA content of the cell). However, we agree that future use of other markers (such as fluorescently labelled histones) would make our conclusions more robust.
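
      The distinction between SUM and MAX projection mentioned above can be illustrated with a minimal sketch. It assumes a z-stack stored as a nested list indexed as `stack[z][y][x]`; the function names and the toy intensity values are hypothetical and are not taken from the authors' pipeline. The point is only that a sum projection accumulates signal through the full depth (which is why it can serve as a proxy for total DNA content), whereas a max projection does not.

```python
def sum_projection(stack):
    """Collapse a z-stack (stack[z][y][x]) by summing intensities along z."""
    return [[sum(column) for column in zip(*rows)] for rows in zip(*stack)]


def max_projection(stack):
    """Collapse a z-stack (stack[z][y][x]) by taking the maximum along z."""
    return [[max(column) for column in zip(*rows)] for rows in zip(*stack)]


def integrated_intensity(image):
    """Total intensity of a 2D image (sum over all pixels)."""
    return sum(sum(row) for row in image)


# Toy data (hypothetical): one pixel in x/y, three z-planes.
# Both "nuclei" have the same per-voxel intensity, but one spans more planes.
thin_nucleus = [[[5]], [[0]], [[0]]]
thick_nucleus = [[[5]], [[5]], [[5]]]

# The max projection cannot tell the two apart; the sum projection can.
thin_max = integrated_intensity(max_projection(thin_nucleus))
thick_max = integrated_intensity(max_projection(thick_nucleus))
thin_sum = integrated_intensity(sum_projection(thin_nucleus))
thick_sum = integrated_intensity(sum_projection(thick_nucleus))
```

      In this toy case the max-projected intensities are identical while the sum-projected intensity scales with the number of stained planes, which is the property the response relies on.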

      Regarding the different orientation of intracellular bacteria, we agree that investigation of septin recruitment is more challenging when bacteria are placed perpendicular to the acquisition plane. In a first step, we trained a Convolutional Neural Network (CNN) using 2D data, as it is easier/faster to train and requires fewer annotated images. In doing so, we already managed to correctly identify 80% of Shigella interacting with septins, which enabled us to observe higher T3SS activity in this population. In future studies, we will maximize the 3D potential of our data and retrain a CNN that will allow more precise identification of Shigella-septin interactions and in depth characterization of volumetric parameters.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) To conclude that cell volume is indeed increased, the investigators should consider staining the cells with markers that demarcate cell boundaries and/or are confined to the cytosol, i.e., a cell tracker dye.

      Staining with our SEPT7 antibody enables us to define cell boundaries for cellular area measurements (new Figure 1 - figure supplement 1A). However, we agree with the Reviewer that staining cells with additional markers (such as a cell tracker dye) would be required to conclude that cell volume is increased. We have therefore adjusted our claims in the main text (lines 107-115 and 235-246).

      (2) Line 27: I understand what is meant by "recruited to actively pathogenic bacteria with increased T3SS activation." However, one could argue that intracytosolic bacteria play many different roles in pathogenesis, not just actively secreting effectors.

      T3SS secretion by cytosolic bacteria is tightly regulated and both T3SS states (active, inactive) likely contribute to the pathogenic lifestyle of S. flexneri. In agreement with this, we removed this statement from the manuscript (lines 27, 225 and 274).

      (3) Line 88: Please clarify in the text that HeLa cells are being studied.

      We explicitly mention that the epithelial cell line we study is HeLa in the main text (line 93), in addition to the Materials and methods (line 328).

      (4) Line 97: is it possible to quantify the average distance of the nuclei from the cell perimeter? This would help provide some context as to what it means to be a certain distance from the nucleus, i.e., is there another way to point out that distance from nuclei correlates with movement inward post-invasion at the periphery?

      To provide more context for the inward movement of bacteria toward the cell centre, we provide calculations based on the measurements in Figure 1G, I. If we approximate the shape of both the cell and the nucleus as a circle, the median radius of a HeLa cell is 31.1 µm (uninfected cell) and 36.3 µm (infected cell). Similarly, the median radius of the nucleus is 22.2 µm (uninfected cell) and 24.57 µm (infected cell).

      However, we note that Figure 1F shows distance of bacteria to the centroid of the cell, which is the geometric centre of the cell, and which does not necessarily coincide with the geometric centre of the nucleus. We also note that nuclear area increases with infection (in a bacterial dose dependent manner). Finally, we note that these measurements are performed on max projections of 3D Z-stacks. In this case we cannot fully appreciate distance to the nucleus for bacteria located above it.
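
      The circle approximation described in this response relates a measured area A to an equivalent radius through r = sqrt(A / π). The following is a minimal sketch of that conversion; the function name and the example area are hypothetical and are not values from the manuscript.

```python
import math


def equivalent_radius(area):
    """Radius of the circle whose area equals `area` (same length unit, squared)."""
    if area < 0:
        raise ValueError("area must be non-negative")
    return math.sqrt(area / math.pi)


# Hypothetical example: a projected area in µm² gives a radius in µm.
example_area_um2 = 3000.0
example_radius_um = equivalent_radius(example_area_um2)
```

      Because the conversion works on projected areas, the resulting radius inherits the limitations of 2D projections noted above.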

      (5) Lines 212-213 - there is no Figure 9A, B - I think this should be Figure 7A, B.

      Text has been updated (lines 216-217).

      Reviewer #2 (Recommendations For The Authors):

      Testing the analysis pipeline as a proof-of-concept question such as the comparison of caging around the laboratory strain as compared to one or a few clinical isolates or mutants of interest would help stress the relevance of this new, remarkable tool.

      We thank the Reviewer for their enthusiasm.

      Future research in the Mostowy lab will capitalise on the high-content tools generated here to explore the frequency and heterogeneity of septin cage entrapment for a wide variety of S. flexneri mutants and Shigella clinical isolates.

      The sentence in line 215 ends with "in agreement with" followed by a reference.

      Text has been updated (line 219).

      The sentence in line 217 on the correlation between caging and T3SS is not very clear.

      Text has been clarified (lines 221-223).

      There is a typo in line 219 : "protrusSions"

      Text has been updated (line 223).

      Reviewer #3 (Recommendations For The Authors):

      Major points

      The quantitative analysis approach in Figure 1 has multiple issues. Some examples:

      (1) How was the cell area estimated? Normally, a marker for the whole cell (CellMask or similar) or cells expressing GFP would be good indicators. Here it is not clear to me what was done.

      The cell area was estimated using SEPT7 antibody staining which is enriched under the cell cortex. CellProfiler was used to segment cells based on SEPT7 staining, using a propagation method from the identified nucleus based on Otsu thresholding. To provide more clarity on how this was performed, we now include a new figure (Figure 1- figure supplement 1A) showing a representative image of HeLa cells stained with SEPT7 and the corresponding cell segmentation performed with CellProfiler software, together with an updated figure legend explaining the procedure (lines 784–787).
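
      For readers unfamiliar with Otsu thresholding, the idea is to choose the intensity cut-off that maximises the variance between the background and foreground classes. The sketch below is a plain-Python illustration of that principle only, assuming integer pixel intensities; it is not CellProfiler's implementation, it does not cover the propagation step used to grow cell outlines from nuclei, and the function name and toy pixel values are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximising between-class variance.

    `pixels` is a flat list of integer intensities in [0, levels).
    Pixels with value <= t form the background class, the rest the foreground.
    """
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1

    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of background pixels (values <= t)
    sum0 = 0  # summed intensity of background pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no background pixels yet
        if w1 == 0:
            break     # no foreground pixels left
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Toy data (hypothetical): dim background pixels and bright stained pixels.
toy_pixels = [10, 12, 11, 200, 198, 205]
toy_threshold = otsu_threshold(toy_pixels)
toy_mask = [value > toy_threshold for value in toy_pixels]
```

      On the toy data the threshold falls between the dim and bright groups, so the mask separates the three bright pixels from the three dim ones.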

      (2) The authors use Hoechst and integrated z-projections (Figure 1 S1) as a proxy to estimate nuclear volume. Hoechst staining depends on the organization of the DNA within the nucleus and I find that the authors need to do better controls to estimate nuclear size - this would be possible with cells expressing fluorescently labeled histones, or even better with a fluorescently tagged nuclear pore/envelope marker. The current quantification approach is misleading.

      We understand Reviewer #3’s concerns about using Hoechst staining as a proxy of nuclear volume, due to potential differences in DNA organisation within the nucleus.

      Following the recommendation of Reviewer #3 in the following point 3, text has been updated (lines 107–115 and 235-246).

      (3) Was cell density assessed for the measurements? If cells are confluent, bacteria could spread between cells within 3 hrs, if cells are less dense, this does not occur. When epithelial cells are infected for some hours, they have the tendency to round up a bit (and to appear thicker in z), but a bit smaller in xy. My suggestion to the authors (as they use these findings to follow up with experiments on the underlying processes) would be to tone down their statements - eg, Hoechst staining could be simply indicated as altered, but not put in a context of size (this would require substantial control experiments).

      Local cell density was not directly measured, but the experiment was set up to infect at roughly 80% confluency (cells were seeded at 10<sup>4</sup> cells/well 2 days prior to infection in a 96-well microplate, as described in the Materials and methods section) and to ensure bacterial spread between cells.

      In agreement with Reviewer #3 we tone down statements in the main text (see response to point 2 above).

      In addition, I found Figure 1 (and parts of Figure 2) disconnected from the rest of the manuscript, and it may even be an idea to take it out of the manuscript (that could also help to deal with my feedback relating to Figure 1). I would suggest starting the manuscript with the current Figure 3 and building the biological story with a stronger focus on SEPT7 (and its links with T3 secretion and actively pathogenic bacteria) from there on. As it stands, the two parts of the manuscript are not well connected.

      We carefully considered this comment, but following revisions we have not reorganised the manuscript. We believe that the high-content characterisation of S. flexneri infection in Figures 1 and 2 provides insightful information about changes in host cells in response to infection. Following this, we move on to characterising intracellular bacteria (and in particular those entrapped in septin cages) in the second part of the manuscript (Figures 3–7). Similar methods were used to analyse both host and bacterial cells, and the results obtained offer complementary views on host-pathogen interactions.

      My major reservation with the experimental work of the current version of the manuscript relates to Figure 5: The analysis of the septin phenotypes in Figure 5 seems to be problematic - to me, it appears that analysis and training were done on projected image stacks. As bacteria are rod-shaped their orientation in space has an enormous impact on how the septin signal appears in a projection - this can lead to wrong interpretation of the phenotypes. The authors need to do some quantitative controls analyzing their data in 3D. To be more clear: the example "tight" (second row) shows a bacterium that appears short. It may be that it's actually longer if one looks in 3D, and the septin signal could possibly fall in the category "rings" or even "two poles".

      The deep learning training and subsequent analysis of septin-cage entrapment is done on projected Z-stacks, which presents limitations. Future work in the Mostowy lab will exploit this first study and dive deeper into 3D aspects of the data.

      To address Reviewer #3’s concern, we include a sentence explaining that this analysis was performed using 2D max projections (lines 708 and 724), as well as acknowledging its limitations in the main text (lines 259-262).

      Minor points

      The scale bar in Fig 1 is very thin.

      We corrected the scale bar in Fig. 1 to make it more visible.

      Could it be that Figure 1F is swapped with Figure1E in the description?

      Descriptions for Figure 1E and F are correct.

      Line 27: what does "actively pathogenic bacteria" mean? I propose to change the term.

      We agree with Reviewer #3 that “actively pathogenic bacteria” should be removed from the text. This update is also in agreement with Reviewer #1 (see Reviewer #1 point 2).

      Line 28: "dynamics" can be confusing as it relates to dynamic events imaged by time-lapse.

      Although we are making a snapshot of the infection process at 3 hpi, we capture asynchronous processes in both host and bacterial cells (eg. host cells infected with different bacterial loads, bacterial cells undergoing actin polymerisation or septin cage entrapment). We agree that we are not following dynamics of full events over time. However, our high content approach enables us to capture different stages of dynamic processes. To avoid confusion, we replace “dynamics” by “diverse interactions” (line 28), and we discuss the importance of follow-up studies studying microscopy timelapses (line 274).

      Paragraph 59 following: the concept of heterogeneity was investigated in some detail for viral infection by the Pelkmans group (PMID: 19710653) using advanced image analysis tools. Advanced machine-learning-based analysis was then performed on Salmonella invasion by Voznica and colleagues (PMID: 29084895). It would be great to include these somewhat "old" works here as they really paved the way for high-content imaging, and the way analyses were performed then should be also discussed in light of how analyses can be performed now with the approaches developed by the authors.

      We agree. These landmark studies have now been included in the main text (lines 71-74).

      Line 181: I do not know what "morphological conformations" means, perhaps the authors can change the wording or clarify.

      We replaced the phrase “morphological conformations” with “morphological patterns” to improve clarity in the main text (line 185).

      The authors claim (eg in the abstract) that they are measuring the dynamic infection process. To me, it appears that they look at one time-point, so no dynamic information can be extracted. I suggest that the authors tone down their claims.

      Please note our response above (Minor points, Line 28) which also refers to this question.

    1. Author response:

      Reviewer #1 (Public review):

      A major concern is that the model is trained in the midst of the COVID-19 pandemic and its associated restrictions and validated on 2023 data. The situation before, during, and after COVID is fluid, and one may not be representative of the other. The situation in 2023 may also not have been normal and reflective of 2024 onward, both in terms of the amount of testing (and positives) and measures taken to prevent the spread of these types of infections. A further worry is that the retrospective prospective split occurred in October 2020, right in the first year of COVID, so it will be impossible to compare both cohorts to assess whether grouping them is sensible.

      We fully concur with the reviewer that the COVID-19 pandemic represents a profound confounding factor that fundamentally impacts the interpretation and generalizability of our model. This is a critical point that deserves a more thorough treatment. In the revised manuscript, we will add a dedicated subsection in the Discussion to explicitly analyze the pandemic’s impact. We will reframe our model’s contribution not as a universally generalizable tool for a hypothetical “normal” future, but as a robust framework demonstrated to capture complex epidemiological dynamics under the extreme, non-stationary conditions of a real-world public health crisis. We will argue that its strong performance on the 2023 validation data, a unique post-NPI “rebound” year, specifically showcases its utility in modeling volatile periods.

      The outcome of interest is the number of confirmed influenza cases. This is not only a function of weather, but also of the amount of testing. The amount of testing is also a function of historical patterns. This poses the real risk that the model confirms historical opinions through increased testing in those higher-risk periods. Of course, the models could also be run to see how meteorological factors affect testing and the percentage of positive tests. The results only deal with the number of positive (only the overall number of tests is noted briefly), which means there is no way to assess how reasonable and/or variable these other measures are. This is especially concerning as there was massive testing for respiratory viruses during COVID in many places, possibly including China.

      The reviewer raises a crucial point regarding surveillance bias, which is inherent in studies using reported case data. We acknowledge this limitation and will address it more transparently.

      (1) Clarification of Available Data: Our manuscript states that over the six-year period, a total of 20,488 ILI samples were tested, yielding 3,155 positive cases (line 471; Figure 1). We will make this denominator more prominent in the Methods section. However, the reviewer is correct that our models for Putian and the external validation for Sanming utilize the daily positive case counts as the outcome. The reality of our surveillance data source is that while we have the aggregate total of tests over six years, obtaining a reliable daily denominator of all respiratory virus tests conducted (not just for ILI patients as per the surveillance protocol) is not feasible. This is a common constraint in real-world public health surveillance systems.

      (2) Justification and Discussion: We will add a detailed paragraph to the Limitations section to address this. We will justify our use of case counts as it is the most direct metric for assessing public health burden and planning resource allocation (e.g., hospital beds, antivirals). We will also explain that modeling the positivity rate presents its own challenges, as the ILI denominator is also subject to biases (e.g., shifts in healthcare-seeking behavior, co-circulation of other pathogens causing similar symptoms). We will thus frame our work as forecasting the direct surveillance signal that public health officials monitor daily.
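
      The difference between the two candidate outcomes (positive case counts versus positivity rate) can be made concrete with a small sketch. The aggregate figures are those quoted in this response (3,155 positives out of 20,488 ILI samples); the helper function is hypothetical, and, as stated above, a reliable daily denominator is not available, so only the six-year aggregate rate is computed here.

```python
def positivity_rate(positives, tests):
    """Fraction of tested samples that were positive, or None if nothing was tested."""
    if tests <= 0:
        return None  # rate is undefined without a denominator
    return positives / tests


# Aggregate figures quoted in the response (six-year surveillance period).
total_positive = 3155
total_tested = 20488

overall_rate = positivity_rate(total_positive, total_tested)
print(f"Overall positivity: {overall_rate:.1%}")  # prints "Overall positivity: 15.4%"
```

      A day with no tests returns `None` rather than zero, which mirrors the point that the rate cannot be modelled where the denominator is missing.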

      Although the authors note a correlation between influenza and the weather factors, they do not discuss some of the high correlations between weather factors (e.g., solar radiation and UV index). Because of the many weather factors, those plots are hard to parse.

      This is an excellent point. Our preliminary analysis (Supplementary Figure S2) indeed confirms a strong positive correlation between solar radiation and the UV index. Perhaps the reviewer overlooked the contents of the supplementary information document. We have included the figure for their review. Our original discussion did explicitly address this multicollinearity, summarized as follows: We acknowledge the high correlation between certain meteorological variables. We then explain that our two-stage modeling approach is designed to mitigate this issue. In the first stage, the DLNM models assess the impact of each variable individually, thus isolating their non-linear and lagged effects without being confounded by interactions. In the second stage, the LSTM network, by its nature, is a powerful non-linear function approximator that is robust to multicollinearity and can learn the complex, interactive relationships between all input features, including correlated ones.

      Figure S2. Scatterplot matrix illustrating correlations between Influenza cases and meteorological factors. This comprehensive scatterplot matrix visualizes the relationships between influenza-like illness (ILI) cases, influenza A and B cases, and multiple meteorological variables, including average temperature, humidity, precipitation, wind speed, wind direction, solar radiation, and ultraviolet (UV) index. The figure is composed of three distinct sections that collectively provide an in-depth analysis of these relationships:

      (1) Upper-right triangle: This section presents a Pearson correlation coefficient matrix, with color intensity reflecting the strength of correlations between the variables. Red cells represent positive correlations, while green cells represent negative correlations. The closer the coefficient is to 1 or -1, the darker the cell and the stronger the correlation, with statistically significant correlations marked by asterisks. This matrix allows for a rapid identification of notable relationships between influenza cases and meteorological factors.

      (2) Lower-left triangle: This section contains scatterplots of pairwise comparisons between variables. These scatterplots facilitate the visual identification of potential linear or non-linear relationships, as well as any outliers or anomalies. This visualization is essential for evaluating the nature of interactions between meteorological factors and influenza cases.

      (3) Diagonal: The diagonal displays the density distribution curves for each individual variable. These curves provide an overview of the distribution characteristics of each variable, revealing central tendencies, variance, and any skewness present in the data.
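      As a minimal sketch of the coefficient shown in the upper-right triangle of this figure, the Pearson correlation between two series could be computed as follows. The function name `pearson_r` and the sample values are hypothetical illustrations and are not taken from the study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, chosen only to illustrate two strongly correlated inputs
solar_radiation = [10.0, 12.0, 15.0, 18.0, 20.0]
uv_index = [3.0, 4.0, 5.0, 6.0, 7.0]
r = pearson_r(solar_radiation, uv_index)
```

      A coefficient close to 1 for a pair of predictors, as in this toy example, is the kind of multicollinearity the response above refers to.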

      The authors do not actually compare the results of both methods and what the LSTM adds.

      We thank the reviewer for this comment and realize we may not have signposted the comparison clearly enough. Our manuscript does present a direct comparison between the LSTM and ARIMA models in the Results section (lines 737-745) and Table 2, where performance metrics (MAE, RMSE, MAPE, SMAPE) for both models on the 2023 validation set are detailed, showing LSTM’s superior performance, particularly for Influenza A. Furthermore, Figure 6 (panels A and B) visualizes the LSTM’s predictions against observed values, and Supplementary Figure S3 does the same for the ARIMA model, allowing for a visual comparison of their fit.

      To address the reviewer’s concern, in the revised manuscript, we will:

      (1) Add a more explicit comparative statement in the Results section, directly contrasting the key metrics and highlighting the LSTM’s advantages in capturing peak activities.

      (2) Consider combining the visualizations from Figure 6 and Supplementary Figure S3 into a single, more powerful comparative figure that shows the observed data, the LSTM predictions, and the ARIMA predictions on the same plot.
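      For readers less familiar with the four error metrics named in this response (MAE, RMSE, MAPE, SMAPE), a minimal sketch follows. SMAPE has several variants in the literature; the version below assumes the form with (|observed| + |predicted|) / 2 in the denominator, which may differ from the one used in the manuscript. The function name and the numbers are hypothetical.

```python
import math


def forecast_errors(observed, predicted):
    """Return MAE, RMSE, MAPE (%) and SMAPE (%) for paired observations."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    abs_err = [abs(o - p) for o, p in zip(observed, predicted)]
    mae = sum(abs_err) / n
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    # MAPE is undefined where the observed value is zero; those points are skipped
    mape_terms = [e / abs(o) for e, o in zip(abs_err, observed) if o != 0]
    mape = 100 * sum(mape_terms) / len(mape_terms) if mape_terms else float("nan")
    # SMAPE variant assumed here: mean of 2|o - p| / (|o| + |p|)
    smape_terms = [
        2 * e / (abs(o) + abs(p))
        for e, o, p in zip(abs_err, observed, predicted)
        if abs(o) + abs(p) != 0
    ]
    smape = 100 * sum(smape_terms) / len(smape_terms) if smape_terms else float("nan")
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "SMAPE": smape}


# Hypothetical weekly case counts and forecasts
metrics = forecast_errors([100, 200], [110, 180])
```

      Applying the same function to the LSTM and ARIMA forecasts on the same validation weeks would give the side-by-side comparison the reviewer asks for.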

      Meandering methods; reliability of “Our World in Data”; Figure 2A is hard to parse.

      We will address these points comprehensively.

      (3) Methods: We will significantly streamline and restructure the Methods section. We also wish to provide context that the manuscript’s current structure reflects an effort to incorporate feedback from multiple rounds of peer review across different journals, which may have led to some repetition. We will perform a thorough edit to improve its conciseness and logical flow.

      (4) Data Reliability: The reviewer raises a crucial and highly insightful question regarding the validity of using a national-level index to represent local public health interventions. This is a critical aspect of our model’s construction, and we are grateful for the opportunity to provide a more thorough justification.

      We acknowledge that the ideal variable would be a daily, quantitative, city-level index of non-pharmaceutical interventions (NPIs). However, the practical reality of the data landscape in China is that such granular, publicly accessible databases for subnational regions do not exist. Given this constraint, our choice of the Our World in Data (OWID) national stringency index was the result of a careful consideration process, and we believe it serves as the best available proxy for our study context.

      In the revised manuscript, we will significantly expand the Methods section to articulate our rationale, which is threefold:

      National Policy Coherence: During the COVID-19 pandemic in mainland China, core NPIs, particularly mandatory face-covering policies in shared public spaces, were implemented with a high degree of national uniformity. While local governments had some autonomy, they operated within a centrally defined framework, ensuring a baseline level of policy consistency across the country.

      Local Context Alignment: A key factor supporting the use of this national proxy is the specific epidemiological context of Putian during the study period. For the vast majority of the pandemic, Putian was classified as a low-risk area with only sporadic COVID-19 cases. Consequently, the city’s public health measures consistently aligned with the standard national guidelines. It did not experience prolonged or exceptionally strict local lockdowns that would cause a significant deviation from the national-level policy trends captured by the OWID index.

      Validation by Local Public Health Experts: Most critically, and to directly address your suggestion, our co-authors from the Putian Center for Disease Control and Prevention have meticulously reviewed the OWID stringency index against their on-the-ground, institutional knowledge of the mandates that were in effect. They have confirmed that the categorical levels (0-4) and the temporal trends of the OWID index provide a faithful representation of the public health restrictions concerning face coverings as experienced by the population of Putian.

      Therefore, we will revise our manuscript to make it clear that the use of the OWID index was not a choice of convenience, but a necessary and well-vetted decision. Given the unavailability of official local data, the OWID index, cross-validated by our local experts, represents the most rigorous and appropriate variable available to account for the profound impact of NPIs on influenza transmission in our model.

      (5) Figure 2A: We agree completely and will replace the heatmap with a multi-line plot or a stacked area chart to better visualize the temporal dynamics of influenza subtypes.

      We have preliminarily completed the redrawing of Figure 3A. The new and old versions are presented for your review to determine which figure is more suitable for this manuscript in terms of scientific accuracy and visual impact.

      Reviewer #2 (Public review):

      Weakness (1):

      The rationale of the study is not clearly stated.

      We appreciate the reviewer’s critique and acknowledge that the unique contribution of our study needs to be articulated more forcefully. Our introduction (lines 105-140) attempted to outline the limitations of existing studies, but we will revise it to be much sharper. The revised introduction will state unequivocally that our study’s rationale is to address a confluence of specific, unresolved gaps in the literature: 1) The persistent challenge of forecasting influenza in subtropical regions with their erratic seasonality; 2) The lack of studies that build subtype-specific models for Influenza A and B, which we show have distinct meteorological drivers; 3) The methodological gap in integrating the explanatory power of DLNM with the predictive power of a rigorously Bayesian-optimized LSTM network; and 4) The unique opportunity to develop and test a model on data that encompasses the unprecedented disruption of the COVID-19 pandemic, a critical test of model robustness.

      Weakness (2):

      Several issues with methodological and data integration should be clarified.

      We interpret this as a general statement, with the specific issues detailed in the reviewer’s subsequent points and the “Recommendations for the authors” section. We will meticulously address each of these specific points in our revision. For instance, as a demonstration of our commitment to clarification, we will provide a much more detailed justification for our choice of benchmark model (ARIMA), as detailed in our response to Recommendation #11.

      Reviewer #2 (Recommendations for the authors):

      The authors should justify why the LSTM model was compared only with ARIMA as the baseline. How sensitive are the outcomes to other commonly used machine learning methods, such as Random Forest or XGBoost, as benchmarks for performance?

      The reviewer raises a highly pertinent question regarding the selection of our benchmark model. A robust comparison is indeed essential for contextualizing the performance of our proposed LSTM network. Our choice to benchmark against the ARIMA model was a deliberate and principled decision, grounded in the specific literature of influenza forecasting at the intersection of climatology and epidemiology.

      In the revised manuscript, we will expand our justification within the Methods section and reinforce it in the Discussion. Our rationale is as follows:

      (1) ARIMA as the Established Standard: As we briefly noted in our original introduction (lines 110-113), the ARIMA model is arguably the most widely established and frequently cited statistical method for time-series forecasting of influenza incidence, including studies investigating meteorological drivers. It serves as the conventional benchmark against which novel methods in this specific domain are often evaluated. Therefore, demonstrating superiority over ARIMA is the most direct and scientifically relevant way to validate the incremental value of our deep learning approach.

      (2) A Focused Scientific Hypothesis: Our primary hypothesis was that the LSTM network, with its inherent ability to capture complex non-linearities and long-term dependencies, could overcome the documented limitations of linear autoregressive models like ARIMA in the context of climate-influenza dynamics. Our study was designed specifically to test this hypothesis.

      (3) Avoiding a “Bake-off” without a Clear Rationale: While other machine learning models like Random Forest or XGBoost are powerful, they are not established as the standard baseline in this particular niche of literature. Including them would shift the focus from a targeted comparison against the conventional standard to a broader, less focused “bake-off” of various algorithms. Such an exercise, while potentially interesting, would risk diluting the core message of our paper and would be undertaken without a clear, literature-driven hypothesis for why one of these specific tree-based models should be the next logical benchmark.

      Therefore, we will argue in the revised manuscript that our focused comparison with ARIMA provides the clearest and most meaningful assessment of our model’s contribution to the existing body of work on climate-informed influenza forecasting. We will, however, explicitly acknowledge in the Discussion that future work could indeed benefit from a broader comparative analysis as the field continues to evolve and adopt a wider array of machine learning techniques.

      Similarly, for some of the reviewer’s recommendations that do not require significant time and effort to implement, such as recommendation 7, we have also redrawn Figure 3 based on your feedback. It is provided for your review.

      Figure 3 presents the time series of the cases. I wonder whether the data for these factors and outcomes are daily or aggregated by week/month? I suggest representing it in 9x1 format with a single x-axis to compare, instead of 3x3 format. The authors can refer to a similar plot in Figure 1 of https://doi.org/10.1371/journal.pcbi.1012311.

      We are deeply grateful for the reviewer’s valuable suggestion and thoughtful provision of reference illustrations. Based on their input, we have redrawn Figure 3 and have included it for their review.

      Weakness (3):

      Validation of the models is not presented clearly.

      We were concerned by this comment and conducted a thorough self-assessment of our manuscript. We believe we have performed a multi-faceted validation, but we have evidently failed to present it with sufficient clarity and structure. Our validation strategy, detailed across the Methods and Results sections, includes:

      • Internal Out-of-Time Validation: Using 2023 data as a hold-out set to test the model trained on 2018-2022 data (lines 695-696, 705-710; Figure 6A, B).

      • External Validation: Testing the trained model on an independent dataset from a different city, Sanming (lines 730-736; Figure 6I, J).

      • Benchmark Model Comparison: Quantitatively comparing the LSTM’s performance against the standard ARIMA model using multiple error metrics (lines 737-745; Table 2).

      • Interpretability Validation (Sanity Check): Using SHAP analysis to ensure the model’s predictions are driven by epidemiologically plausible factors (lines 746-755; Figure 6E-H).
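      The out-of-time design in the first bullet can be sketched as a split by calendar year, with every record before the hold-out year used for training and the hold-out year reserved for testing. This is only an illustration under assumed names (`out_of_time_split`, the `(date, count)` record layout) and made-up values; it is not the authors' pipeline.

```python
from datetime import date


def out_of_time_split(records, holdout_year):
    """Split (date, value) records into training rows before the hold-out
    year and test rows that fall within the hold-out year."""
    train = [rec for rec in records if rec[0].year < holdout_year]
    test = [rec for rec in records if rec[0].year == holdout_year]
    return train, test


# Hypothetical weekly case counts spanning the training and hold-out periods
records = [
    (date(2018, 1, 1), 5),
    (date(2020, 6, 15), 2),
    (date(2022, 12, 26), 7),
    (date(2023, 1, 2), 9),
    (date(2023, 7, 3), 4),
]
train, test = out_of_time_split(records, holdout_year=2023)
```

      Keeping the split strictly chronological avoids leaking future observations into the training set, which is the point of an out-of-time check.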

      To address the reviewer’s valid critique of our presentation, we will significantly restructure the relevant parts of the Results section. We will create explicit subheadings such as “Internal Validation,” “External Validation,” and “Comparative Performance against ARIMA Benchmark” to make our comprehensive validation process unambiguous and easy to follow.

      Weakness (4):

      The claim for providing tools for 'early warning' was not validated by analysis and results.

      We agree with this assessment entirely. This aligns with the eLife Assessment and comments from Reviewer #1. Our primary revision will be to systematically recalibrate the manuscript's language. We will replace all instances of “early warning tool” with more accurate and modest phrasing, such as “high-performance forecasting framework” or “a foundational model for future warning systems.” We will ensure that our revised title, abstract, and conclusions precisely reflect what our study has delivered: a robust predictive model, not a field-ready public health intervention tool.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Nahas et al. investigated the roles of herpes simplex virus 1 (HSV-1) structural proteins using correlative cryo-light microscopy and soft X-ray tomography. The authors generated nine viral variants with deletions or mutations in genes encoding structural proteins. They employed a chemical fixation-free approach to study native-like events during viral assembly, enabling observation of a wider field of view compared to cryo-ET. The study effectively combined virology, cell biology, and structural biology to investigate the roles of viral proteins in virus assembly and budding.

      Strengths:

      (1) The study presented a novel approach to studying viral assembly in cellulo.

      (2) The authors generated nine mutant viruses to investigate the roles of essential proteins in nuclear egress and cytoplasmic envelopment.

      (3) The use of correlative imaging with cryoSIM and cryoSXT allowed for the study of viral assembly in a near-native state and in 3D.

      (4) The study identified the roles of VP16, pUL16, pUL21, pUL34, and pUS3 in nuclear egress.

      (5) The authors demonstrated that deletion of VP16, pUL11, gE, pUL51, or gK inhibits cytoplasmic envelopment.

      (6) The manuscript is well-written, clearly describing findings, methods, and experimental design.

      (7) The figures and data presentation are of good quality.

      (8) The study effectively correlated light microscopy and X-ray tomography to follow virus assembly, providing a valuable approach for studying other viruses and cellular events.

      (9) The research is a valuable starting point for investigating viral assembly using more sophisticated methods like cryo-ET with FIB-milling.

      (10) The study proposes a detailed assembly mechanism and tracks the contributions of studied proteins to the assembly process.

      (11) The study includes all necessary controls and tests for the influence of fluorescent proteins.

      Weaknesses:

      Overall, the manuscript does not have any major weaknesses, just a few minor comments:

      (1) The gel quality in Figure 1 is inconsistent for different samples, with some bands not well resolved (e.g., for pUL11, GAPDH, or pUL20).

      We thank the reviewer for their suggestion. We tried to resolve the bands several times, but unfortunately this was the best outcome we could achieve.

      (2) The manuscript would benefit from a summary figure or table to concisely present the findings for each protein. It is a large body of work, and a summary figure showing the discovered functions would be great.

      We thank the reviewer for their suggestion. We have created a summary table (Table 2).

      (3) Figure 2 lacks clarity on the type of error bars used (range, standard error, or standard deviation). The legend says range; I am just checking that this is what the authors meant.

      We thank the reviewer for double-checking, but it is meant to be range, as reported in the legend. We used range because there are only two data points for each time point, which are insufficient to calculate standard deviation or standard error.

      (4) The manuscript could be improved by including details on how the plasma membrane boundary was estimated from the saturated gM-mCherry signal. An additional supplementary figure with the data showing the saturation used for the boundary definition would be helpful.

      We appreciate the suggestion and have included an example of how saturated gM-mCherry signal was used to delineate the cytoplasm in Supp. Fig. 4A.

      (5) Additional information or supplementary figures on the mask used to filter the YFP signal for Figure 4 would be helpful.

      Thanks, we have adapted the text in the results section to clarify: “eYFP-VP26 signal was manually inspected to determine threshold values that filtered out background and included pixels containing individual or clustered puncta that represent capsids.”
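      To make the thresholding step concrete, a minimal sketch of applying a manually chosen cutoff to an intensity image is shown below. The function names, the nested-list image representation, and the numbers are hypothetical; the actual threshold values were chosen by manual inspection, as described above.

```python
def threshold_mask(image, threshold):
    """Return a boolean mask that is True where a pixel is at or above the
    chosen threshold (signal) and False elsewhere (background)."""
    return [[pixel >= threshold for pixel in row] for row in image]


def masked_sum(image, mask):
    """Sum the intensities of the pixels kept by the mask."""
    return sum(
        pixel
        for row, mask_row in zip(image, mask)
        for pixel, keep in zip(row, mask_row)
        if keep
    )


# Hypothetical 2x2 intensity image and a manually chosen cutoff
image = [[1, 5], [10, 2]]
mask = threshold_mask(image, threshold=5)
signal = masked_sum(image, mask)
```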

      (6) The figure legends could include information about which samples are used for comparison for significance calculations. As the colour of the brackets is different from the compared values (dUL34), it would be great to have this information in the figure legend.

      Thanks, we have adapted Fig. 4B to make the colour of the brackets match the colour used for the ΔUL34 mutant, and we have included labels next to the brackets for clarity. We have applied similar adjustments to Fig. 5D & E and Supp. Fig. 4C.

      (7) In Figure 5B, the association between YFP and mCherry signals is difficult to assess due to the abundance of mCherry signal; single-channel and combined images might improve visualization.

      Thanks, we have provided split and combined channel views in Supp. Fig. 4B to improve visualization.

      (8) In Figure 6D, staining for tubulin could help identify the cytoskeleton structures involved in the observed virus arrays.

      We thank the reviewer for their suggestion, which we think would be interesting future work to build on the current study. Given the competitive nature of access to cryoSIM and cryoSXT for CLXT, including staining for tubulin was outside the scope of the additional experiments we were able to conduct at this time.

      (9) It is unclear in Figure 6D if the microtubule-associated capsids are with the gM envelope or not, as the signal from mCherry is quite weak. It could be made clearer with the split signals to assess the presence of both viral components.

      We have provided split channels to the figure to aid with visualization.

      (10) The representation of voxel intensity in Figure 8 is somewhat confusing. Reversing the voxel intensity representation to align brighter values with higher absorption would simplify interpretation.

      We thank the reviewer for this suggestion. In contrast to fluorescence microscopy where high intensities reflect signal, low intensities represent signal (absorbance of X-rays) in cryoSXT. We respectfully decided not to reverse the values, as we believe that could cause more confusion. We have instead added a black-to-white gradient bar to illustrate that low voxel intensities correspond to dark signal in Fig 8.

      (11) The visualization in panel I of Figure 8 might benefit from a more divergent colormap to better show the variation in X-ray absorbance.

      We thank the reviewer for their suggestion. We experimented with a few different colour schemes but concluded that the current one produced the clearest results and was most accessible for color-blind viewers.

      (12) Figure 9 would be enhanced by images showing the different virus sizes measured for the comparative study, which would help assess the size differences between different assembly stages.

      We thank the reviewer for their suggestion and have included images to accompany the graph.

      Overall, this is an excellent manuscript and an enjoyable read. It would be interesting to see this approach applied to the study of other viruses, providing valuable insights before progressing to high-resolution methods.

      Reviewer #2 (Public review):

      Summary:

      For centuries, humans have been developing methods to see ever smaller objects, such as cells and their contents. This has included studies of viruses and their interactions with host cells during processes extending from virion structure to the complex interactions between viruses and their host cells: virion entry, virus replication and virion assembly, and release of newly constructed virions. Recent developments have enabled simultaneous application of fluorescence-based detection and intracellular localization of molecules of interest in the context of sub-micron resolution imaging of cellular structures by electron microscopy.

      The submission by Nahas et al. extends the state-of-the-art for visualization of important aspects of herpesvirus (HSV-1 in this instance) virion morphogenesis, a complex process that involves virus genome replication, and capsid assembly and filling in the nucleus, transport of the nascent nucleocapsid and some associated tegument proteins through the inner and outer nuclear membranes to the cytoplasm, orderly association of several thousand mostly viral proteins with the capsid to form the virion's tegument, envelopment of the tegumented capsid at a virus-tweaked secretory vesicle or at the plasma membrane, and release of mature virions at the plasma membrane.

      In this groundbreaking study, cells infected with HSV-1 mutants that express fluorescently tagged versions of capsid (eYFP-VP26) and tegument (gM-mCherry) proteins were visualized with 3D correlative structured illumination microscopy and X-ray tomography. The maturation and egress pathways thus illuminated were studied further in infections with fluorescently tagged viruses lacking one of nine viral proteins.

      Strengths:

      This outstanding paper meets the journal's definitions of Landmark, Fundamental, Important, Valuable, and Useful. The work is also Exceptional, Compelling, Convincing, and Solid. The work is a tour de force of classical and state-of-the-art molecular and cellular virology. Beautiful images accompanied by appropriate statistical analyses and excellent figures. The numerous complex issues addressed are explained in a clear and coordinated manner; the sum of what was learned is greater than the sum of the parts. Impacts go well beyond cytomegalovirus and the rest of the herpesviruses, to other viruses and cell biology in general.

      Reviewer #3 (Public review):

      Summary:

      Kamal L. Nahas et al. demonstrated that pUL16, pUL21, pUL34, VP16, and pUS3 are involved in the egress of the capsids from the nucleus, since mutant viruses ΔpUL16, ΔpUL21, ΔUL34, ΔVP16, and ΔUS3 HSV-1 show nuclear egress attenuation determined by measuring the nuclear:cytoplasmic ratio of the capsids, the dfParental, or the mutants. Then, they showed that gM-mCherry+ endomembrane association and capsid clustering were different in pUL11, pUL51, gE, gK, and VP16 mutants. Furthermore, the 3D view of cytoplasmic budding events suggests an envelopment mechanism where capsid budding into spherical/ellipsoidal vesicles drives the envelopment.

      Strengths:

      The authors employed both structured illumination microscopy and cellular ultrastructure analysis to examine the same infected cells, using cryo-soft-X-ray tomography to capture images. This combination, set here for the first time, enabled the authors to obtain holistic data regarding a biological process, such as viral assembly. Using this approach, the researchers studied various stages of HSV-1 assembly. For this, they constructed a dual-fluorescently labelled recombinant virus, consisting of eYFP-tagged capsids and mCherry-tagged envelopes, allowing for the independent identification of both unenveloped and enveloped particles. They then constructed nine mutants, each targeting a single viral protein known to be involved in nuclear egress and envelopment in the cytoplasm, using this dual-fluorescent as the parental one. The experimental setting, both the microscopic and the virological, is robust and well-controlled. The manuscript is well-written, and the data generated is robust and consistent with previous observations made in the field.

      Weaknesses:

      It would be helpful to find out what role the targeted proteins play in nuclear egress or envelopment acquisition in a different orthoherpesvirus, like HSV-2. This would confirm the suitability of the technical approach set and would also act as a way to validate their mechanism at least in one additional herpesvirus beyond HSV-1. So, using the current manuscript as a starting point and for future studies, it would be advisable to focus on the protein functions of other viruses and compare them.

      We appreciate the suggestion and agree that this would be a great starting point for future studies. At present, we do not have a panel of mutant viruses in HSV-2 or another orthoherpesvirus, and it would be significant work to generate them, so we consider this outside the scope of the current study.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) There are enough uncommon abbreviations in the text to justify the inclusion of an abbreviation list.

      We thank the reviewer for the suggestion, but we define all uncommon abbreviations at first mention and an abbreviations list is not part of eLife’s house style.

      (2) The complex paragraph on p. 7 would be much easier to digest if broken into smaller chunks. Consider similar treatment for other lengthy landmark-free blocks of text, e.g., the one that begins on p. 14. Subheadings would help.

      We thank the reviewer for this suggestion. We have divided large paragraphs into more easily digestible chunks throughout the manuscript, for example in the discussion where the previous monolithic 3rd paragraph has been divided into five shorter, focussed paragraphs.

      (3) Table 1 needs units.

      We thank the reviewer for noticing our omission and apologise for the oversight - the table has been updated accordingly.

      Reviewer #3 (Recommendations for the authors):

      (1) Toward the end of the manuscript, I missed some lines attempting to speculate on the origin/nature of the spherical/ellipsoidal vesicles providing the envelopment. Would it be possible to incorporate this in the Discussion section?

      Thank you for noticing that omission. We have now included a few lines speculating that they may represent recycling endosomes, trans-Golgi network vesicles, or a hybrid compartment.

      (2) I congratulate the authors. The work is robust, and I personally highlight the way they managed to include others' results merged with their own, providing a complete view of the story.

      We thank the reviewer for their kind words.

      Note to editors

      In addition to these responses to the reviewer’s comments, we have also now included in the methods section details of the Tracking of Indels by Decomposition (TIDE) analysis we performed (data in Supplementary Figure 3) that was omitted by mistake from the original submission.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Recommendations for the authors): 

      The authors addressed all suggestions satisfactorily. 

      Reviewer #2 (Recommendations for the authors):

      The authors have adequately dealt with the comments. 

      Reviewer #3 (Recommendations for the authors):

      (1) Line 157. Although the authors have added a statement acknowledging that addition of YE increased hyphal width and secretion in A. nidulans without increasing nuclear number, they have not indicated how this result might impact their model. It might just boil down to variation between the different Aspergilli, but it merits attention. 

      (2) Line 341. To extend the argument, you might consider adding this citation (https://elifesciences.org/articles/76075), which provides evidence that nuclear size might scale with osmotic pressure based on the density of macromolecules in the nucleus vs. cytoplasm.

      Thanks for the suggestion.

      L341 This is likely related to the phenomenon in which a decrease in cell size is accompanied by a reduction in nuclear size (66).

      (3) Line 343. Neurospora crassa hyphal cells can exceed 100 nuclei...

      Changed.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      This computational modeling study builds on multiple previous lines of experimental and theoretical research to investigate how a single neuron can solve a nonlinear pattern classification task. The authors construct a detailed biophysical and morphological model of a single striatal medium spiny neuron, and endow excitatory and inhibitory synapses with dynamic synaptic plasticity mechanisms that are sensitive to (1) the presence or absence of a dopamine reward signal, and (2) spatiotemporal coincidence of synaptic activity in single dendritic branches. The latter coincidence is detected by voltage-dependent NMDA-type glutamate receptors, which can generate a type of dendritic spike referred to as a "plateau potential." In the absence of inhibitory plasticity, the proposed mechanisms result in good performance on a nonlinear classification task when specific input features are segregated and clustered onto individual branches, but reduced performance when input features are randomly distributed across branches. Interestingly, adding inhibitory plasticity improves classification performance even when input features are randomly distributed.

      Strengths:

      The integrative aspect of this study is its major strength. It is challenging to relate low-level details such as electrical spine compartmentalization, extrasynaptic neurotransmitter concentrations, dendritic nonlinearities, spatial clustering of correlated inputs, and plasticity of excitatory and inhibitory synapses to high-level computations such as nonlinear feature classification. Due to high simulation costs, it is rare to see highly biophysical and morphological models used for learning studies that require repeated stimulus presentations over the course of a training procedure. The study aspires to prove the principle that experimentally-supported biological mechanisms can explain complex learning.

      Weaknesses:

      The high level of complexity of each component of the model makes it difficult to gain an intuition for which aspects of the model are essential for its performance, or responsible for its poor performance under certain conditions. Stripping down some of the biophysical detail and comparing it to a simpler model may help better understand each component in isolation.

      We greatly appreciate your recognition of the study’s integrative scope and the challenges of linking detailed biophysics to high-level computation. We acknowledge that the model’s complexity can obscure the contribution of individual components. However, as stated in the introduction, the principles have already been shown in simplified theoretical models, for instance in Tran-Van-Minh et al. 2015. Our aim here was to extend those ideas into a more biologically detailed setting to test whether the same principles still hold under realistic constraints. While simplification can aid intuition, we believe that demonstrating these effects in a biophysically grounded model strengthens the overall conclusion. We agree that further comparisons with reduced models would be valuable for isolating the contribution of specific components and plan to explore that in future work.

      Reviewer #2 (Public review):

      Summary:

      The study explores how single striatal projection neurons (SPNs) utilize dendritic nonlinearities to solve complex integration tasks. It introduces a calcium-based synaptic learning rule that incorporates local calcium dynamics and dopaminergic signals, along with metaplasticity to ensure stability for synaptic weights. Results show SPNs can solve the nonlinear feature binding problem and enhance computational efficiency through inhibitory plasticity in dendrites, emphasizing the significant computational potential of individual neurons. In summary, the study provides a more biologically plausible solution to single-neuron learning and gives further mechanical insights into complex computations at the single-neuron level.

      Strengths:

      The paper introduces a novel learning rule for training a single multicompartmental neuron model to perform nonlinear feature binding tasks (NFBP), highlighting two main strengths: the learning rule is local, calcium-based, and requires only sparse reward signals, making it highly biologically plausible, and it applies to detailed neuron models that effectively preserve dendritic nonlinearities, contrasting with many previous studies that use simplified models.

      Thank you for highlighting the biological plausibility of our calcium- and dopamine-dependent learning rule and its ability to exploit dendritic nonlinearities. Your positive assessment reinforces our commitment to refining the rule and exploring its implications in larger, more diverse settings.

      Reviewer #1 (Recommendations for the authors):

      Major recommendations:

      P9: When introducing the excitatory learning rule, the reader is referred to the Methods. I suggest moving Figure 7A-D, "Excitatory plasticity" to be more prominently presented in the main body of the paper where the reader needs to understand it. There are errors in the current Figure 7, and wrong/confusing acronyms. The abbreviations "LTP-K" and "MP-K" are not intuitive. In A, I would spell out "LTP kernel" and "Theta_LTP adaptation".  In B, I would spell out "LTD kernel" and "Theta_LTD adaptation".

      We have clarified the terminology in Figure 7 by replacing “LTP-K” with “LTP kernel” and “MP-K” with “metaplasticity kernel”.  While we kept Figure 7 in the Methods section to maintain the flow of the main text, we agree that an earlier introduction of the learning rule improves clarity. To that end, we added a simplified schematic to Figure 3 in the Results section, which provides readers with an accessible overview of the excitatory plasticity mechanism at the point where it is first introduced.

      In C, for simplicity and clarity, I would only show the initial and updated LTP kernel and Calcium and remove the Theta_LTP adaptation curve, it's too busy and not necessary. Similarly in D, I would show only the initial and updated LTD kernel and Calcium and remove the Theta_LTD adaptation curve. In the current version of the Figure, panel B, right incorrectly labels "Theta_LTD" as "Theta_LTP". Panel D incorrectly labels "LTD kernel" as "LTP/MP-K" in the subheading and "MP/LTP-K" in the graph.

      To avoid confusion and better illustrate the interactions between calcium signals, kernels, and thresholds, we have added a movie showing how these components evolve during learning. The figure panels remain as originally designed, since the LTP kernel governs both potentiation and depression through metaplastic threshold adaptation, while the LTD kernel remains fixed.

      P17: Again, instead of pointing the reader to the Methods, I would move Figure 7E, "Inhibitory plasticity" to the main body of the paper where the reader needs to understand it. For clarity, I would label "C_TL" as "Theta_Inh,low" and "C_TH" as "Theta_Inh,high". The right panel could be better labeled "Inhibitory plasticity kernel". The left panel could be better labeled "Theta_Inh adaptation", again replacing the acronyms "C_TL" and "C_TH". The same applies to Fig. 5D on P19.

      We have updated the labeling in Figures 5D and 7E for clarity, including replacing "C_TL" and "C_TH" with "Theta_Inh,low" and "Theta_Inh,high". In addition, we added a simplified schematic of the inhibitory plasticity rule to Figure 5 to assist the reader’s understanding when presenting the results. Figure 7E remains in the Methods section to preserve the flow of the main text.

      P12: I would suggest simplifying Fig. 3 panels and acronyms as well. Remove "MP-K" from C and D. Relabel "LTP-K" as "LTP kernel". The same applies to Fig. 5E on P19 and Fig. 3 - supplement 1 on P46 and Fig 6 - supplement 1 on P49.

      We have simplified the labeling across all relevant figures by replacing “MP-K” with “metaplasticity kernel” and “LTP-K” with “LTP kernel.” To maintain clarity, we retained these terms in only one panel as a reference.

      Minor recommendations:

      P4: "Although not discussed much in more theoretical work, our study demonstrates the necessity of metaplasticity for achieving stable and physiologically realistic synaptic weights." This sentence is jarring. BCM and metaplasticity have been discussed in hundreds of theory papers! Cite some. This sentence would more accurately read, "Our study corroborates prior theory work (citations) demonstrating that metaplasticity helps to achieve stable and physiologically realistic synaptic weights."

      We have followed the reviewer’s suggestion and updated the sentence to: “Previous theoretical studies (Bienenstock et al., 1982; Fusi et al., 2005; Clopath et al., 2010; Benna & Fusi, 2016; Zenke & Gerstner, 2017) demonstrate the essential role of metaplasticity in maintaining stability in synaptic weight distributions.” (page 2, lines 49-51; page 3, line 1)

      P9: Grammar. "The neuron model was during training activated..." should read "During training, the neuron model was activated..."

      Corrected

      P17: Lovett-Barron et al., 2012 is appropriately cited here. Milstein et al., Neuron, 2015 also showed dendritic inhibition regulates plateau potentials in CA1 pyramidal cells in vitro, and Grienberger et al., Nat. Neurosci., 2017 showed it in vivo.

      P19 vs P16 vs P21. Fig. 4B, Fig. 5B, and Fig. 6B choose different strategies to show variance across seeds. Please choose one strategy and apply to all comparable plots.

      We thank the reviewer for these helpful points.

      We have added the suggested citations (Milstein et al., 2015; Grienberger et al., 2017) alongside Lovett-Barron et al., 2012. 

      Variance across seeds is now displayed uniformly (mean as a solid line, standard deviation as a shaded area) in Figures 4B, 5B, and 6B.

      Reviewer #2 (Recommendations for the authors):

      Major Points:

      (1)  Quality of Scientific Writing:

      i. Mathematical and Implementation Details:

      I appreciate the authors' efforts in clarifying the mathematical details and providing pseudocode for the learning rule, significantly improving readability and reproducibility. The reference to existing models via GitHub and ModelDB repositories is acceptable. However, I suggest enhancing the presentation quality of equations within the Methods section; currently, they are low-resolution images. Please consider rewriting these equations using LaTeX or replacing them with high-resolution images to further improve clarity.

      We appreciate the reviewer’s comment regarding clarity and reproducibility. In response, we have rewritten all equations in LaTeX to improve their readability and presentation quality in the Methods section.

      ii. Figure quality.

      I acknowledge the authors' effort to improve figure clarity and consistency throughout the manuscript. However, I notice that the x-axis label "[Ca]_v (μm)" in Fig. 7E still appears compressed and unclear. Additionally, given the complexity and abundance of hyperparameters or artificial settings involved in your experimental design and learning rule (such as kernel parameters, metaplasticity kernels, and unspecific features), the current arrangement of subfigures (particularly Fig. 3C, D and Fig. 5D, E) still poses readability challenges. I recommend reordering subfigures to present primary results (e.g., performance outcomes) prominently upfront, while relegating visualizations of detailed hyperparameter manipulations or feature weight variations to later sections or the discussion, thus enhancing clarity for readers.

      We thank the reviewer for pointing out the readability issue. We have corrected the x-axis label in Figure 7D. We hope this new layout, with a simplified rule in Figures 3 and 5, presents the key findings while retaining full mechanistic detail, making the model behavior easier to understand.

      iii. Writing clarity.

      The authors have streamlined the "Metaplasticity" section and reduced references to dopamine, which is a positive step. However, the broader issue remains: the manuscript still appears overly detailed and more like a technical report of a novel learning rule, rather than a clearly structured scientific paper. I strongly recommend that the authors further distill the manuscript by clearly focusing on one or two central scientific questions or hypotheses (for instance, emphasizing core insights such as "inhibitory inputs facilitate nonlinear dendritic computations" or "distal dendritic inputs significantly contribute to nonlinear integration"). Clarifying and highlighting these primary scientific questions early and consistently throughout the manuscript would substantially enhance readability and impact.

      We appreciate the reviewer’s guidance on improving the manuscript’s clarity and focus. In response, we now highlight two central questions at the end of the Introduction and have retitled the main Results subsections to follow this thread, thereby sharpening the manuscript’s focus while retaining necessary technical detail (page 3, lines 20-28). We have also removed redundant passages and simplified technical details to improve overall readability.

      Minor:

      (1) The [Ca]NMDA in Figure 2A and 2C can have large values even when very few synapses are activated. Why is that? Is this setting biologically realistic?

      The authors acknowledge that their simulated [Ca²⁺] levels exceed typical biological measurements but claim that the learning rule remains robust across variations in calcium concentrations. However, robustness to calcium variations was not explicitly demonstrated in the main figures. To convincingly address this concern, I recommend the authors explicitly test and present whether adopting biologically realistic calcium concentrations (~1 μM) impacts the learning outcomes or synaptic weight dynamics. Clarifying this point with a supplemental analysis or an additional figure panel would significantly strengthen their argument regarding the model's biological plausibility and robustness.

      We thank the reviewer for the comment. The elevated [Ca²⁺]<sub>NMDA</sub> values reflect localized transients in spine heads with narrow necks and high NMDA conductance. These values are not problematic for our model, because the plasticity rule depends on relative calcium differences rather than absolute levels, and the metaplasticity kernel adjusts accordingly. In future versions of our detailed neuron model, we will likely decrease the axial resistance of the spine neck.
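
      The point about relative rather than absolute calcium can be illustrated with a toy calculation. The sketch below is not the learning rule of the manuscript: the function `relative_drive`, the running-average threshold `theta`, the time constant `tau`, and the calcium values are all hypothetical. It assumes only that a plasticity threshold slowly tracks recent calcium, and shows that the calcium signal measured relative to that threshold no longer depends on the absolute scale once the threshold has adapted.

```python
def relative_drive(calcium_trace, theta0=1.0, tau=20.0):
    """Return (ca - theta) / theta at each step for a slowly adapting threshold.

    Hypothetical toy rule: theta is a running average of the calcium signal.
    """
    theta = theta0
    drives = []
    for ca in calcium_trace:
        # Plasticity drive is expressed relative to the current threshold.
        drives.append((ca - theta) / theta)
        # The threshold relaxes toward the recent calcium level.
        theta += (ca - theta) / tau
    return drives


# Made-up repeating calcium pattern (arbitrary units).
pattern = [0.5, 1.0, 2.0, 1.0] * 500

low = relative_drive(pattern)                           # baseline amplitudes
high = relative_drive([10.0 * ca for ca in pattern])    # tenfold larger amplitudes

# Early on the two cases differ strongly; after adaptation they coincide.
print("first step:", low[0], high[0])
print("last step: ", low[-1], high[-1])
```

      In this toy setting the threshold absorbs the tenfold change in amplitude, which is the sense in which a rule built on relative calcium differences can tolerate elevated absolute concentrations.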

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      I still have a bone to pick with the claim that "activity-dependent changes in channel voltage-dependence alone are insufficient to attain bursting". As I mentioned in my previous comment, this is also the case for the gmax values (channel density). If you choose the gmax's to be in a reasonable range, then the statement above simply cannot be true. And if, in contrast, you choose the activation/inactivation parameters to be unreasonable, then no set of gmax's can produce proper activity. So I remain baffled what exactly is the point that the authors are trying to make.

      We thank the reviewer for this clarification. We did not intend to imply that voltage-dependence modulation is universally incapable of supporting bursting or that conductance changes alone are universally sufficient. To avoid any overstatement, we now write:

      “…activity-dependent changes in channel voltage-dependence alone did not assemble bursting from these low-conductance initial states (cf. Figure 1B)”.

      Reviewer #2 (Public review):

      (1) The main question not addressed in the paper is the relative efficiency and/or participation of voltage-dependence regulation compared to channel conductance in achieving the expected pattern of activity. Is voltage-dependence contributing 50% or 10%? Although this is a difficult question to answer (and it might even be difficult to provide a number), it is important to determine whether channel conductance regulation remains the main parameter allowing the achievement of a precise pattern of activity (or its recovery after perturbation).

      We appreciate the reviewer’s interest in a quantitative partitioning of the contributions from voltage-dependence regulation versus conductance regulation. We agree that this would be an important analysis in principle. In practice, obtaining this would be difficult.

      Our goal here was to establish the principle: that half-(in)activation shifts can meaningfully influence recovery. This is not an obvious result, given that these two processes can act on vastly different timescales.

      That said, our current dataset does provide partial quantitative insight. Eight of the twenty models required some form of voltage-dependence modulation to recover; among these, two only recovered under fast modulation and two only under slow modulation. This demonstrates that voltage-dependence regulation is essential for recovery in some neurons, and its timescale critically shapes the outcome.

      (2) Another related question is whether the speed of recovery is significantly modified by implementing voltage-dependence regulation (it seems to be the case looking at Figure 3). More generally, I believe it would be important to give insights into the overall benefit of implementing voltage-dependence regulation, beyond its rather obvious biological relevance.

      Our current results suggest that voltage-dependence regulation can indeed accelerate recovery, as illustrated in Figure 3 and supported by additional simulations (not shown). However, a fully quantitative comparison (e.g., time-to-recovery distributions or survival analysis) would require a much larger ensemble of degenerate models to achieve sufficient statistical power across all four conditions. Generating and simulating this expanded model set is computationally intensive, requiring stochastic searches in a high-dimensional parameter space, full time-course simulations, and a subsequent selection process that may succeed or fail.

      The principal aim of the present study is conceptual: to demonstrate that this multi-timescale homeostatic model—built here for the first time—can capture interactions between conductance regulation and voltage-dependence modulation during assembly (“neurodevelopment”) and perturbation. Establishing the conceptual framework and exploring its qualitative behavior were the necessary first steps before pursuing a large-scale quantitative study.

      (3) Along the same line, the conclusion about how voltage-dependence regulation and channel conductance regulation interact to provide the neuron with the expected activity pattern (summarized and illustrated in Figure 6) is rather qualitative. Consistent with my previous comments, one would expect some quantitative answers to this question, rather than an illustration that approximately places a solution in parameter space.

      We appreciate the reviewer’s interest in a more quantitative characterization of the interaction between voltage-dependence and conductance regulation (Fig. 6). As noted in our responses to Comments 1 and 2, some of the facets of this interaction—such as the ability to recover from perturbations and the speed of assembly—can be measured.

      However, fully quantifying the landscape sketched in Figure 6 would require systematically mapping the regions of high-dimensional parameter space where stable solutions exist. In our model, this space spans 18 dimensions (maximal conductances and half‑(in)activations). Even a coarse grid with three samples per dimension would entail over 100 million simulations, which is computationally prohibitive and would still collapse to a schematic representation for visualization.
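
      The size of such a grid follows directly from the two numbers quoted above (18 dimensions, three samples each). A quick check, with variable names chosen only for illustration:

```python
n_dimensions = 18           # maximal conductances plus half-(in)activations
samples_per_dimension = 3   # the coarse grid considered in the text

# A full factorial grid needs one simulation per combination of sample values.
n_simulations = samples_per_dimension ** n_dimensions
print(f"{n_simulations:,}")  # 387,420,489
```

      This is roughly 3.9 × 10^8 simulations, consistent with the "over 100 million" estimate.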

      For this reason, we chose to present Figure 6 as a conceptual summary, illustrating the qualitative organization of solutions and the role of multi-timescale regulation, rather than attempting an exhaustive mapping. We view this figure as a necessary first step toward guiding future, more quantitative analyses.

      Reviewer #3 (Public review):

      Mondal et al. use computational modeling to investigate how activity-dependent shifts in voltage-dependent (in)activation curves can complement changes in ion channel conductance to support homeostatic plasticity. While it is well established that the voltage-dependent properties of ion channels influence neuronal excitability, their potential role in homeostatic regulation, alongside conductance changes, has remained largely unexplored. The results presented here demonstrate that activity-dependent regulation of voltage dependence can interact with conductance plasticity to enable neurons to attain and maintain target activity patterns, in this case, intrinsic bursting. Notably, the timescale of these voltage-dependent shifts influences the final steady-state configuration of the model, shaping both channel parameters and activity features such as burst period and duration. A major conclusion of the study is that altering this timescale can seamlessly modulate a neuron's intrinsic properties, which the authors suggest may be a mechanism for adaptation to perturbations.

      While this conclusion is largely well-supported, additional analyses could help clarify its scope. For instance, the effects of timescale alterations are clearly demonstrated when the model transitions from an initial state that does not meet the target activity pattern to a new stable state. However, Fig. 6 and the accompanying discussion appear to suggest that changing the timescale alone is sufficient to shift neuronal activity more generally. It would be helpful to clarify that this effect primarily applies during periods of adaptation, such as neurodevelopment or in response to perturbations, and not necessarily once the system has reached a stable, steady state. As currently presented, the simulations do not test whether modifying the timescale can influence activity after the model has stabilized. In such conditions, changes in timescale are unlikely to affect network dynamics unless they somehow alter the stability of the solution, which is not shown here. That said, it seems plausible that real neurons experience ongoing small perturbations which, in conjunction with changes in timescale, could allow gradual shifts toward new solutions. This possibility is not discussed but could be a fruitful direction for future work.

      We thank the reviewer for this thoughtful comment and for highlighting an important point about the scope of our conclusions regarding timescale effects. The reviewer is correct that our simulations demonstrate the influence of voltage-dependence timescale primarily during periods of adaptation—when the neuron is moving from an initial, target-mismatched state toward a final target-satisfying state. Once the system has reached a stable solution, simply changing the timescale of voltage-dependent modulation does not by itself shift the neuron’s activity, unless a new perturbation occurs that re-engages the homeostatic mechanism. We have clarified this point in the revised Discussion.

      The confusion likely arose from imprecise phrasing in the original text describing Figure 6. Previously, we wrote:

      “When channel gating properties are altered quickly in response to deviations from the target activity, the resulting electrical patterns are shown in Figure 6 as the orange bubble labeled τ<sub>half</sub> = 6 s”.

      We have revised this sentence to emphasize that the orange bubble represents the eventual stable state, rather than implying that timescale changes alone drive activity shifts:

      “When channel gating properties are altered quickly in response to deviations from the target activity, the neuron ultimately settles into a stable activity pattern. The resulting electrical patterns are shown in Figure 6 as the orange bubble labeled τ<sub>half</sub> = 6 s”.

      Reviewer #1 (Recommendations for the authors):

      Unless I am missing something, Figure 2 should be a supplement to Figure 1. I would prefer to see panel B in Figure 1 to indicate that the findings of that figure are general. Panel A really is not showing anything useful to the reader.

      We appreciate the suggestion to combine Figure 2 with Figure 1, but we believe keeping Figure 2 separate better preserves the manuscript’s flow. Figure 1 illustrates the mechanism in a single model, while Figure 2 presents the population-level summary that generalizes the phenomenon across all models.

      Also, I find Figure 6 unnecessary and its description in the Discussion more detracting than useful. Even with the descriptions, I find nothing in the figure itself that clarifies the concept.

      We appreciate the reviewer’s feedback on Figure 6. The purpose of this figure is to conceptually illustrate that multiple degenerate solutions can satisfy the calcium target and that the timescale of voltage‑dependence modulation can influence which region of this solution space is accessed during the acquisition of the activity target. Reviewer 3 noted some confusion about this point. We made a small clarifying edit.

      At the risk of being really picky, I also don't see the purpose of Figure 7. And I find it strange to plot -Vm just because that's the argument of findpeaks.

      We appreciate the reviewer’s comment on Figure 7. The purpose of this figure is to illustrate exactly what the findpeaks function is detecting, as indicated by the red arrows on the traces. For readers unfamiliar with findpeaks, it may not be obvious how the algorithm interprets the waveform. Showing the peaks directly ensures that the measurements used in our analysis align with what one would intuitively expect.
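
      MATLAB's `findpeaks` reports local maxima, so minima of a trace are usually located by passing the negated signal. The sketch below illustrates that idea with a simple strict local-maximum rule in plain Python. It is a stand-in rather than the analysis code used in the study, and the function `local_maxima` and the voltage values are made up for illustration.

```python
def local_maxima(signal):
    """Indices of samples strictly larger than both neighbours."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] > signal[i + 1]
    ]


# Hypothetical membrane potential samples in mV.
vm = [-50, -40, -20, -45, -65, -48, -30, -55, -70, -52]

peaks = local_maxima(vm)                    # maxima of Vm
troughs = local_maxima([-v for v in vm])    # maxima of -Vm are the minima of Vm

print(peaks)    # [2, 6]
print(troughs)  # [4, 8]
```

      Plotting the negated trace with the detected indices marked, as in Figure 7, makes it easy to confirm by eye that the detected points are the intended troughs.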

      Reviewer #2 (Recommendations for the authors):

      The writing of the article has been much improved since the last version. It is much clearer, and the discussion has been improved and better addresses the biological foundations and relevance of the study. However, conclusions are rather qualitative, while one would expect some quantitative answers to be provided by the modeling approach.

      We appreciate the reviewer’s concern regarding quantification and share this perspective. As noted above, our study is primarily conceptual. Many aspects of the model, such as calcium handling and channel regulation, are parameterized based on incomplete biological data. These uncertainties make robust quantitative predictions difficult, so we focus on qualitative outcomes that are likely to hold independently of specific parameter choices.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We thank the reviewers once again for their careful evaluation of the revised manuscript and for their constructive suggestions. In response to the remaining recommendations, we have made minor amendments to the manuscript. The main changes are as follows:

      • Metabolite Concentrations: we now report them more conventionally, i.e. normalised by water content. The original normalisation by the absolute MM content has been retained in the supplementary information, as MMs are an endogenous tissue probe (i.e., not dependent on cerebrospinal fluid).  The fact that both water and MM normalisation provide similar trends supports the robustness of our conclusions. We have also updated Figure S2 to include the absolute MM concentrations, raw water content, and the MM-to-water ratios for each time point.

      • Taurine Interpretation: We have revised the wording related to the interpretation of taurine findings to clarify that we present a set of converging observations suggesting taurine may serve as a marker of early cerebellar neurodevelopment, rather than asserting it as a definitive conclusion.

      Comments to the editor & reviewers:

      We sincerely thank the reviewers and the editor for their valuable feedback, which has significantly improved the manuscript since its initial submission.

      Please note a correction in Figure S2 (added during the previous revision round): the reported evolution of metabolite/water concentrations has changed due to an earlier error in calculating the water peak integral, which has now been corrected.

      While we recognise that a study and manuscript can always be improved, we prefer not to make further changes at this stage. We cannot conduct new experiments, and redesigning the model falls outside the scope of this work. Additionally, we believe that further altering the manuscript’s structure could lead to unnecessary confusion rather than clarity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Methodological biases in annotation and sequencing methods

      We acknowledge the reviewer’s concern regarding methodological heterogeneity in genome annotations, particularly regarding the use of CDS annotations derived from public databases. In response, we have properly addressed the potential sources of bias in estimating alternative splicing (AS) across such a broad taxonomic range.

      Given the methodological challenges encountered in this study, we have undertaken an in-depth analysis of the biases associated with genome annotations and their impact on large-scale estimates of alternative splicing. This effort has resulted in the development of a comprehensive framework for quantifying, modeling, and correcting such biases, which we believe will be of interest to the broader genomics community. We are currently preparing a separate manuscript dedicated to this methodological aspect, which we intend to submit for publication in the near future.

      To account for these biases, we performed a statistical evaluation of annotation quality by examining the relationship between ASR values and multiple features of the NCBI annotation pipeline, including both technical and biological variables. Specifically, we analyzed a set of metadata descriptors related to: (i) genome assembly quality (e.g., Contig N50, Scaffold N50, number of gaps, gap length, contig/scaffold count), (ii) the amount and diversity of experimental evidence used in annotation (e.g., number of RNA-Seq reads, number of tissues, number of experimental runs, number of proteins and transcripts, including those derived from Homo sapiens), and (iii) the nature of the annotated coding sequences (e.g., total number of CDSs, percentage of CDSs supported by experimental evidence, proportion of known CDSs, percentage of CDSs derived from ab initio predictions).

      This comprehensive analysis revealed that the strongest bias affecting ASR values is associated with the proportion of fully supported CDSs, which showed a strong positive correlation with observed splicing levels. In contrast, the percentage of CDSs relying on ab initio models showed a negative correlation, indicating that computational predictions tend to underestimate splicing complexity. Based on these findings, we implemented a polynomial normalization model using the percentage of fully supported CDSs as the main predictor of annotation bias. The resulting normalized metric, ASR<sup>∗</sup>, corrects for annotation-related variability while preserving biologically meaningful variation.
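
      One common way to build a correction of this kind is to fit the trend of the raw metric against the bias predictor and remove the fitted trend while keeping the overall mean. The sketch below shows that idea under loud assumptions: it uses a first-degree polynomial for brevity (the order of the model is not stated here), the functions `ols_slope` and `normalize_by_support` are hypothetical, and the data values are invented. It should not be read as the authors' definition of ASR<sup>∗</sup>.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def normalize_by_support(asr, support_pct):
    """Subtract the fitted linear trend on annotation support, keeping the mean."""
    slope = ols_slope(support_pct, asr)
    mean_x = sum(support_pct) / len(support_pct)
    return [y - slope * (x - mean_x) for x, y in zip(support_pct, asr)]


# Invented example: percentage of fully supported CDSs and raw ASR per genome.
support_pct = [20.0, 35.0, 50.0, 65.0, 80.0, 95.0]
asr = [1.1, 1.3, 1.8, 2.0, 2.6, 2.9]

asr_star = normalize_by_support(asr, support_pct)

# After the correction, no linear dependence on annotation support remains.
print(ols_slope(support_pct, asr))
print(ols_slope(support_pct, asr_star))
```

      Whatever remains in the corrected values after the trend is removed is the variation not explained by annotation support, which is the part a comparative analysis would then examine.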

      We further verified the robustness of this correction by comparing the main results of our study using both the raw ASR and the normalized ASR<sup>∗</sup> across all analyses. The qualitative and quantitative consistency of results obtained with both metrics demonstrates that our findings are not an artifact of methodological bias and validates the reliability of our approach.

      Conceptual and Statistical Framework

      Our aim was not to investigate specific regulatory mechanisms of alternative splicing, but rather to explore large-scale statistical patterns across the tree of life using a newly defined metric—the Alternative Splicing Ratio (ASR)—that enables genome-wide comparisons of splicing complexity across species. To clarify the conceptual framework, we have revised the manuscript to explicitly state our assumptions, objectives, and the scope of our conclusions. The ASR metric is now briefly introduced in the Results section, with a more detailed mathematical formulation included in the Methods section.

      From a methodological standpoint, we have expanded the manuscript to better support the comparative framework through additional statistical analyses. In particular, we now include:

      • Monte Carlo permutation tests to assess pairwise differences in splicing and genomic variables across taxonomic groups, which are robust to non-normality and heteroscedasticity in the data.

      • Welch’s ANOVA with Bonferroni correction, which accounts for unequal variances when comparing group means.

      • Phylogenetic Generalized Least Squares (PGLS) regression, which explicitly models phylogenetic non-independence between species and allows us to infer lineage-specific associations between genomic composition and alternative splicing.

      • Coefficient of variation analysis, used to evaluate the relative variability of splicing and genomic traits across groups in a scale-independent manner.

      • Variability ratio metrics, designed to compare the dispersion of splicing values relative to genomic features, thereby quantifying trends in regulatory plasticity versus structural constraints.

      All methods are thoroughly described in the revised Methods section, and their application is presented in the Results section.
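
      As an illustration of the first item in the list above, a two-group Monte Carlo permutation test on the difference in means can be sketched as follows. The function `permutation_test`, the number of permutations, and the group values are assumptions made for this example and do not reproduce the settings used in the study.

```python
import random


def permutation_test(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Reassign group labels at random by shuffling the pooled values.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Invented splicing values for two taxonomic groups.
group_a = [0.10, 0.20, 0.15, 0.12, 0.18, 0.11]
group_b = [0.90, 0.80, 0.85, 0.95, 0.88, 0.92]

p_value = permutation_test(group_a, group_b)
print(p_value)
```

      Because the null distribution is built from the data themselves, the test makes no assumption of normality or equal variances, which is the property highlighted in the list.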

      Functional vs. non-functional nature of AS events

      We have included a new discussion paragraph addressing the ongoing debate regarding the functionality of alternative splicing and a possible non-adaptive explanation for the patterns observed. While many previous studies suggest that a considerable fraction of AS events might represent splicing noise or non-functional isoforms, our intention is not to adopt this view uncritically. Instead, we cite recent literature to provide a more nuanced interpretation, recognizing both the potential adaptive value and the uncertainty surrounding the functional relevance of many AS events. Thus, rather than assuming that all observed alternative splicing events are adaptive or biologically meaningful, we now emphasize that many patterns may emerge from other processes, such as those associated with genomic constraints.

      Terminology and Result Interpretation

      The manuscript has been thoroughly revised to improve both the scientific language and the conceptual framing. We have removed inappropriate terminology such as “higher/lower organisms” and “highly evolved”. Also, we have reinterpreted the results. As part of this process, the manuscript has been substantially rewritten to focus on the most meaningful findings. Ultimately, we have retained only those results that specifically concern broad-scale patterns of alternative splicing across taxa, which are now presented with greater clarity and methodological rigor.

      Reviewer #2

      Gene Regulatory Complexity Beyond Splicing Mechanisms

      While alternative splicing represents a prominent mechanism of transcriptomic diversification, we agree with the reviewer that it constitutes only one component of the broader landscape of gene regulation. Structural and behavioral complexity in organisms arises from a combination of regulatory processes, and our study focuses specifically on alternative splicing as a measurable proxy within this multifactorial system. To clarify this point, we have added a paragraph in the Discussion section, where we explicitly contextualize alternative splicing within the wider regulatory architecture. In that paragraph, we discuss additional mechanisms that contribute to phenotypic complexity—such as transcriptional control, chromatin remodeling, epigenetic modifications, and RNA editing—citing key literature.

      Alternative Splicing Measure and Methodology

      While we agree that alternative splicing is not a definitive measure of organismal complexity, we argue that it remains a meaningful proxy for transcriptomic and regulatory diversification, especially when analyzed at a large phylogenetic scale. In this version of the manuscript, our goal was not to equate alternative splicing with biological complexity, but rather to quantify its patterns across lineages and evaluate its relationship with genome structure. This point is now explicitly stated in both the Introduction and Discussion.

      We also recognize the limitations associated with the use of coding sequence (CDS) annotations from public databases such as NCBI RefSeq. To address this concern, we have conducted a detailed analysis of the potential biases introduced by heterogeneous annotation quality, sequencing depth, and computational prediction, as previously addressed in our response to Reviewer #1.

      In response to concerns about unsupported statements, we have completely rewritten the manuscript to ensure that all claims are now explicitly supported by data and grounded in up-to-date scientific literature. We have reformulated speculative statements, removed inappropriate generalizations, and improved the logical flow of the arguments throughout the text. In summary, we have strengthened both the conceptual framework and the methodological foundation of the study, while maintaining a cautious interpretation of the results.

      Trends of Alternative Splicing

      To address the reviewer’s concern, we have revised the interpretation of trends as used in our analysis. In this study, we define a trend not as a strict directional progression or a linear trajectory across all species, but rather as a broad statistical pattern observable in the relative distribution and variability of alternative splicing across major taxonomic groups. We do not claim that this pattern reflects a universal adaptive pathway. Instead, we interpret it as a signal of differences in regulatory strategies associated with genome architecture. To avoid misinterpretation, we have rephrased several sentences in the manuscript and explicitly emphasized the variability within groups and the lack of significant correlations in certain clades.

      Inconsistent statistics

      The discrepancies pointed out were due to differences between mean and median-based analyses. These have been clarified and consistently reported in the revised manuscript. Error bars, p-values, and a supplementary table summarizing all tests are now included. Furthermore, we have not removed any species from our dataset.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary

      This paper summarises responses from a survey completed by around 5,000 academics on their manuscript submission behaviours. The authors find several interesting stylised facts, including (but not limited to):

      Women are less likely to submit their papers to highly influential journals (e.g., Nature, Science and PNAS).

      Women are more likely to cite the demands of co-authors as a reason why they didn't submit to highly influential journals.

      Women are also more likely to say that they were advised not to submit to highly influential journals.

      The paper highlights an important point, namely that the submission behaviours of men and women scientists may not be the same (either due to preferences that vary by gender, selection effects that arise earlier in scientists' careers or social factors that affect men and women differently and also influence submission patterns). As a result, simply observing gender differences in acceptance rates - or a lack thereof - should not be automatically interpreted as evidence for or against discrimination (broadly defined) in the peer review process.

      Major comments

      What do you mean by bias?

      In the second paragraph of the introduction, it is claimed that "if no biases were present in the case of peer review, then we should expect the rate with which members of less powerful social groups enjoy successful peer review outcomes to be proportionate to their representation in submission rates." There are a couple of issues with this statement.

      First, the authors are implicitly making a normative assumption that manuscript submission and acceptance rates *should* be equalised across groups. This may very well be the case, but there can also be valid reasons - even when women are not intrinsically better at research than men - why a greater fraction of female-authored submissions are accepted relative to male-authored submissions (or vice versa). For example, if men are more likely to submit their less ground-breaking work, then one might reasonably expect that they experience higher rejection rates compared to women, conditional on submission.

      We do assume that normative statement: unless we believe that men’s papers are intrinsically better than women’s papers, the acceptance rate should be the same. But the referee is right: we have no way of controlling for the intrinsic quality of the work of men and women. That said, our manuscript does not show that there is a different acceptance rate for men and women; it shows that women are less likely to submit papers to a subset of journals that are of a lower Journal Impact Factor, controlling for their most cited paper, in an attempt to control for intrinsic quality of the manuscripts.

      Second, I assume by "bias", the authors are taking a broad definition, i.e., they are not only including factors that specifically relate to gender but also factors that are themselves independent of gender but nevertheless disproportionately are associated with one gender or another (e.g., perhaps women are more likely to write on certain topics and those topics are rated more poorly by (more prevalent) male referees; alternatively, referees may be more likely to accept articles by authors they've met before, most referees are men and men are more likely to have met a given author if he's male instead of female). If that is the case, I would define more clearly what you mean by bias. (And if that isn't the case, then I would encourage the authors to consider a broader definition of "bias"!)

      Yes, the referee is right that we are taking a broad definition of bias. We provide a definition of bias on page 3, line 92. This definition is focused on differential evaluation which leads to differential outcomes. We also hedge our conversation (e.g., page 3, line 104) to acknowledge that observations of disparities may only be an indicator of potential bias, as many other things could explain the disparity. In short, disparities are a necessary but insufficient indicator of bias. We add a line in the introduction to reinforce this. The only other reference to the term bias comes on page 10, line 276. We add a reference to Lee here to contextualize.

      Identifying policy interventions is not a major contribution of this paper

      I would take out the final sentence in the abstract. In my opinion, your survey evidence isn't really strong enough to support definitive policy interventions to address the issue and, indeed, providing policy advice is not a major - or even minor - contribution of your paper. (Basically, I would hope that someone interested in policy interventions would consult another paper that much more thoughtfully and comprehensively discusses the costs and benefits of various interventions!) While it's fine to briefly discuss them at the end of your paper - as you currently do - I wouldn't highlight that in the abstract as being an important contribution of your paper.

      We thank the referee for this comment. While we agree that our results do not lead to definitive policy interventions, we believe that our findings point to a phenomenon that should be addressed through policy interventions. Given that some interventions are proposed in our conclusion, we feel like stating this in the abstract is coherent.

      Minor comments

      What is the rationale for conditioning on academic rank and does this have explanatory power on its own - i.e., does it at least superficially potentially explain part of the gender gap in intention to submit?

      Thank you for this thoughtful question. We conditioned on academic rank in all regression analyses to account for structural differences in career stage that may potentially influence submission behaviors. Academic rank (e.g., assistant, associate, full professor) is a key determinant of publishing capacity and strategic considerations, such as perceived likelihood of success at elite journals, tolerance for risk, and institutional expectations for publication venues.

      Importantly, academic rank is also correlated with gender due to cumulative career disadvantages that contribute to underrepresentation of women at more senior levels. Failing to adjust for rank would conflate gender effects with differences attributable to career stage. By including rank as a covariate, we aim to isolate gender-associated patterns in submission behavior within comparable career stages, thereby producing a more precise estimate of the gender effect.

      Regarding explanatory power, academic rank does indeed contribute significantly to model fit across our analyses, indicating that it captures meaningful variation in submission behavior. However, even after adjusting for rank, we continue to observe significant gender differences in submission patterns in several disciplines. This suggests that while academic rank explains part of the variation, it does not fully account for the gender gap—highlighting the importance of examining other structural and behavioral factors that shape the publication trajectory.
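
      The logic of adjusting for rank can be illustrated with a small stratified comparison. This sketch is not the regression used in the study: it simply compares a pooled gender gap in submission rates with the gaps observed within each rank. The counts are synthetic and chosen only to show how a covariate that is correlated with gender can account for part, but not all, of a pooled gap.

```python
# Synthetic counts: (rank, gender) -> (respondents, respondents who submitted).
# These numbers are invented for illustration and are not survey results.
counts = {
    ("assistant", "women"): (60, 12),
    ("assistant", "men"): (40, 10),
    ("full", "women"): (20, 8),
    ("full", "men"): (60, 27),
}


def submission_rate(gender, rank=None):
    """Share of respondents who submitted, optionally restricted to one rank."""
    respondents = submitted = 0
    for (r, g), (n, s) in counts.items():
        if g == gender and (rank is None or r == rank):
            respondents += n
            submitted += s
    return submitted / respondents


# Gap ignoring rank: mixes the gender effect with differences in career stage.
pooled_gap = submission_rate("men") - submission_rate("women")

# Gap within each rank: compares men and women at the same career stage.
within_rank_gaps = {
    rank: submission_rate("men", rank) - submission_rate("women", rank)
    for rank in ("assistant", "full")
}
```

      In this toy example the pooled gap is larger than the gap within either rank, because women are concentrated in the rank with the lower overall submission rate; a gap nevertheless remains after stratifying.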

      Reviewer #2 (Public review):

      Basson et al. present compelling evidence supporting a gender disparity in article submission to "elite" journals. Most notably, they found that women were more likely to avoid submitting to one of these journals based on advice from a colleague/mentor. Overall, this work is an important addition to the study of gender disparities in the publishing process.

      I thank the authors for addressing my concerns.

      Reviewer #4 (Public review):

      Main strengths

      The topic of the MS is very relevant given that across the sciences/academia, genders are unevenly represented, which has a range of potential negative consequences. To change this, we need to have the evidence on what mechanisms cause this pattern. Given that promotion and merit in academia are still largely based on the number of publications and the impact factor, one part of the gap likely originates from differences in publication rates of women compared to men.

      Women are underrepresented compared to men in journals with a high impact factor. While previous work has detected this gap and identified some potential mechanisms, the current MS provides strong evidence that this gap might be due to a lower submission rate of women compared to men, rather than the rejection rates. These results are based on a survey of close to 5000 authors. The survey seems to be conducted well (though I am not an expert in surveys), and data analysis is appropriate to address the main research aims. It was impossible to check the original data because of the privacy concerns.

      Interestingly, the results show no gender bias in rejection rates (desk rejection or overall) in three high-impact journals (Science, Nature, PNAS). However, submission rates are lower for women compared to men, indicating that gender biases might act through this pathway. The survey also showed that women are more likely to rate their work as not groundbreaking and are advised not to submit to prestigious journals, indicating that both intrinsic and extrinsic factors shape women's submission behaviour.

      With these results, the MS has the potential to inform actions to reduce gender bias in publishing, but also to inform assessment reform at a larger scale.

      I do not find any major weaknesses in the revised manuscript.

      Reviewer #4 (Recommendations for the authors):

      (1) Colour schemes of the Figures are not adjusted for colour-blindness (red-green is a big NO), some suggestions can be found here https://www.nceas.ucsb.edu/sites/default/files/2022-06/Colorblind%20Safe%20Color%20Schemes.pdf

      We appreciate the suggestion. We’ve adjusted the colors in the manuscript to be color-blind friendly using one of the colorblind safe palettes suggested by the reviewer.

      (2) I do not think that the authors have fully addressed the comment about APCs and the decision to submit, given that PNAS has publication charges that amount to double of someone's monthly salary. I would add a sentence or two to explain that publication charges should not be a factor for Nature and Science, but might be for PNAS.

      While APCs are definitely a factor affecting researchers’ submission behavior, they mostly do so for lower prestige journals rather than for the three elite journals analyzed here. As mentioned in the previous round of revisions, Nature and Science have subscription options. And PNAS authors without funding have access to waivers: https://www.pnas.org/author-center/publication-charges

      (3) Line 268, the first suggestion here is not something that would likely work. Thus, I would not put it as the first suggestion.

      We made the suggested change.

      (4) Data availability - remove AND in 'Aggregated and de-identified data' because it sounds like both are shared. Suggest writing: 'Aggregated, de-identified data..'. I still suggest sharing data/code in a trusted repository (e.g. Dryad, ZENODO...) rather than on GitHub, as per the current recommendation on the best practices for data sharing.

      Thank you for your comment regarding data availability. Due to IRB restrictions and the conditions of our ethics approval, we are not permitted to share the survey data used in this study. However, to support transparency and reproducibility, we have made all analysis code available on Zenodo at https://doi.org/10.5281/zenodo.16327580. In addition, we have included a synthetic dataset with the same structure as the original survey data but containing randomly generated values. This allows others to understand the data structure and replicate our analysis pipeline without compromising participant confidentiality.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Chao et al. produced an updated version of the SpliceAI package using modern deep learning frameworks. This includes data preprocessing, model training, direct prediction, and variant effect prediction scripts. They also added functionality for model fine-tuning and model calibration. They convincingly evaluate their newly trained models against those from the original SpliceAI package and investigate how to extend SpliceAI to make predictions in new species. While their comparisons to the original SpliceAI models are convincing on the grounds of model performance, their evaluation of how well the new models match the original's understanding of non-local mutation effects is incomplete. Further, their evaluation of the new calibration functionality would benefit from a more nuanced discussion of what set of splice sites their calibration is expected to hold for, and tests in a context for which calibration is needed.

      Strengths:

      (1) They provide convincing evidence that their new implementation of SpliceAI matches the performance of the original model on a similar dataset while benefiting from improved computational efficiencies. This will enable faster prediction and retraining of splicing models for new species as well as easier integration with other modern deep learning tools.

      (2) They produce models with strong performance on non-human model species and a simple, well-documented pipeline for producing models tuned for any species of interest. This will be a boon for researchers working on splicing in these species and make it easy for researchers working on new species to generate their own models.

      (3) Their documentation is clear and abundant. This will greatly aid the ability of others to work with their code base.

      We thank the reviewer for these positive comments.  

      Weaknesses:

      (1) The authors' assessment of how much their model retains SpliceAI's understanding of "nonlocal effects of genomic mutations on splice site location and strength" (Figure 6) is not sufficiently supported. Demonstrating this would require showing that for a large number of (non-local) mutations, their model shows the same change in predictions as SpliceAI or that attribution maps for their model and SpliceAI are concordant even at distances from the splice site. Figure 6A comes close to demonstrating this, but only provides anecdotal evidence as it is limited to 2 loci. This could be overcome by summarizing the concordance between ISM maps for the two models and then comparing across many loci. Figure 6B also comes close, but falls short because instead of comparing splicing prediction differences between the models as a function of variants, it compares the average prediction difference as a function of the distance from the splice site. This limits it to only detecting differences in the model's understanding of the local splice site motif sequences. This could be overcome by looking at comparisons between differences in predictions with mutants directly and considering non-local mutants that cause differences in splicing predictions.

      We agree that two loci are insufficient to demonstrate preservation of non-local effects. To address this, we have extended our analysis to a larger set of sites: we randomly sampled 100 donor and 100 acceptor sites, applied our ISM procedure over a 5,001 nt window centered at each site for both models, and computed the ISM map as before. We then calculated the Pearson correlation between the collection of OSAI<sub>MANE</sub> and SpliceAI ISM importance scores. We also created 10 additional ISM maps similar to those in Figure 6A, which are now provided in Figure S23.

      The following is the revised paragraph in the manuscript’s Results section:

      First, we recreated the experiment from Jaganathan et al. in which they mutated every base in a window around exon 9 of the U2SURP gene and calculated its impact on the predicted probability of the acceptor site. We repeated this experiment on exon 2 of the DST gene, again using both SpliceAI and OSAI<sub>MANE</sub>. In both cases, we found a strong similarity between the resultant patterns from SpliceAI and OSAI<sub>MANE</sub>, as shown in Figure 6A. To evaluate concordance more broadly, we randomly selected 100 donor and 100 acceptor sites and performed the same ISM experiment on each site. The Pearson correlation between SpliceAI and OSAI<sub>MANE</sub> yielded an overall median correlation of 0.857 (see Methods; additional DNA logos in Figure S23).

      To characterize the local sequence features that both models focus on, we computed the average decrease in predicted splice-site probability resulting from each of the three possible single-nucleotide substitutions at every position within 80 bp for 100 donor and 100 acceptor sites randomly sampled from the test set (Chromosomes 1, 3, 5, 7, and 9). Figure 6B shows the average decrease in splice site strength for each mutation in the format of a DNA logo, for both tools.

      We added the following text to the Methods section:

      Concordance evaluation of ISM importance scores between OSAI<sub>MANE</sub> and SpliceAI

      To assess agreement between OSAI<sub>MANE</sub>  and SpliceAI across a broad set of splice sites, we applied our ISM procedure to 100 randomly chosen donor sites and 100 randomly chosen acceptor sites. For each site, we extracted a 5,001 nt window centered on the annotated splice junction and, at every coordinate within that window, substituted the reference base with each of the three alternative nucleotides. We recorded the change in predicted splice-site probability for each mutation and then averaged these Δ-scores at each position to produce a 5,001-score ISM importance profile per site.

      Next, for each splice site we computed the Pearson correlation coefficient between the paired importance profiles from ensembled OSAI<sub>MANE</sub> and ensembled SpliceAI. The median correlation was 0.857 for all splice sites. Ten additional zoom-in representative splice site DNA logo comparisons are provided in Supplementary Figure S23.
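
      The procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: `toy_model_a` and `toy_model_b` are hypothetical scoring functions standing in for the two splice-site predictors, the window is 10 nt rather than 5,001 nt, and the Δ-score is taken as reference minus mutant.

```python
import math

BASES = "ACGT"


def ism_profile(sequence, site_index, predict):
    """Per-position mean change in the site score over the three substitutions."""
    reference_score = predict(sequence, site_index)
    profile = []
    for pos, ref_base in enumerate(sequence):
        deltas = []
        for alt in BASES:
            if alt == ref_base:
                continue
            mutated = sequence[:pos] + alt + sequence[pos + 1:]
            # Positive delta means the mutation lowers the predicted score.
            deltas.append(reference_score - predict(mutated, site_index))
        profile.append(sum(deltas) / len(deltas))
    return profile


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def toy_model_a(seq, i):
    """Hypothetical donor-site scorer: rewards GT at the site plus an upstream G."""
    score = 0.1
    score += 0.4 if seq[i] == "G" else 0.0
    score += 0.4 if seq[i + 1] == "T" else 0.0
    score += 0.05 if i >= 1 and seq[i - 1] == "G" else 0.0
    return score


def toy_model_b(seq, i):
    """Second hypothetical scorer with slightly different weights and context."""
    score = 0.1
    score += 0.45 if seq[i] == "G" else 0.0
    score += 0.35 if seq[i + 1] == "T" else 0.0
    score += 0.05 if i + 2 < len(seq) and seq[i + 2] == "A" else 0.0
    return score


sequence = "ACAGGTAAGT"  # made-up window; the GT at index 4-5 is the scored site
site_index = 4

profile_a = ism_profile(sequence, site_index, toy_model_a)
profile_b = ism_profile(sequence, site_index, toy_model_b)
correlation = pearson(profile_a, profile_b)
```

      Repeating this per site and taking the median of the per-site correlations would give a summary of the kind reported above.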

      (2) The utility of the calibration method described is unclear. When thinking about a calibrated model for splicing, the expectation would be that the models' predicted splicing probabilities would match the true probabilities that positions with that level of prediction confidence are splice sites. However, the actual calibration that they perform only considers positions as splice sites if they are splice sites in the longest isoform of the gene included in the MANE annotation. In other words, they calibrate the model such that the model's predicted splicing probabilities match the probability that a position with that level of confidence is a splice site in one particular isoform for each gene, not the probability that it is a splice site more broadly. Their level of calibration on this set of splice sites may very well not hold to broader sets of splice sites, such as sites from all annotated isoforms, sites that are commonly used in cryptic splicing, or poised sites that can be activated by a variant. This is a particularly important point as much of the utility of SpliceAI comes from its ability to issue variant effect predictions, and they have not demonstrated that this calibration holds in the context of variants. This section could be improved by expanding and clarifying the discussion of what set of splice sites they have demonstrated calibration on, what it means to calibrate against this set of splice sites, and how this calibration is expected to hold or not for other interesting sets of splice sites. Alternatively, or in addition, they could demonstrate how well their calibration holds on different sets of splice sites or show the effect of calibrating their models against different potentially interesting sets of splice sites and discuss how the results do or do not differ.

      We thank the reviewer for highlighting the need to clarify our calibration procedure. Both SpliceAI and OpenSpliceAI are trained on a single “canonical” transcript per gene: SpliceAI on the hg19 Ensembl/Gencode canonical set and OpenSpliceAI on the MANE transcript set. To calibrate each model, we applied post-hoc temperature scaling, i.e. a single learnable parameter that rescales the logits before the softmax. This adjustment does not alter the model’s ranking or discrimination (AUC/precision–recall) but simply aligns the predicted probabilities for donor, acceptor, and non-splice classes with their observed frequencies. As shown in our reliability diagrams (Fig. S16-S22), temperature scaling yields negligible changes in performance, confirming that both SpliceAI and OpenSpliceAI were already well-calibrated. However, we acknowledge that we didn’t measure how calibration might affect predictions on non-canonical splice sites or on cryptic splicing. It is possible that calibration might have a detrimental effect on those, but because this is not a key claim of our paper, we decided not to do further experiments. We have updated the manuscript to acknowledge this potential shortcoming; please see the revised paragraph in our next response.
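
      For readers unfamiliar with temperature scaling, the sketch below shows the idea on made-up logits for three classes (non-splice, acceptor, donor). It is an illustration only: the temperature is chosen by a simple grid search over candidate values rather than by gradient-based fitting, and none of the names or numbers come from the OpenSpliceAI code base.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a single scalar temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y]) for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, candidates):
    """Pick the candidate temperature with the lowest held-out NLL (grid search)."""
    return min(candidates, key=lambda t: negative_log_likelihood(logit_rows, labels, t))


# Made-up, deliberately overconfident logits; the last prediction is wrong.
logit_rows = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0], [4.0, 0.0, 0.0]]
labels = [0, 1, 2, 1]

candidates = [0.5 + 0.1 * k for k in range(36)]  # 0.5, 0.6, ..., 4.0
temperature = fit_temperature(logit_rows, labels, candidates)

before = [softmax(row) for row in logit_rows]
after = [softmax(row, temperature) for row in logit_rows]
```

      Because every logit in a row is divided by the same positive constant, the ordering of the classes within each row is preserved; only the confidence attached to the top class changes. A fitted temperature close to 1 therefore leaves the probabilities almost untouched.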

      (3) It is difficult to assess how well their calibration method works in general because their original models are already well calibrated, so their calibration method finds temperatures very close to 1 and only produces very small and hard to assess changes in calibration metrics. This makes it very hard to distinguish if the calibration method works, as it doesn't really produce any changes. It would be helpful to demonstrate the calibration method on a model that requires calibration or on a dataset for which the current model is not well calibrated, so that the impact of the calibration method could be observed.

      It’s true that the models we calibrated didn’t need many changes. It is possible that the calibration methods we used (which were not ours, but which were described in earlier publications) can’t improve the models much. We toned down our comments about this procedure, as follows.

      Original:

      “Collectively, these results demonstrate that OSAIs were already well-calibrated, and this consistency across species underscores the robustness of OpenSpliceAI’s training approach in diverse genomic contexts.”

      Revised:

      “We observed very small changes after calibration across phylogenetically diverse species, suggesting that OpenSpliceAI’s training regimen yielded well‐calibrated models, although it is possible that a different calibration algorithm might produce further improvements in performance.”

      Reviewer #2 (Public review):

      Summary:

      The paper by Chao et al offers a reimplementation of the SpliceAI algorithm in PyTorch so that the model can more easily/efficiently be retrained. They apply their new implementation of the SpliceAI algorithm, which they call OpenSpliceAI, to several species and compare it against the original model, showing that the results are very similar and that in some small species, pretraining on other species helps improve performance.

      Strengths:

      On the upside, the code runs fine, and it is well documented.

      Weaknesses:

      The paper itself does not offer much beyond reimplementing SpliceAI. There is no new algorithm, new analysis, new data, or new insights into RNA splicing. There is no comparison to many of the alternative methods that have since been published to surpass SpliceAI. Given that some of the authors are well-known with a long history of important contributions, our expectations were admittedly different. Still, we hope some readers will find the new implementation useful.

      We thank the reviewer for the feedback. We have clarified that OpenSpliceAI is an open-source PyTorch reimplementation optimized for efficient retraining and transfer learning, designed to analyze cross-species performance gains, and supported by a thorough benchmark and the release of several pretrained models to clearly position our contribution.

      Reviewer #3 (Public review):

      Summary:

      The authors present OpenSpliceAI, a PyTorch-based reimplementation of the well-known SpliceAI deep learning model for splicing prediction. The core architecture remains unchanged, but the reimplementation demonstrates convincing improvements in usability, runtime performance, and potential for cross-species application.

      Strengths:

      The improvements are well-supported by comparative benchmarks, and the work is valuable given its strong potential to broaden the adoption of splicing prediction tools across computational and experimental biology communities.

      Major comments:

      Can fine-tuning also be used to improve prediction for human splicing? Specifically, are models trained on other species and then fine-tuned with human data able to perform better on human splicing prediction? This would enhance the model's utility for more users, and ideally, such fine-tuned models should be made available.

      We evaluated transfer learning by fine-tuning models pretrained on mouse (OSAI<sub>Mouse</sub>), honeybee (OSAI<sub>Honeybee</sub>), Arabidopsis (OSAI<sub>Arabidopsis</sub>), and zebrafish (OSAI<sub>Zebrafish</sub>) on human data. While transfer learning accelerated convergence compared to training from scratch, the final human splicing prediction accuracy was comparable between fine-tuned and scratch-trained models, suggesting that performance on our current human dataset is nearing saturation under this architecture.

      We added the following paragraph to the Discussion section:

      We also evaluated pretraining on mouse (OSAI<sub>Mouse</sub>), honeybee (OSAI<sub>Honeybee</sub>), zebrafish (OSAI<sub>Zebrafish</sub>), and Arabidopsis (OSAI<sub>Arabidopsis</sub>) followed by fine-tuning on the human MANE dataset. While cross-species pretraining substantially accelerated convergence during fine-tuning, the final human splicing-prediction accuracy was comparable to that of a model trained from scratch on human data. This result indicates that our architecture seems to capture all relevant splicing features from human training data alone, and thus gains little or no benefit from cross-species transfer learning in this context (see Figure S24).

      Reviewer #1 (Recommendations for the authors):

      We thank the editor for summarizing the points raised by each reviewer. Below is our point-by-point response to each comment:

      (1) In Figure 3 (and generally in the other figures) OpenSpliceAI should be replaced with OSAI_{Training dataset} because otherwise it is hard to tell which precise model is being compared. And in Figure 3 it is especially important to emphasize that you are comparing a SpliceAI model trained on Human data to an OSAI model trained and evaluated on a different species.

      We have updated the labels in Figure 3, replacing “OpenSpliceAI” with “OSAI_{training dataset}” to more clearly specify which model is being compared.

      (2) Are genes paralogous to training set genes removed from the validation set as well as the test set? If you are worried about data leakage in the test set, it makes sense to also consider validation set leakage.

      Thank you for this helpful suggestion. We fully agree, and to avoid any data leakage we implemented the identical filtering pipeline for both validation and test sets: we excluded all sequences paralogous or homologous to sequences in the training set, and further removed any sequence sharing > 80 % length overlap and > 80 % sequence identity with training sequences. The effect of this filtering on the validation set is summarized in Supplementary Figure S7C.
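
      The thresholding step in this filter can be sketched as follows. The response does not specify how overlap and identity were computed, so this sketch assumes that alignment hits against the training set are already available and that length overlap is measured relative to the held-out sequence; the `Hit` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment of a held-out sequence against some training sequence."""
    query_id: str        # validation or test sequence identifier
    query_length: int    # length of the held-out sequence
    aligned_length: int  # number of held-out positions covered by the alignment
    identity: float      # fraction of identical aligned positions, from 0 to 1


def leaked_ids(hits, min_overlap=0.8, min_identity=0.8):
    """Identifiers exceeding both thresholds against any training sequence."""
    return {
        h.query_id
        for h in hits
        if h.aligned_length / h.query_length > min_overlap and h.identity > min_identity
    }


def filter_heldout(heldout_ids, hits):
    """Keep only held-out sequences that are not flagged as leaked."""
    drop = leaked_ids(hits)
    return [seq_id for seq_id in heldout_ids if seq_id not in drop]


# Made-up hits for three hypothetical validation genes.
hits = [
    Hit("geneA", 1000, 950, 0.97),  # long, near-identical match: removed
    Hit("geneB", 1000, 300, 0.99),  # identical but short overlap: kept
    Hit("geneC", 1000, 900, 0.60),  # long overlap but low identity: kept
]
validation_ids = ["geneA", "geneB", "geneC", "geneD"]
kept = filter_heldout(validation_ids, hits)
```

      Applying the same function to the validation and test identifiers would keep the two held-out sets consistent with each other.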

      Reviewer #3 (Recommendations for the authors):

      (1) The legend in Figure 3 is somewhat confusing. The labels like "SpliceAI-Keras (species name)" may imply that the model was retrained using data from that species, but that's not the case, correct?

      Yes, “SpliceAI-Keras (species name)” was not retrained; it refers to the released SpliceAI model evaluated on the specified species dataset. We have revised the Figure 3 legends, changing “SpliceAI-Keras (species name)” to “SpliceAI-Keras” to clarify this.

      (2) Please address the minor issues with the code, including ensuring the conda install works across various systems.

      We have addressed the issues you mentioned. OpenSpliceAI is now available on Conda and can be installed with:  conda install openspliceai. 

      The conda package homepage is at https://anaconda.org/khchao/openspliceai. We’ve also corrected all broken links in the documentation.

      (3) Utility:

      I followed all the steps in the Quick Start Guide, and aside from the issues mentioned below, everything worked as expected.

      I attempted installation using conda as described in the instructions, but it was unsuccessful. I assume this method is not yet supported.

      In Quick Start Guide: predict, the link labeled "GitHub (models/spliceai-mane/10000nt/)" appears to be incorrect. The correct path is likely "GitHub (models/openspliceaimane/10000nt/)".

      In Quick Start Guide: variant (https://ccb.jhu.edu/openspliceai/content/quick_start_guide/quickstart_variant.html#quick-startvariant), some of the download links for input files were broken. While I was able to find some files in the GitHub repository, I think the -A option should point to data/grch37.txt, not examples/data/input.vcf, and the -I option should be examples/data/input.vcf, not data/vcf/input.vcf.

      Thank you for catching these issues. We’ve now addressed all issues concerning Conda installation and file links. We thank the editor for thoroughly testing our code and reviewing the documentation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      We thank the reviewer for his valuable input and careful assessment, which have significantly improved the clarity and rigor of our manuscript.

      Summary:

      Mazer & Yovel 2025 dissect the inverse problem of how echolocators in groups manage to navigate their surroundings despite intense jamming using computational simulations.

      The authors show that despite the 'noisy' sensory environments that echolocating groups present, agents can still access some amount of echo-related information and use it to navigate their local environment. It is known that echolocating bats have strong small and large-scale spatial memory that plays an important role for individuals. The results from this paper also point to the potential importance of an even lower-level, short-term role of memory in the form of echo 'integration' across multiple calls, despite the unpredictability of echo detection in groups. The paper generates a useful basis to think about the mechanisms in echolocating groups for experimental investigations too.

      Strengths:

      (1) The paper builds on biologically well-motivated and parametrised 2D acoustics and sensory simulation setup to investigate the various key parameters of interest

      (2) The 'null-model' of echolocators not being able to tell apart objects & conspecifics while echolocating still shows agents successfully emerge from groups - even though the probability of emergence drops severely in comparison to cognitively more 'capable' agents. This is nonetheless an important result showing the direction-of-arrival of a sound itself is the 'minimum' set of ingredients needed for echolocators navigating their environment.

      (3) The results generate an important basis in unraveling how agents may navigate in sensorially noisy environments with a lot of irrelevant and very few relevant cues.

      (4) The 2D simulation framework is simple and computationally tractable enough to perform multiple runs to investigate many variables - while also remaining true to the aim of the investigation.

      Weaknesses:

      There are a few places in the paper that can be misunderstood or don't provide complete details. Here is a selection:

      (1) Line 61: '... studies have focused on movement algorithms while overlooking the sensory challenges involved' : This statement does not match the recent state of the literature. While the previous models may have had the assumption that all neighbours can be detected, there are models that specifically study the role of limited interaction arising from a potential inability to track all neighbours due to occlusion, and the effect of responding to only one/few neighbours at a time e.g. Bode et al. 2011 R. Soc. Interface, Rosenthal et al. 2015 PNAS, Jhawar et al. 2020 Nature Physics.

      We appreciate the reviewer's comment and the relevant references. We have revised the manuscript accordingly to clarify the distinction between studies that incorporate limited interactions and those that explicitly analyze sensory constraints and interference. We have refined our statement to acknowledge these contributions while maintaining our focus on sensory challenges beyond limited neighbor detection, such as signal degradation, occlusion effects, and multimodal sensory integration (see lines 58-64).

      (2) The word 'interference' is used loosely in places (Line 89: '...took all interference signals...', Line 319: 'spatial interference') - this is confusing as it is not clear whether the authors refer to interference in the physics/acoustics sense, or broadly speaking as a synonym for reflections and/or jamming.

      To improve clarity, we have revised the manuscript to distinguish between different types of interference:

      • Acoustic interference (jamming): Overlapping calls that completely obscure echo detection, preventing bats from perceiving necessary environmental cues.

      • Acoustic interference (masking): Partial reduction in signal clarity due to competing calls.

      • Spatial interference: Physical obstruction by conspecifics affecting movement and navigation.

      We have updated the manuscript to use these terms consistently and explicitly define them in relevant sections (see lines 84-85, 119-120). This distinction ensures that the reader can differentiate between interference as an acoustic phenomenon and its broader implications in navigation.

      (3) The paper discusses original results without reference to how they were obtained or what was done. The lack of detail here must be considered while interpreting the Discussion e.g. Line 302 ('our model suggests...increasing the call-rate..' - no clear mention of how/where call-rate was varied) & Line 323 '..no benefit beyond a certain level..' - also no clear mention of how/where call-level was manipulated in the simulations.

      All tested parameters, including call rate dynamics and call intensity variations, are detailed in the Methods section and Tables 1 and 2. Specifically:

      • Call Rate Variation: The Inter-Pulse Interval (IPI) was modeled based on documented echolocation behavior, decreasing from 100 msec during the search phase to 35 msec (~28 calls per second) at the end of the approach phase, and to 5 msec (200 calls per second) during the final buzz (see Table 2). This natural variation in call rate was not manually manipulated in the model but emerged from the simulated bat behavior.

      • Call Intensity Variation: The tested call intensity levels (100, 110, 120, 130 dB SPL) are presented in Table 1 under the “Call Level” parameter. The effect of increasing call intensity was analyzed in relation to exit probability, jamming probability, and collision rate. This is now explicitly referenced in the Discussion. We have revised the manuscript to explicitly reference these aspects in the Results and Discussion sections – see lines 346-349, 372-375.
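
      The call rates quoted above follow directly from the inter-pulse intervals, since the rate is the reciprocal of the IPI. A short check of the arithmetic (the function name is illustrative only):

```python
def calls_per_second(ipi_ms: float) -> float:
    """Convert an inter-pulse interval in milliseconds to a call rate in Hz."""
    if ipi_ms <= 0:
        raise ValueError("inter-pulse interval must be positive")
    return 1000.0 / ipi_ms


if __name__ == "__main__":
    # IPI values as stated in the response (search, end of approach, final buzz).
    for phase, ipi_ms in [("search", 100), ("end of approach", 35), ("final buzz", 5)]:
        print(f"{phase}: {ipi_ms} ms -> {calls_per_second(ipi_ms):.1f} calls/s")
```

      A 35 msec interval gives about 28.6 calls per second, consistent with the approximate value of ~28 quoted above.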

      Reviewer #2 (Public review):

      We are grateful for the reviewer’s insightful feedback, which has helped us clarify key aspects of our research and strengthen our conclusions.

      This manuscript describes a detailed model of bats flying together through a fixed geometry. The model considers elements that are faithful to both bat biosonar production and reception and the acoustics governing how sound moves in the air and interacts with obstacles. The model also incorporates behavioral patterns observed in bats, like one-dimensional feature following and temporal integration of cognitive maps. From a simulation study of the model and comparison of the results with the literature, the authors gain insight into how often bats may experience destructive interference of their acoustic signals and those of their peers, and how much such interference may actually negatively affect the groups' ability to navigate effectively. The authors use generalized linear models to test the significance of the effects they observe.

      In terms of its strengths, the work relies on a thoughtful and detailed model that faithfully incorporates salient features, such as acoustic elements like the filter for a biological receiver and temporal aggregation as a kind of memory in the system. At the same time, the authors abstract away features that would complicate the model without being expected to give additional insights, as can be seen in the choice of a two-dimensional rather than three-dimensional system. I thought that the level of abstraction in the model was perfect, enough to demonstrate their results without needless details. The results are compelling and interesting, and the authors do a great job discussing them in the context of the biological literature.

      The most notable weakness I found in this work was that some aspects of the model were not entirely clear to me. 

      For example, the directionality of the bat's sonar call in relation to its velocity. Are these the same?

      For simplicity, in our model, the head is aligned with the body, therefore the direction of the echolocation beam is the same as the direction of the flight. 

      Moreover, call directionality (directivity) is not directly influenced by velocity. Instead, directionality is estimated using the piston model, as described in the Methods section. The directionality is based on the emission frequency and is thus primarily linked to the behavioral phases of the bat, with frequency shifts occurring as the bat transitions from search to approach to buzz phases. During the approach phase, the bat emits calls with higher frequencies, resulting in increased directionality. This is supported by the literature (Jakobsen and Surlykke, 2010; Jakobsen, Brinkløv and Surlykke, 2013). This phase is also associated with a natural reduction in flight speed, which is a well-documented behavioral adaptation in echolocating bats (Jakobsen et al., 2024).

      To clarify this in the manuscript, we have updated the text to explicitly state that directionality follows phase-dependent frequency changes rather than being a direct function of velocity, see lines 543-545. 

      If so, what is the difference between phi_target and phi_tx in the model equations? 

      𝝓<sub>𝒕𝒂𝒓𝒈𝒆𝒕</sub> represents the angle between the bat and the reflected object (target).

      𝝓<sub>𝑻𝒙</sub> represents the angle [rad] between the masking bat and the target, from the transmitter’s perspective.

      𝝓<sub>𝑻𝒙𝑹𝒙</sub> refers to the angle between the transmitting conspecific and the receiving focal bat, from the transmitter’s point of view.

      𝝓<sub>𝑹𝒙𝑻𝒙</sub> represents the angle between the receiving bat and the transmitting bat, from the receiver’s point of view.

      These definitions have been explicitly stated in the revised manuscript to prevent any ambiguity (lines 525-530). Additionally, a Supplementary figure demonstrating the geometrical relations has been added to the manuscript.

      What is a bat's response to colliding with a conspecific (rather than a wall)? 

      In nature, minor collisions between bats are common and typically do not result in significant disruptions to flight (Boerma et al., 2019; Roy et al., 2019; Goldshtein et al., 2025). Given this, our model does not explicitly simulate the physical impact of a collision event. Instead, during the collision event the bat keeps decreasing its velocity and changing its flight direction until the distance between bats is above the threshold (0.4 m). We assume that the primary cost of such interactions arises from the effort required to avoid collisions, rather than from the collision itself. This assumption aligns with observations of bat behavior in dense flight environments, where individuals prioritize collision avoidance rather than modeling post-collision dynamics. See lines 479-484.

      From the statistical side, it was not clear if replicate simulations were performed. If they were, which I believe is the right way due to stochasticity in the model, how many replicates were used, and are the standard errors referred to throughout the paper between individuals in the same simulation or between independent simulations, or both? 

      The number of repetitions for each scenario is detailed in Table 1, but we included it in a more prominent location in the text for clarity. Specifically, we now state (Lines 110-111):

      "The number of repetitions for each scenario was as follows: 1 bat: 240; 2 bats: 120; 5 bats: 48; 10 bats: 24; 20 bats: 12; 40 bats: 12; 100 bats: 6."
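
      Multiplying each group size by its number of repetitions gives the number of simulated individuals per scenario, which is the sample over which per-individual statistics are pooled. A quick tabulation using only the figures quoted above (variable and function names are illustrative):

```python
# Group size -> number of repetitions, as quoted from the revised text.
REPETITIONS = {1: 240, 2: 120, 5: 48, 10: 24, 20: 12, 40: 12, 100: 6}


def simulated_individuals(repetitions):
    """Return group size -> total simulated bats (group size x repetitions)."""
    return {n_bats: n_bats * reps for n_bats, reps in repetitions.items()}


if __name__ == "__main__":
    for n_bats, total in simulated_individuals(REPETITIONS).items():
        print(f"{n_bats:>3} bats x {REPETITIONS[n_bats]:>3} repetitions = {total} individuals")
```

      The repetition counts are balanced so that every scenario up to 20 bats yields 240 individuals, while the 40- and 100-bat scenarios yield 480 and 600.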

      Regarding the reported standard errors, they are calculated across all individuals within each scenario, without distinguishing between different simulation trials. 

      We clarified this in the revised text (lines 627-628 in Statistical Analysis).

      Overall, I found these weaknesses to be superficial and easily remedied by the authors. The authors presented well-reasoned arguments that were supported by their results, and which were used to demonstrate how call interference impacts the collective's roost exit as measured by several variables. As the authors highlight, I think this work is valuable to individuals interested in bat biology and behavior, as well as to applications in engineered multi-agent systems like robotic swarms.

      Reviewer #3 (Public review):

      We sincerely appreciate the reviewer’s thoughtful comments and the time invested in evaluating our work, which have greatly contributed to refining our study.

      We would like to note that, in general, our model often simplifies some of the bats’ abilities, under the assumption that if the simulated bats manage to perform this difficult task with simpler mechanisms, real, better-adapted bats will probably perform even better. This reasoning is repeated in several of the responses below.

      Summary:

      The authors describe a model to mimic bat echolocation behavior and flight under high-density conditions and conclude that the problem of acoustic jamming is less severe than previously thought, conflating the success of their simulations (as described in the manuscript) with hard evidence for what real bats are actually doing. The authors base their model on two species of bats that fly at "high densities" (defined by the authors as colony sizes from tens to tens of thousands of individuals and densities of up to 33.3 bats/m²), Pipistrellus kuhlii and Rhinopoma microphyllum. This work fits into the broader discussion of bat sensorimotor strategies during collective flight, and simulations are important to try to understand bat behavior, especially given a lack of empirical data. However, I have major concerns about the assumptions of the parameters used for the simulation, which significantly impact both the results of the simulation and the conclusions that can be made from the data. These details are elaborated upon below, along with key recommendations the authors should consider to guide the refinement of the model.

      Strengths:

      This paper carries out a simulation of bat behavior in dense swarms as a way to explain how jamming does not pose a problem in dense groups. Simulations are important when we lack empirical data. The simulation aims to model two different species with different echolocation signals, which is very important when trying to model echolocation behavior. The analyses are fairly systematic in testing all ranges of parameters used and discussing the differential results.

      Weaknesses:

      The justification for how the different foraging phase call types were chosen for different object detection distances in the simulation is unclear. Do these distances match those recorded from empirical studies, and if so, are they identical for both species used in the simulation? 

      The distances at which bats transition between echolocation phases are identical for both species in our model (see Table 2). These distances are based on well-documented empirical studies of bat hunting and obstacle avoidance behavior (Griffin, Webster and Michael, 1958; Simmons and Kick, 1983; Schnitzler et al., 1987; Kalko, 1995; Hiryu et al., 2008; Vanderelst and Peremans, 2018). These references provide extensive evidence that insectivorous bats systematically adjust their echolocation calls in response to object proximity, following the characteristic phases of search, approach, and buzz.

      To improve clarity, we have updated the text to explicitly state that the phase transition distances are empirically grounded and apply equally to both modeled species (lines 499-508).

      What reasoning do the authors have for a bat using the same call characteristics to detect a cave wall as they would for detecting a small insect? 

      In echolocating bats, call parameters are primarily shaped by the target distance and echo strength. Accordingly, there is little difference in call structure between prey capture and obstacle-related maneuvers, aside from intensity adjustments based on target strength (Hagino et al., 2007; Hiryu et al., 2008; Surlykke, Ghose and Moss, 2009; Kothari et al., 2014). In our study, due to the dense cave environment, the bats are found to operate in the approach phase most of the time, which is consistent with natural cave emergence, where they are navigating through a cluttered environment rather than engaging in open-space search. For one of the species (Rhinopoma), we also have empirical recordings of individuals flying under similar conditions (Goldshtein et al., 2025). Our model was designed to remain as simple as possible while relying on conservative assumptions that may underestimate bat performance. If, in reality, bats fine-tune their echolocation calls even earlier or more precisely during navigation than assumed, our model would still conservatively reflect their actual capabilities. See lines 500-508.

      The two species modeled have different calls. In particular, the bandwidth varies by a factor of 10, meaning the species' sonars will have different spatial resolutions. Range resolution is about 10x better for PK compared to RM, but the authors appear to use the same thresholds for "correct detection" for both, which doesn't seem appropriate.

      The detection process in our model is based on Saillant’s method using a filterbank, as detailed in the paper (Saillant et al., 1993; Neretti et al., 2003; Sanderson et al., 2003). This approach inherently incorporates the advantages of a wider bandwidth, meaning that the differences in range resolution between the species are already accounted for within the signal-processing framework. Thus, there is no need to explicitly adjust the model parameters for bandwidth variations, as these effects emerge from the applied method.

      Also, the authors did not mention incorporating/correcting for/exploiting Doppler, which leads me to assume they did not model it.

      The reviewer is correct. To maintain model simplicity, we did not incorporate the Doppler effect or its impact on echolocation. The exclusion of Doppler effects was based on the assumption that while Doppler shifts can influence frequency perception, their impact on jamming and overall navigation performance is minor within the modelled context.

      The maximal Doppler shifts expected for the bats in this scenario are ~1 kHz. These shifts would be applied variably across signals due to the semi-random relative velocities between bats, leading to a mixed effect on frequency changes. This variability would likely result in an overall reduction in jamming rather than exacerbating it, aligning with our previous statement that our model may overestimate the severity of acoustic interference. Such Doppler shifts would result in errors of 2-4 cm in localization (i.e., 200-400 microseconds) (Boonman, Parsons and Jones, 2003).
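
      As a rough plausibility check on the ~1 kHz order of magnitude (an illustration, not a calculation taken from the manuscript), the sketch below evaluates the standard two-way Doppler shift for an echo returned by a stationary reflector that the bat approaches head-on. The emission frequency (40 kHz), flight speed (4 m/s), and speed of sound (343 m/s) are assumed values chosen only for illustration.

```python
def echo_doppler_shift_hz(f_emit_hz: float, v_bat_ms: float, c_ms: float = 343.0) -> float:
    """Two-way Doppler shift of an echo from a stationary reflector.

    The bat is both a moving source and a moving receiver, so the echo
    frequency is f * (c + v) / (c - v); the shift is therefore
    f * 2v / (c - v). Positive v means flying toward the reflector.
    """
    if abs(v_bat_ms) >= c_ms:
        raise ValueError("speed must be below the speed of sound")
    return f_emit_hz * 2.0 * v_bat_ms / (c_ms - v_bat_ms)


if __name__ == "__main__":
    # Assumed, illustrative values: 40 kHz call, 4 m/s flight speed.
    shift = echo_doppler_shift_hz(40_000.0, 4.0)
    print(f"Doppler shift: {shift:.0f} Hz")
```

      Under these assumptions the shift is on the order of 1 kHz, that is, a few percent of the emitted frequency.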

      We have now explicitly highlighted this in the revised version (see lines 548-581).

      The success of the simulation may very well be due to variation in the calls of the bats, which ironically enough demonstrates the importance of a jamming avoidance response in dense flight. This explains why the performance of the simulation falls when bats are not able to distinguish their own echoes from other signals. For example, in Figure C2, there are calls that are labeled as conspecific calls and have markedly shorter durations and wider bandwidths than others. These three phases for call types used by the authors may be responsible for some (or most) of the performance of the model since the correlation between different call types is unlikely to exceed the detection threshold. But it turns out this variation in and of itself is what a jamming avoidance response may consist of. So, in essence, the authors are incorporating a jamming avoidance response into their simulation. 

      We fully agree that the natural variations in call design between the phases contribute significantly to interference reduction (see our discussion in a previous paper in Mazar & Yovel, 2020). However, we emphasize that this cannot be classified as a Jamming Avoidance Response (JAR). In our model, bats respond only to the physical presence of objects and not to the acoustic environment or interference itself. There is no active or adaptive adjustment of call design to minimize jamming beyond the natural phase-dependent variations in call structure. Therefore, while variation in call types does inherently reduce interference, this effect emerges passively from the modeled behavior rather than as an intentional strategy to avoid jamming. 

      The authors claim that integration over multiple pings (though I was not able to determine the specifics of this integration algorithm) reduces the masking problem. Indeed, it should: if you have two chances at detection, you've effectively increased your SNR by 3dB.  

      The reviewer is correct. Indeed, integration over multiple calls improves signal-to-noise ratio (SNR), effectively increasing it by approximately 3 dB per doubling of observations. The specifics of the integration algorithm are detailed in the Methods section, where we describe how sensory information is aggregated across multiple time steps to enhance detection reliability.
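
      The 3 dB figure corresponds to the idealized case in which N observations with independent noise are combined so that the power SNR scales with N, giving a gain of 10·log10(N) dB. The sketch below evaluates only this textbook relation; it is not the integration algorithm of the model, which aggregates detections rather than raw signals.

```python
import math


def integration_gain_db(n_observations: int) -> float:
    """Idealized SNR gain from combining n observations with independent noise."""
    if n_observations < 1:
        raise ValueError("need at least one observation")
    return 10.0 * math.log10(n_observations)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 10):
        print(f"{n:>2} calls -> {integration_gain_db(n):.2f} dB")
```

      Each doubling adds about 3.01 dB, so under this idealization a window of 10 calls would give at most 10 dB.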

      They also claim - although it is almost an afterthought - that integration dramatically reduces the degradation caused by false echoes. This also makes sense: from one ping to the next, the bat's own echo delays will correlate extremely well with the bat's flight path. Echo delays due to conspecifics will jump around kind of randomly. However, the main concern is regarding the time interval and number of pings of the integration, especially in the context of the bat's flight speed. The authors say that a 1s integration interval (5-10 pings) dramatically reduces jamming probability and echo confusion. This number of pings isn't very high, and it occurs over a time interval during which the bat has moved 5-10m. This distance is large compared to the 0.4m distance-to-obstacle that triggers an evasive maneuver from the bat, so integration should produce a latency in navigation that significantly hinders the ability to avoid obstacles. Can the authors provide statistics that describe this latency, and discussion about why it doesn't seem to be a problem? 

      As described in the Methods section, the bat’s collision avoidance response does not solely rely on the integration process. Instead, the model incorporates real-time echoes from the last calls, which are used independently of the integration process for immediate obstacle avoidance maneuvers. This ensures that bats can react to nearby obstacles without being hindered by the integration latency. The slower integration, on the other hand, is used for clustering, outlier removal, and estimation of wall directions to support the pathfinding process, as illustrated in Supplementary Figure 1.

      Additionally, our model assumes that bats store the physical positions of echoes in an allocentric coordinate system (x-y). The integration occurs after transforming these detections from a local relative reference frame to a global spatial representation. This allows for stable environmental mapping while maintaining responsiveness to immediate changes in the bat’s surroundings.

      See lines 600-616 in the revised version.
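
      The transformation from a local, bat-centered detection to a global x-y position can be sketched as below. This is a generic polar-to-Cartesian conversion under assumed conventions (heading and bearing in radians, measured counterclockwise, with the bearing relative to the flight direction); the function and parameter names are hypothetical and are not taken from the model code.

```python
import math


def to_allocentric(bat_x, bat_y, heading_rad, range_m, bearing_rad):
    """Map an egocentric detection (range, bearing) to global (x, y) coordinates."""
    angle = heading_rad + bearing_rad  # direction of the echo in the global frame
    return (bat_x + range_m * math.cos(angle),
            bat_y + range_m * math.sin(angle))


if __name__ == "__main__":
    # The same wall point detected from two successive positions along the path
    # should map to (nearly) the same global coordinates.
    print(to_allocentric(1.0, 2.0, math.pi / 2, 2.0, 0.0))
    print(to_allocentric(1.0, 3.0, math.pi / 2, 1.0, 0.0))
```

      Because detections of a fixed obstacle from successive calls land on the same global point, they can be clustered, whereas spurious detections tend to scatter and can be rejected as outliers.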

      The authors are using a 2D simulation, but this very much simplifies the challenge of a 3D navigation task, and there is an explanation as to why this is appropriate. Bat densities and bat behavior are discussed per unit area when realistically it should be per unit volume. In fact, the authors reference studies to justify the densities used in the simulation, but these studies were done in a 3D world. If the authors have justification for why it is realistic to model a 3D world in a 2D simulation, I encourage them to provide references justifying this approach. 

      We acknowledge that this is a simplification; however, from an echolocation perspective, a 2D framework represents a worst-case scenario in terms of bat densities and maneuverability:

      • Higher Effective Density: A 2D model forces all bats into a single plane rather than distributing them through a 3D volume, increasing the likelihood of overlap in calls and echoes and making jamming more severe. As described in the text, the average distance to the nearest bat in our simulation is 0.27m (with 100 bats), whereas reported distances in very dense colonies are 0.5m (Fujioka et al., 2021), as observed in Myotis grisescens (Sabol and Hudson, 1995) and Tadarida brasiliensis (Theriault et al., no date; Betke et al., 2008; Gillam et al., 2010).

      • Reduced Maneuverability: In 3D space, bats can use vertical movement to avoid obstacles and conspecifics. A 2D constraint eliminates this degree of freedom, increasing collision risk and limiting escape options.

      Thus, our 2D model provides a conservative, difficult test case, ensuring that our findings are valid under conditions where jamming and collision risks are maximized. Additionally, the 2D framework is computationally efficient, allowing us to perform multiple simulation runs to explore a broad parameter space and systematically test the impact of different variables.

      To address the reviewer’s concern, we have clarified this justification in the revised text and will provide supporting references where applicable (see Methods lines 450-455).

      The focus on "masking" (which appears to be just in-band noise), especially relative to the problem of misassigned echoes, is concerning. If the bat calls are all the same waveform (downsweep linear FM of some duration, I assume - it's not clear from the text), false echoes would be a major problem. Masking, as the authors define it, just reduces SNR. This reduction is something like sqrt(N), where N is the number of conspecifics whose echoes are audible to the bat, so this allows the detection threshold to be set lower, increasing the probability that a bat's echo will exceed a detection threshold. False echoes present a very different problem. They do not reduce SNR per se, but rather they cause spurious threshold excursions (N of them!) that the bat cannot help but interpret as obstacle detection. I would argue that in dense groups the mis-assignment problem is much more important than the SNR problem. 

      There is substantial literature supporting the assumption that bats can recognize their own echoes and distinguish them from conspecific signals (Schnitzler, Bioscience and 2001, no date; Kazial, Burnett and Masters, 2001; Burnett and Masters, 2002; Kazial, Kenny and Burnett, 2008; Chili, Xian and Moss, 2009; Yovel et al., 2009; Beetz and Hechavarría, 2022). However, we acknowledge that false echoes may present a major challenge in dense groups. To address this, we explicitly tested the impact of the self-echo identification assumption in our study (see Results, Figure 1: The impact of confusion on performance, and lines 399-404 in the Discussion).

      Furthermore, we examined a full confusion scenario, where all reflected echoes from conspecifics were misinterpreted as obstacle reflections (i.e., 100% confusion). Our results show that this significantly degrades navigation performance, supporting the argument that echo misassignment is a critical issue. However, we also explored a simple mitigation strategy based on temporal integration with outlier rejection, which provided some improvement in performance. This suggests that real bats may possess additional mechanisms to enhance self-echo identification and reduce false detections. See lines 411-420 in the manuscript for further discussion. 

      We actually used logarithmically frequency modulated (FM) chirps, generated using the MATLAB built-in function chirp(t, f0, t1, f1, 'logarithmic'). This method aligns with the nonlinear FM characteristics of Pipistrellus kuhlii (PK) and Rhinopoma microphyllum (RM) and provides a realistic approximation of their echolocation signals. We acknowledge that this was not sufficiently emphasized in the original text, and we have now explicitly highlighted this in the revised version to ensure clarity (see Lines 509-512 in Methods).
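
      For readers without MATLAB, a logarithmic FM sweep of this kind can be sketched in plain Python. The code below implements the standard logarithmic sweep law, f(t) = f0·(f1/f0)^(t/t1), which is the law documented for the 'logarithmic' method of `chirp`; it is an illustrative re-implementation, not the model code. The start frequency, end frequency, duration, and sampling rate are hypothetical placeholders, not the species parameters from the manuscript.

```python
import math


def instantaneous_frequency(t, f0, t1, f1):
    """Frequency (Hz) at time t of a logarithmic sweep from f0 (t=0) to f1 (t=t1)."""
    return f0 * (f1 / f0) ** (t / t1)


def log_chirp_sample(t, f0, t1, f1):
    """One sample of cos(phase), where phase is 2*pi times the integral of f(t)."""
    ratio = f1 / f0
    if ratio == 1.0:
        return math.cos(2.0 * math.pi * f0 * t)  # degenerate case: pure tone
    phase = 2.0 * math.pi * f0 * t1 / math.log(ratio) * (ratio ** (t / t1) - 1.0)
    return math.cos(phase)


def log_chirp(f0, f1, duration_s, sample_rate_hz):
    """Sampled downward or upward logarithmic FM call as a list of floats."""
    n_samples = int(round(duration_s * sample_rate_hz))
    return [log_chirp_sample(i / sample_rate_hz, f0, duration_s, f1)
            for i in range(n_samples)]


if __name__ == "__main__":
    # Hypothetical values: 80 kHz -> 40 kHz over 3 ms, sampled at 250 kHz.
    call = log_chirp(80_000.0, 40_000.0, 0.003, 250_000.0)
    print(len(call), "samples; midpoint frequency:",
          round(instantaneous_frequency(0.0015, 80_000.0, 0.003, 40_000.0)), "Hz")
```

      In a logarithmic sweep the frequency at the temporal midpoint is the geometric mean of the start and end frequencies, so the sweep spends more time near the lower frequency of a downward call than a linear sweep would.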

      The criteria set for flight behavior (lines 393-406) are not justified with any empirical evidence of the flight behavior of wild bats in collective flight. How did the authors determine the avoidance distances? Also, what is the justification for the time limit of 15 seconds to emerge from the opening? Instead of an exit probability, why not instead use a time criterion, similar to "How long does it take X% of bats to exit?"

      While we acknowledge that wild bats may employ more complex behaviors for collision avoidance, we chose to implement a simplified decision-making rule in our model to maintain computational tractability.

      The avoidance distances (1.5 m from walls and 0.4 m from other bats) were selected as internal parameters to support stable and realistic flight trajectories while maintaining a reasonable collision rate. These values reflect a trade-off between maneuverability and behavioral coherence under crowding. To address this point, we added a sensitivity analysis to the revised manuscript. Specifically, we tested the effect of varying the conspecific avoidance distance from 0.2 to 1.6 meters at bat densities of 2 to 40 bats/3m². The only statistically significant impact was at the highest density (40 bats/3m²), where exit probability increased slightly from 82% to 88% (p = 0.024, t = 2.25, DF = 958). No significant changes were observed in exit time, collision rate, or jamming probability across other densities or conditions (GLM, see revised Methods). These results suggest that the selected avoidance distances are robust and not a major driver of model performance, see lines 469-47.

      The 15-second exit limit was determined as described in the text (Lines 489-491): “A 15-second window was chosen because it is approximately twice the average exit time for 40 bats and allows for a second corrective maneuver if needed.” In other words, it allowed each bat to circle the ‘cave’ twice to exit even in the most crowded environment. This threshold was set to keep simulation time reasonable while allowing sufficient time for most bats to exit successfully.

      We acknowledge that the alternative approach suggested by the reviewer, measuring the time taken for a certain percentage of bats to exit, is also valid. However, in our model, some outlier bats fail to exit and continue flying for many minutes. Such simulations would lead to excessive run times, making it difficult to generate repetitions, and would teach us little: these failures usually resulted from the bat slightly missing the opening (see video S1). Our chosen approach ensures practical runtime constraints while still capturing relevant performance metrics.

      What is the empirical justification for the 1-10 calls used for integration?  

      The "average exit time for 40 bats" is also confusing and not well explained. Was this determined empirically? From the simulation? If the latter, what are the conditions?

      Does it include masking, no masking, or which species? 

      Previous studies have demonstrated that bats integrate acoustic information received sequentially over several echolocation calls (2-15), effectively constructing an auditory scene in complex environments (Ulanovsky and Moss, 2008; Chili, Xian and Moss, 2009; Moss and Surlykke, 2010; Yovel and Ulanovsky, 2017; Salles, Diebold and Moss, 2020). Additionally, bats are known to produce echolocation sound groups when spatiotemporal localization demands are high (Kothari et al., 2014). Studies have documented call sequences ranging from 2 to 15 grouped calls (Moss and Surlykke, 2010), and it has been hypothesized that grouping facilitates echo segregation.

      We did not use a single integration window; we tested integration sizes between 1 and 10 calls and presented the results in Figure 3A. This range was chosen based on prior empirical findings and to explore how different levels of temporal aggregation impact navigation performance. Indeed, the results showed that performance levels off for integration windows of 5-10 calls (Figure 3A).

      Regarding the average exit time for 40 bats, this value was determined from our simulations, where it represents the mean time for successful exits under standard conditions with masking. We have revised the text to clarify these details; see lines 489-491.

      Reviewer #1 (Recommendations for the authors):

      (1) Data Availability:

      As it stands now, this reviewer cannot vouch for the uploaded code as it wasn't accessible according to F.A.I.R principles. The link to the code/data points to a private company's file-hosting account that requires logging in or account creation to see its contents, and thus cannot be accessed.

      This reviewer urges the authors to consider uploading the code onto an academic data repository from the many on offer (e.g. Dryad, Zenodo, OSF). Some repositories offer an option to share a private link (e.g. Zenodo) to the folder that can then be shared only with reviewers so it is not completely public.

      This is a computational paper, and the credibility of the results is based on the code used to generate them.

      The code is available at GitHub as required:

      https://github.com/omermazar/Colony-Exit-Bat-Simulation

      (2) Abstract:

      Line 22: 'To explore whether..' - replace 'whether' with 'how'?

      The sentence was rephrased as suggested by the reviewer.

      (3) Main text:

      Line 43: '...which may share...' - correct to '...which share...', as elegantly framed in the authors' previous work - jamming avoidance is unavoidable because all FM bats of a species still share >90% of spectral bandwidth despite a few kHz shift here and there.

      The sentence was rephrased as suggested by the reviewer.

      Line 49: The authors may wish to additionally cite the work of Fawcett et al. 2015 (J. Comp. Phys A & Biology Open)

      Thank you for the suggestion. We have included a citation to the work of Fawcett et al. (2015) in the revised manuscript.

      Line 61: This statement does not match the recent state of the literature. While the previous models may have assumed that all neighbours can be detected, there are models that specifically study the role of limited interaction arising from the potential inability to track all neighbours, and the effect of responding to only one/few neighbours at a time e.g. Bode et al. 2011 R. Soc. Interface, Jhawar et al. 2020 Nature Physics.

      We have added citations to the important studies suggested by the reviewer, as detailed in the Public Review above.

      Line 89: '..took all interference signals into account...' - what is meant by 'interference signals' - are the authors referring to reflections, unclear.

      We have revised the sentence and detailed the acoustic signals involved in the process: self-generated echoes, calls from conspecifics, and echoes from cave walls and other bats evoked by those calls, see lines 99-106.

      Figure 1A: The colour scheme with overlapping points makes the figure very hard to understand what is happening. The legend has colours from subfigures B-D, adding to the confusion.

      What does the yellow colour represent? This is not clear. Also, in general, the color schemes in the simulation trajectories and the legend are not the same, creating some amount of confusion for the reader. It would be good to make the colour schemes consistent and visually separable (e.g. consp. call direct is very similar to consp. echo from consp. call), and perhaps also if possible add a higher resolution simulation visualisation. Maybe it is best to separate out the colour legends for each sub-figure.

      The updated figure now includes clearer, more visually separable colors, and consistent color coding across all sub-panels. The yellow trajectory representing the focal bat’s flight path is now explicitly labeled, and we adjusted the color mapping of acoustic signals (e.g., conspecific calls vs. echoes) to improve distinction. We also revised the figure caption accordingly and ensured that the legend is aligned with the updated visuals. These modifications aim to enhance interpretability and reduce ambiguity for the reader.

      Figure C3: What is 'FB Channel', this is not explained in the legend.

      ‘FB Channel’ stands for ‘Filter Bank Channel’. This clarification has been added to the caption of Figure 1.

      Figure 3: Visually noticing that the colour legend is placed only on sub-figure A is tricky and readers may be left searching for the colour legend. Maybe lay out the legend horizontally on top of the entire figure, so it stands out?

      We have adjusted the placement of the color legend in Figure 3 to improve visibility and consistency.

      Line 141: '..the probability of exiting..' - how is this probability calculated - not clear.

      We have clarified in the revised text that the probability of exiting the cave within 15 seconds is defined as the number of bats that exited the cave within that time divided by the total number of bats in each scenario, see lines 159-160.
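
      The definition above can be written out as a short Python sketch. The function name, the use of `None` for a bat that never exits, and the example exit times are illustrative assumptions and are not taken from the simulation code.

```python
def exit_probability(exit_times_s, time_limit_s=15.0):
    """Fraction of bats that exited within the time limit.

    exit_times_s holds one entry per bat in the scenario; None marks a bat
    that never exited (an assumed convention for this sketch).
    """
    if not exit_times_s:
        raise ValueError("need at least one bat")
    exited = sum(1 for t in exit_times_s if t is not None and t <= time_limit_s)
    return exited / len(exit_times_s)


# Hypothetical scenario: three bats exit in time, one is late, one never exits.
example_times_s = [6.2, 9.8, 14.9, 17.3, None]
example_probability = exit_probability(example_times_s)
```

      Bats that exit late and bats that never exit are treated identically here, which is why the metric stays well defined even when a few outliers keep circling.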

      Line 142: What are the sample sizes here - i.e. how many simulation replicates were performed?

      We have clarified the number of repetitions in each scenario in the revised text, as detailed in the Public Review above.

      Line 151: 'The jamming probability,...number of jammed echoes divided by the total number of reflected echoes' - it seems like these are referring to 'own' echoes or first-order reflections, it is important to clarify this.

      The reviewer is right. We have clarified it in the revised text, see lines 173-175.

      Line 153: '..with a maximum difference of ...' - how is this difference calculated? What two quantities are being compared - not clear.

      We have revised the text to clarify that the 14.3% value reflects the maximum difference in jamming probability between the RM and PK models, which occurred at a density of 10 bats. The values at each density are shown in Figure 2D, see lines 175-177.

      Line 221: '..temporal aggregation helps..' - I'm assuming the authors meant temporal integration? However, I would caution against using the exact term 'temporal integration' as it is used in the field of audition to mean something different. Perhaps something like 'sensory integration' , or 'multi-call integration'

      To avoid ambiguity and better reflect the process modeled in our work, we have replaced the term "temporal aggregation" with "multi-call integration" throughout the revised manuscript. This term more accurately conveys the idea of combining information from multiple echolocation calls without conflicting with existing terminology.

      (4) Discussion

      Lines 302: 'Our model suggests...increasing the call-rate..' - not clear where this is explicitly tested or referred to in this manuscript. Can't see what was done to measure/quantify the effect of this variable in the Methods or anywhere else.

      We have rephrased this paragraph as detailed in the Public Review above, see lines 346-349.

      Line 319: 'spatial interference' - unclear what this means. This reviewer would strongly caution against creating new terms unless there is an absolute need for it. What is meant by 'interference' in this paper is hard to assess given that the word seems to be used as a synonym for jamming and also for actual physical wave-based interference.

      We have rephrased this paragraph as detailed in the Public Review above, see lines 119-120 and 366-367.

      Line 323: '..no benefit beyond a certain level...' - also not clear where this is explicitly tested. It seems like there was a set of simulations run for a variety of parameters but this is not written anywhere explicitly. What type of parameter search was done, was it all possible parameter combinations - or only a subset? This is not clear.

      We have rephrased this paragraph as detailed in the Public Review above, see lines 372-375.

      Line 324: '..ca. 110 dB-SPL.' - what reference distance?

      All call levels were simulated and reported in dB-SPL, referenced at 0.1 meters from the emitting bat. We have clarified it in the revised text in the relevant contexts and specifically in line 529.
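
      Because reference distances of 0.1 m and 1 m are both in use, it can help to see how a level moves between them. The sketch below assumes spherical spreading only (20 log10 of the distance ratio) and ignores atmospheric absorption; the function name is illustrative and is not part of the model code.

```python
import math


def convert_reference_distance(level_db, from_m, to_m):
    """Re-reference a sound level from one distance to another.

    Assumes spherical spreading only: the level falls by 20*log10(to_m/from_m) dB.
    Atmospheric absorption is deliberately ignored in this sketch.
    """
    if from_m <= 0 or to_m <= 0:
        raise ValueError("distances must be positive")
    return level_db - 20.0 * math.log10(to_m / from_m)


# A call of 110 dB-SPL referenced at 0.1 m, re-referenced to 1 m.
level_at_1m_db = convert_reference_distance(110.0, from_m=0.1, to_m=1.0)
```

      Under this assumption the two conventions differ by 20 dB, so a level quoted at 0.1 m reads 20 dB higher than the same call quoted at 1 m.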

      (5) Methods

      Line 389 : '...over a 2 x 1.5 m2 area..' It took a while to understand this statement and put it in context. Since there is no previous description of the entire L-arena, the reviewer took it to mean the simulations happened over the space of a 2 x 1.5 m2 area. Include a top-down description of the simulation's spatial setup and rephrase this sentence.

      To address the confusion, we revised the text to clarify that the full simulation environment represents a corridor-shaped cave measuring 14.5 × 2.5 meters, with a right-angle turn located 5.5 meters before the exit, as shown in Figure 1A. The 2 × 1.5 m area refers specifically to the small zone at the far end of the cave where bats begin their flight. The revised description now includes a clearer spatial overview to prevent ambiguity, see lines 456-460.

      Line 398: Replace 'High proximity' with 'Close proximity'

      Replaced.

      Line 427: 'uniform target strength of -23 dB' - at what distance is this target strength defined? Given the reference distance can vary by echolocation convention (0.1 or 1 m), one can't assess if this is a reasonable value or not.

      The reference distance for the reported target strength is 1 meter, in line with standard acoustic conventions. We have revised the text to clarify this explicitly (line 531).

      Also, independent of the reference distance, particularly with reference to bats, the target strength is geometry-dependent, based on whether the wings are open or not. Using the entire wingspan of a bat to parametrise the target strength is an overestimate of the available reflective area. The effective reflective area is likely to be somewhere closer to the surface area of the body and a fraction of the wingspan together. This is important to note and/or mention explicitly since the value is not experimentally parametrised.

      For comparison, experimentally based measurements used in Goetze et al. 2016 are -40 dB (presumably at 1 m since the source level is also defined at 1 m?), and Beleyur & Goerlitz 2019 show a range between -43 to -34 dB at 1 m.

      We agree with the reviewer that target strength in bats is strongly influenced by their geometry, particularly wing posture during flight. In our model, we simplified this aspect by using a constant target strength, as the detailed temporal variation in body and wing geometry is pseudo-random and not explicitly modeled. We acknowledge that this is a simplification, and have now stated this limitation clearly in the revised manuscript. We chose a fixed value of –23 dB at 1 meter to reflect a plausible mid-range estimate, informed by anatomical data and consistent with values reported for similarly sized species (Beleyur and Goerlitz, 2019). To support this, we directly measured the target strength of a 3D-printed RM bat model, obtaining –32 dB.

      Moreover, a sensitivity analysis across a wide range (–49 to –23 dB) confirmed that performance metrics remain largely stable, indicating that our conclusions are not sensitive to this parameter, and suggesting that our results hold for different-sized bats. See lines 384-390, 533-538, and Supplementary Figures 3 and 4 in the revised article. 

      Line 434: 'To model the bat's cochlea...'. Bats have two cochleas. This model only describes one, while the agents are also endowed with the ability to detect sound direction - which requires two ears/cochleas.... There is missing information about the steps in between that needs to be provided.

      We appreciate the reviewer’s observation. Indeed, our model is monaural, and simulates detection using a single cochlear-like filter bank receiver. We have clarified this in the revised text to avoid confusion. This paragraph specifically describes the detection stage of the auditory processing pipeline. The localization process, which builds on detection and includes directional estimation, is described in the following paragraph (see line 583 onward), as discussed in the next comment and response.

      Line 457: 'After detection, the bat estimates the range and Direction of Arrival...' This paragraph describes the overall idea, but not the implementation. What were the inputs and outputs for the range and DOA calculation performed by the agent? Or was this information 'fed' in by the simulation framework? If there was no explicit DOA step that the agent performed, but it was assumed that agents can detect DOA, then this needs to be stated.

      In the current simulation, the Direction of Arrival (DOA) was not modeled via an explicit binaural processing mechanism. Instead, based on experimental studies (Simmons et al., 1983; Popper and Fay, 1995), we assumed that bats can estimate the direction of an echo with an angular error that depends on the signal-to-noise ratio (SNR). Accordingly, the inputs to the DOA estimation were the peak level of the desired echo, noise level, and the level of acoustic interference. The output was an estimated direction of arrival that included a random angular error, drawn from a normal distribution whose standard deviation varied with the SNR. We have revised the relevant paragraph (Lines 583-592) to clarify this implementation.
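
      A minimal Python sketch of this kind of SNR-dependent direction estimate follows. The inputs and the normally distributed error follow the description above; the power-sum of noise and interference, the specific SNR-to-standard-deviation mapping, and all function and parameter names are hypothetical choices made for illustration, not the values or code used in the simulation.

```python
import math
import random


def combined_level_db(noise_db, interference_db):
    """Power-sum of noise and interference levels in dB (assumed combination rule)."""
    return 10.0 * math.log10(10 ** (noise_db / 10.0) + 10 ** (interference_db / 10.0))


def angular_error_sd_deg(snr_db, sd_at_0db=15.0, min_sd=1.0):
    """HYPOTHETICAL mapping: the error SD shrinks as SNR grows, floored at min_sd."""
    return max(min_sd, sd_at_0db * 10 ** (-snr_db / 20.0))


def estimate_doa_deg(true_doa_deg, echo_peak_db, noise_db, interference_db, rng=random):
    """Return the true direction plus a normally distributed angular error."""
    snr_db = echo_peak_db - combined_level_db(noise_db, interference_db)
    return true_doa_deg + rng.gauss(0.0, angular_error_sd_deg(snr_db))


# Example draw with a seeded generator so the sketch is repeatable.
example_doa_deg = estimate_doa_deg(30.0, echo_peak_db=40.0, noise_db=0.0,
                                   interference_db=20.0, rng=random.Random(0))
```

      The point of the sketch is the structure: no binaural cue is computed, and localization quality degrades only through the SNR term.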

      Line 464: 'To evaluate the impact of the assumption...' - the 'self' and 'non-self' echoes can be distinguished perhaps using pragmatic time-delay cues, but also using spectro-temporal differences in individual calls/echoes. Do the agents have individual call structures, or do all the agents have the same call 'shape'? The echolocation parameters for the two modelled species are given, but whether there is call parameter variation implemented in the agents is not mentioned.

      In our relatively simple model, all individuals emit the same type of chirp call, with parameters adapted only based on the distance to the nearest detected object. However, individual variation is introduced by assigning each bat a terminal frequency drawn from a normal distribution with a standard deviation of 1 kHz, as described in the revised version (lines 519-520). This small variation is not used explicitly as a spectro-temporal cue for echo discrimination.

      In our model, all spectro-temporal variations—whether due to call structure or variations resulting from overlapping echoes from nearby reflectors—are processed through the filter bank, which compares the received echoes to the transmitted call during the detection stage. As such, the detection process itself can act as a discriminative filter, to some extent, based on similarity to the emitted call.

      We acknowledge that real bats likely rely on a variety of spectro-temporal features for distinguishing self from non-self-echoes—such as call duration, received level, multi-harmonic structure, or amplitude modulation. In our simulation, we focus on comparing two limiting conditions: full recognition of self-generated echoes versus full confusion. Implementing a more nuanced self-recognition mechanism based on temporal or spectral cues would be a valuable extension for future work.

      (6) References

      Reference 22: Formatting error - and extra '4' in the reference.

      The error has been fixed.

      (7) Thoughts/comments

      Even without 'recognition' of walls & conspecifics, bats may be able to avoid obstacles - this is a neat result. Also, using their framework the authors show that successful 'blind' object-agnostic obstacle avoidance can occur only when supported by some sort of memory. In some sense, this is a nice intermediate step showing the role of memory in bat navigation. We know that bats have good long-term and long-spatial scale memory, and here the authors show that short-term spatial memory is important in situations where immediate sensory information is unreliable or unavailable.

      We appreciate the reviewer’s thoughtful summary. Indeed, one of the main takeaways of our study is that successful obstacle avoidance can occur even without explicit recognition of walls or conspecifics—provided that a clustered multi-call integration is in place. Our model shows that when immediate sensory information is unreliable, integrating detections over time becomes essential for effective navigation. This supports the broader view that memory, even on short timescales, plays an important role in bat behavior.

      (8) Reporting GLM results

      The p-value, t-statistic, and degrees of freedom are reported consistently across multiple GLM results. However, the most important part which is the effect size is not consistently reported - and this needs to be included in all results, and even in the table. The effect size provides an indicator of the parameter's magnitude, and thus scientific context.

      We agree that the effect size provides essential scientific context. In fact, we already include the effect size explicitly in Table 1, as shown in the “Effect Size” column for each tested parameter. These values describe the magnitude of each parameter’s effect on exit probability, jamming probability, and collision rate. In the main text, effect sizes are presented as concrete changes in performance metrics (e.g., “exit probability increased from 20% to 87%,” or “with a decrease of 3.5%±8% to 5.5%±5% (mean ± s.e.)”), which we believe improves interpretability and scientific relevance.  

      To further clarify this in the main text, we have reviewed the reported results and ensured that effect sizes are mentioned more consistently wherever GLM outcomes are discussed. Additionally, we have added a brief note in the table caption to emphasize that effect sizes are provided for all tested parameters.

      The 'tStat' appears multiple times and seems to be the output of the MATLAB GLM function. This acronym is specific to the MATLAB implementation and needs to be replaced with a conventionally used acronym such as 't', or the full form 't-statistic' too. This step is to keep the results independent of the programming language used.

      We have replaced all instances of tStat with the more conventional term ‘t’ throughout the manuscript to maintain consistency with standard reporting practices.

      Reviewer #2 (Recommendations for the authors):

      In addition to my public review, I had a few minor points that the authors may want to consider when revising their paper.

      (1) Figures 2, 3, and 4 may benefit from using different marker styles, in addition to different colors, to show the different cases.

      Thank you for the suggestion. In Figures 2–4, the markers represent means with standard error bars. To maintain clarity and consistency across all conditions, we have chosen to keep a standardized marker style – and we clarify this in the legend. We found that varying only the colors is sufficient for distinguishing between conditions without introducing visual clutter.

      (2) The text "PK" in the inset for Figure 2A is very difficult to read. I would suggest using grey as with "RM" in the other inset.

      We have updated the inset in Figure 2A to improve legibility.

      (3) Are the error bars in Figure 3 very small? I wasn't able to see them. If that is the case, the authors may want to mention this in the caption.

      You are correct—the error bars are present in all plots but appear very small due to the large number of simulation repetitions and low variability. We have revised the caption to explicitly mention this.

      (4) The species name of PK is spelled inconsistently (kuhli, khulli, and kuhlii).

      We have corrected the species name throughout the manuscript.

      (5) Table 1 is a great condensation of all the results, but the time to exit is missing. It may be helpful if summary statistics on that were here as well.

      We have added time-to-exit to the effect size column in Table 1, alongside the other performance metrics, to provide a more complete summary of the simulation results.

      (6) I may have missed it, but why are there two values for the exit probability when nominal flight speed is varied?

      The exit probability was not monotonic with flight speed, but rather showed a parabolic trend with a clear optimum. Therefore, we reported two values representing the effect before and after the peak. We have clarified this in the revised table and updated the caption accordingly.

      (7) Table 2 has an extra header after the page break on page 18.

      The extra header in Table 2 after the page break has been removed in the revised manuscript.

      (8) The G functions have 2 arguments in their definitions and Equation 1, but only one argument in Equations 2 and 3. I wasn't able to see why.

      Thank you for pointing this out. You are correct—this was a typographical error. We have corrected the argument notation in Equations 2 and 3 and explicitly included the frequency dependence of the gain (G) functions in both equations.

      (9) D_txrx was not defined but it was used in Equation 2.

      The variable D_txrx is defined in the equation notation section as: D_txrx, the distance [m] between the transmitting conspecific and the receiving focal bat, from the transmitter’s perspective. We have now ensured that this definition is clearly linked to Equation 2 in the revised text. Moreover, we have added a supplementary figure that illustrates the geometric configuration defined by the equations to further support clarity, as described in the Public Review above.

      (10) It was hard for me to understand what was meant by phi_rx and phi_tx. These were described as angles between the rx or tx bats and the target, but I couldn't tell what the point defining the angle was. Perhaps a diagram would help, or more precise definitions.

      We have revised the caption to provide clearer and more precise definitions. Additionally, we have included a geometric diagram as a supplementary figure, as noted in the Public Review above, to visually clarify the spatial relationships and angle definitions used in the equations, see lines 498-499.

      (11) Was the hearing threshold the same for both species?

      Yes. We have clarified it in the revised version.

      (12) Collision avoidance is described as turning to the "opposite direction" in the supplemental figure explaining the model. Is this 90 degrees or 180 degrees? If 90 degrees, how do these turns decide between right and left?

      In our model, the bat does not perform a fixed 90° or 180° turn. Instead, the avoidance behavior is implemented by setting the maximum angular velocity in the direction opposite to the detected echo. For example, if the obstacle or conspecific is detected on the bat’s right side, the bat begins turning left, and vice versa.

      This turning direction is re-evaluated at each decision step, which occurs after every echolocation pulse. The bat continues turning in the same direction if the obstacle remains in front, otherwise it resumes regular pathfinding. We have clarified this behavior in the updated figure caption and model description, see lines 478-493.
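
      The turning rule described above can be sketched in Python as a single decision step. The sign convention (positive bearing means the obstacle is to the right, positive rate means a right turn), the left turn chosen for an obstacle dead ahead, the use of `None` to signal a return to regular pathfinding, and the function name are all assumptions made for this illustration.

```python
def avoidance_turn_rate(obstacle_bearing_deg, previous_rate_deg_s, max_rate_deg_s):
    """Commanded turn rate for one decision step (one per echolocation pulse).

    obstacle_bearing_deg: bearing of the detected echo relative to the heading
        (positive = right, negative = left), or None if nothing lies within
        the avoidance distance.
    previous_rate_deg_s: rate chosen at the previous step (0.0 or None if the
        bat was not avoiding anything).
    Returns None when no avoidance is needed, i.e. regular pathfinding resumes.
    """
    if obstacle_bearing_deg is None:
        return None
    if previous_rate_deg_s:
        # Obstacle still present: keep turning the way we were already turning.
        return max_rate_deg_s if previous_rate_deg_s > 0 else -max_rate_deg_s
    if obstacle_bearing_deg >= 0:
        # Obstacle on the right (or dead ahead, by assumption): turn left.
        return -max_rate_deg_s
    # Obstacle on the left: turn right.
    return max_rate_deg_s
```

      There is no fixed turn angle in this scheme; the total heading change depends on how many consecutive decision steps still report an obstacle.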

      Reviewer #3 (Recommendations for the authors):

      (1) Lines 27-31: These sentences mischaracterize the results. This claim appears to equate "the model works" with "this is what bats actually do." Also, the model does not indicate that bats' echolocation strategies are robust enough to mitigate the effects of jamming - this is self-evident from the fact that bats navigate successfully via echolocation in dense groups.

      Thank you for the comment. Our aim was not to claim that the model confirms actual bat behavior, but rather to demonstrate that simple and biologically plausible strategies—such as signal redundancy and basic pathfinding—are sufficient to explain how bats might cope with acoustic interference in dense settings. We have revised the wording to better reflect this goal and to avoid overinterpreting the model's implications.

      See abstract in the revised version.  

      (2) Line 37: This number underestimates the number of bats that form some of the largest aggregations of individuals worldwide - the free-tailed bats can form aggregations exceeding several million bats.

      We have revised the text to reflect that some bat species, such as free-tailed bats, are known to form colonies of several million individuals, which exceed the typical range. The updated sentence accounts for these extreme cases, see lines 36-37.

      (3) The flight densities explained in the introduction and chosen references are not representative of the literature - without providing additional justification for the chosen species, it can be interpreted that the selection of the species for the simulation is somewhat arbitrary. If the goal is to model dense emergence flight, why not use a species that has been studied in terms of acoustic and flight behavior during dense emergence flights---such as Tadarida brasiliensis?

      Our goal was to develop a general model applicable to a broad class of FM-echolocating bat species. The two species we selected, Pipistrellus kuhlii (PK) and Rhinopoma microphyllum (RM), span a wide range of signal characteristics, from wideband (PK) to narrowband (RM), providing a representative contrast in call structure.

      Although we did not include Tadarida brasiliensis (TB) specifically, its echolocation calls are acoustically similar to RM in terminal frequency and fall between PK and RM in bandwidth. Therefore, we believe our findings are likely to generalize to TB and other FM-bats.

      Moreover, as noted in a previous response, the average inter-bat distance in our highest-density simulations (0.27 m) is still smaller than those reported for Tadarida brasiliensis during dense emergences—further supporting the relevance of our model to such scenarios.

      To support broader applicability, we also provide a supplementary graphical user interface (GUI) that allows users to modify key echolocation parameters and explore their impact on behavior—making the framework adaptable to additional species, including TB.

      (4) Line 78: It is not clear how (or even if) the simulated bats estimate the direction of obstacles. The explanation given in lines 457-463 is quite confusing. What is the acoustic/neurological mechanism that enables this direction estimation? If there is some mechanism (such as binaural processing), how does this extrapolate to 3D?

      This comment echoes a similar concern raised by a previous reviewer. As explained earlier, in the current simulation, the Direction of Arrival (DOA) was not modeled via an explicit binaural processing mechanism. The complete explanation is detailed in our response to Reviewer #1, Line 457. This implementation is now clarified in the revised text, and a detailed description of the localization process is also provided in the Methods section (lines 583-592).

      (5) The authors propose they are modeling the dynamic echolocation of bats in the simulation (line 79), but it appears (whether this is due to a lack of information in the manuscript or true lack in the simulation) that the authors only modeled a flight response. How did the authors account for bats dynamically changing their echolocation? This is unclear and from what I can tell may just mean that the bats can switch between foraging phase call types depending on the distance to a detected obstacle. Can the authors elaborate more on this?

      The echolocation behavior of the bats, including dynamic call adjustments, was implemented in the simulation and is described in detail in the Methods section (lines 498-520 and Table 2). To avoid redundancy, the Results chapter originally referred to this section, but we have now added a brief explanation in the Results to clarify that the bats’ call parameters (IPI, duration, and frequency range) adapt based on the distance to detected objects, following empirically documented echolocation phases ("search," "approach," "buzz"). These dynamics are consistent with established bat behavior during navigation in cluttered environments such as caves.

      (6) Figure 1 C3: "Detection threshold": what is this and how was it derived?

      The caption also mentions yellow arrows, but they are absent from the figure. C4: Each threshold excursion is marked with an asterisk, but there are many more excursions than asterisks. Why are only some marked? Unclear.

      C3: The detection threshold is determined dynamically. It is set to the greater of either 7 dB above the noise level (0 dB-SPL) (Kick, 1982; Saillant et al., 1993; Sanderson et al., 2003; Boonman et al., 2013) or the maximal received level minus 70 dB, effectively applying a dynamic range of 70 dB. This clarification has been added to the Methods section. The yellow arrow has been added.
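
      The threshold rule stated above reduces to a single maximum, sketched here in Python. The numeric defaults are the ones given in the response (noise at 0 dB-SPL, a 7 dB margin, a 70 dB dynamic range); the function and parameter names, and the fallback for an empty input, are illustrative assumptions.

```python
def detection_threshold_db(received_levels_db, noise_level_db=0.0,
                           margin_db=7.0, dynamic_range_db=70.0):
    """Dynamic detection threshold in dB-SPL.

    The threshold is the greater of (noise level + margin) and
    (strongest received level - dynamic range).
    """
    noise_floor_db = noise_level_db + margin_db
    if not received_levels_db:
        # Nothing received: fall back to the noise-based floor (assumed behavior).
        return noise_floor_db
    return max(noise_floor_db, max(received_levels_db) - dynamic_range_db)


# Weak signals only: the noise-based floor applies.
quiet_threshold_db = detection_threshold_db([40.0, 55.0, 60.0])
# A 90 dB-SPL arrival raises the threshold to 90 - 70 = 20 dB-SPL.
loud_threshold_db = detection_threshold_db([90.0, 30.0])
```

      The second case shows the practical effect of the dynamic range: a single loud arrival can push weaker echoes below threshold even when they are well above the noise.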

      C4: Thank you for this important observation. Only peaks marked with asterisks represent successful detections, i.e., those that were identified in both the interference-free and full detection conditions, as explained in the Methods. Other visible peaks result from masking signals or overlapping echoes from nearby reflectors, but they do not meet the detection criteria. To keep the figure caption concise, we have elaborated on this process more clearly in the revised Methods section. We added this information to the legend.

      (7) Figure 2: A line indicating RM, No Masking is absent

      Thank you for pointing this out. The missing line for RM, No Masking has now been added in the revised version of Figure 2.

      (8) Line 121: "reflected off conspecifics". Does this mean echoes due to conspecifics?

      The phrase "reflected off conspecifics" refers to echoes originating from the bat’s own call and reflected off the bodies of nearby conspecifics. We have clarified the wording in the revised text to avoid confusion.

      (9) Line 125: Why are low-frequency channels stimulated by higher frequencies? This needs further clarification.

      The cochlear filter bank in our model is implemented using gammatone filters, each modeled as an 8th-order Butterworth filter. Because the filter response is non-ideal and the bandwidths are relatively broad, especially in the lower-frequency channels, strong energy from the beginning of the downward FM chirp (at higher frequencies) can still produce residual activation in lower-frequency channels. While these activations are usually below the detection threshold, they may still be visible as early sub-threshold responses. Because this explanation is technical (a property of the filter implementation) and the effect does not influence the detection outcomes, we have chosen not to elaborate on it in the figure caption or Methods.

      (10) Lines 146-150: This is an interesting finding. Is there a theoretical justification for it?

      This outcome arises directly from the simulation results. As noted in the Discussion (lines 359-365), although Pipistrellus kuhlii (PK) shows a modest advantage in jamming resistance due to its broader bandwidth, the redundancy in sensory information across calls—enabled by frequent echolocation—appears to compensate for these signal differences. As a result, the small variations in echo quality between species do not translate into significant differences in performance. We speculate that if the difference in jamming probability had been larger, performance disparities would likely have emerged.

      (11) Line 151: The authors define a jammed echo as an echo entirely missed due to masking. Is this appropriate? Doesn't echo mis-assignment also constitute jamming?

      We agree that echo mis-assignment can also degrade performance; however, in our model, we distinguish between two outcomes: (1) complete masking (echo not detected), and (2) detection with a localization error. As explained in the Methods (lines 500–507), we run the detection analysis twice—once with only desired echoes (“interference-free detection”) and once including masking signals (“full detection”). If a previously detected echo is no longer detected, it is classified as a jammed echo. If the echo is still detected but the delay shifts by more than 100 µs compared to the interference-free condition, it is also considered jammed. If the delay shift is smaller, it is treated as a detection with localization error rather than full jamming. We have clarified this distinction in the revised Methods section.
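      The two-pass classification described in this response can be sketched as follows. This is a minimal illustration rather than the simulation's actual code: the function and argument names are hypothetical, delays are assumed to be in seconds, and a missing detection in the full pass is represented by `None`.

```python
def classify_echo(clean_delay_s, full_delay_s, max_shift_s=100e-6):
    """Classify one echo by comparing the two detection passes.

    clean_delay_s: delay found in the interference-free pass (seconds).
    full_delay_s:  delay found in the full pass that includes masking
                   signals, or None if the echo was no longer detected.
    """
    if full_delay_s is None:
        return "jammed"        # completely masked
    if abs(full_delay_s - clean_delay_s) > max_shift_s:
        return "jammed"        # delay shifted by more than 100 microseconds
    # Any smaller shift is kept as a detection with a localization error.
    return "detected"


print(classify_echo(0.0100, None))      # jammed
print(classify_echo(0.0100, 0.0102))    # jammed
print(classify_echo(0.0100, 0.01005))   # detected
```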

      (12) Figure 2-E: Detection probability statistics are of limited usefulness without accompanying false alarm rate (FAR) statistics. Do the authors have FAR numbers?

      We understand FAR to refer to instances where masking signals or other acoustic phenomena are mistakenly interpreted as real echoes from physical objects. As explained in the manuscript, we implemented two model versions: one without confusion, and one with full confusion.

      Figure 2E reports detection performance under the non-confusion model, in which only echoes from actual physical reflectors are used, and no false detections occur—hence, the false alarm rate is effectively zero in this condition. In the full-confusion model, all detected echoes—including those originating from masking signals or conspecific calls—are treated as valid detections, which may include false alarms. However, we did not explicitly quantify the false alarm rate as a separate metric in this simulation.

      We agree that tracking FAR could be informative and will consider incorporating it into future versions of the model.

      (13) Line 161: RM bats suffered from a significantly higher probability of the "desired conspecific's echoes" being jammed. What does "desired conspecific's echoes" mean? This is unclear.

      The term “desired conspecific's echoes” refers to echoes originating from the bat’s own call, reflected off nearby conspecifics, which are treated as relevant reflectors for collision avoidance. We have revised the wording in the text for clarity.

      (14) Line 188: Why didn't the size of the integration window affect jamming probability? I couldn't find this explained in the discussion.

      The jamming probability in our analysis is computed at the individual-echo level, prior to any temporal integration. Since the integration window is applied after the detection step, it does not influence whether a specific echo is masked (i.e., jammed) or not. Therefore, as expected, we did not observe a significant effect of integration window size on jamming probability.

      (15) Line 217-218: Why do the authors think this would be?

      Thank you for the thoughtful question. We agree that, in theory, increasing call intensity should raise the levels of both desired echoes and masking signals proportionally. However, in our model, the environmental noise floor and detection threshold remain constant, meaning that higher call intensities increase the signal-to-noise ratio (SNR) more effectively for weaker echoes, especially those at longer distances or with low reflectivity. This could lead to a higher likelihood of those echoes crossing the detection threshold, resulting in a small but measurable reduction in jamming probability.

      Additionally, the non-linear behavior of the filter-bank receiver, including thresholding at multiple stages, can introduce asymmetries in how increased signal levels affect the detection of target versus masking signals.

      That said, the effect size was small, and the improvement in jamming probability did not translate into any significant gain in behavioral performance (e.g., exit probability or collision rate), as shown in Figure 3C.

      (16) Line 233: I'm not sure I understand how a slightly improved aggregation model that clustered detected reflectors over one-second periods is different. Doesn't this just lead to on average more calls integrated into memory?

      While increasing the memory duration does lead to more detections being available, the enhanced aggregation model (which we now refer to as multi-call clustering) differs fundamentally from the simpler one. As detailed in the Methods, it includes additional processing steps: clustering spatially close detections, removing outliers, and estimating wall directions based on the spatial structure of clustered echoes. In contrast, the simpler model treats each detection as an isolated point without estimating obstacle orientation. These additional steps allow for more robust environmental interpretation and significantly improve performance under high-confusion conditions. We have clarified this in the revised text (lines 606-616) and added a Supplementary Figure 2B.
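      To make the three processing steps concrete, the sketch below clusters 2D detections, drops small clusters as outliers, and estimates one orientation per remaining cluster. It is an illustration under assumptions, not the method used in the manuscript: the distance-chaining clustering, the `max_gap` and `min_points` values, and the principal-axis orientation estimate are all choices made here for the example.

```python
import math


def cluster_points(points, max_gap=0.3):
    """Group 2D points linked by chains of neighbors within max_gap (m)."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        stack, members = [seed], [seed]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= max_gap]
            for j in near:
                unvisited.remove(j)
            stack.extend(near)
            members.extend(near)
        clusters.append([points[k] for k in members])
    return clusters


def wall_direction_deg(cluster):
    """Orientation (0 to 180 deg) of the principal axis of a cluster."""
    n = len(cluster)
    mx = sum(p[0] for p in cluster) / n
    my = sum(p[1] for p in cluster) / n
    sxx = sum((p[0] - mx) ** 2 for p in cluster)
    syy = sum((p[1] - my) ** 2 for p in cluster)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in cluster)
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy)) % 180.0


def estimate_walls(points, max_gap=0.3, min_points=3):
    """Cluster detections, drop small clusters as outliers, fit directions."""
    kept = [c for c in cluster_points(points, max_gap)
            if len(c) >= min_points]
    return [(len(c), wall_direction_deg(c)) for c in kept]


# Four detections along a horizontal wall plus one isolated outlier.
detections = [(0.0, 0.0), (0.2, 0.0), (0.4, 0.0), (0.6, 0.0), (5.0, 5.0)]
print(estimate_walls(detections))
```

      In this toy input the isolated point forms a cluster of size one and is discarded, while the four aligned points yield a single wall estimate.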

      (17) Table 1: What about conspecific target strength?

      We have now added the conspecific target strength as a tested parameter in Table 1, along with its tested range, default value, and measured effect sizes. A detailed sensitivity analysis is also presented in Supplementary Figure 4, demonstrating that variations in conspecific target strength had relatively minor effects on performance metrics.  

      (18) Figure 3-A: The x-axis is the number of calls in the integration window. But the leftmost sample on each curve is at 0 calls. Shouldn't this be 1?

      “0 calls” refers to the case where only the most recent call is used for pathfinding—without integrating any information from prior calls. The x-axis reflects the number of previous calls stored in memory, so a value of 0 still includes the current call. We’ve clarified this terminology in the figure caption.
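      The convention that the x-axis counts previous calls can be illustrated with a fixed-length buffer. This is a hypothetical sketch (the helper name and the string placeholders for detections are invented for the example): a memory of N previous calls holds N + 1 entries, so "0 calls" still contains the current call.

```python
from collections import deque


def make_memory(n_previous_calls):
    """Buffer holding the current call plus n_previous_calls earlier ones."""
    return deque(maxlen=n_previous_calls + 1)


memory = make_memory(0)                # "0 calls": no history is kept
memory.append(["wall at 1.2 m"])       # detections from the first call
memory.append(["wall at 0.9 m"])       # the next call replaces the first
print(list(memory))                    # [['wall at 0.9 m']]
```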

      (19) Lines 282-283: This statement needs to be clarified that it is with the constraints of using a 2D simulation with at most 33 bats/m^2. It also should be clarified that it is assumed the bat can reliably distinguish between its own echoes and conspecific echoes, which is a very important caveat.

      We have revised the text to clarify that the results are based on a 2D simulation with a maximum tested density of 33 bats/m². We also now explicitly state that the model assumes bats can distinguish between their own echoes and those generated by conspecifics, an assumption we recognize as a simplification. These clarifications help place the results within the scope and constraints of the simulation. Moreover, as described in the text (and noted in a previous response), the average distance to the nearest bat in our simulation is 0.27 m (with 100 bats), whereas reported distances in very dense colonies are 0.5 m.

      (20) Line 294: What is this sentence referring to?

      The sentence refers to the finding that, even under high bat densities, a substantial portion of the echoes, particularly those reflected from nearby obstacles (e.g., 1 m away), were jammed due to masking. Nevertheless, the bats in the simulation were still able to navigate successfully using partial sensory input. We have clarified the sentence in the revised text to make this point more explicit (see lines 333-336).

      (21) Line 302: Was jamming less likely when IPI was higher or lower? I could not find this demonstrated anywhere in the manuscript.

      We agree that the original text was not sufficiently clear on this point. While we did not explicitly test fixed IPI values as a parameter, the model does simulate the natural behavior of decreasing IPI as bats approach obstacles. This behavior is supported by empirical observations and is incorporated into the echolocation dynamics of the simulation. We have clarified this point in the revised text (see Lines 346-351) and explained that while lower IPI introduces more acoustic overlap, it also increases redundancy and improves detection through temporal integration.

      (22) Lines 313-314: This is an interesting assumption, but it is not evident that is substantiated by the references.

      The claim is based on well-established principles in signal processing and bioacoustics. Wideband signals, such as those emitted by PK bats, distribute their energy over a broader frequency range, which makes them inherently more resistant to narrowband interference and masking. This concept is commonly applied in both biological and artificial sonar systems and is supported by empirical studies in bats and theory in acoustic sensing.

      For example, Beleyur & Goerlitz (2019) demonstrate that broader bandwidth calls improve detection in cluttered and jamming-prone environments. Similarly, Ulanovsky et al. (2004) and Schnitzler & Kalko (2001) discuss how FM bats' wideband calls enhance temporal and spatial resolution, helping to reduce the impact of overlapping signals from conspecifics. These findings align with communication theory, where spread-spectrum techniques improve robustness in noisy environments.

      We agree with the reviewer that this is an important point, and we have updated the manuscript to clarify this rationale and cite the relevant literature accordingly (lines 631-363).

      (23) Lines 318-319: What is the justification for "probably"? Isn't this just a supposition?

      We agree with the reviewer’s point and have rephrased the sentence.

      (24) Line 320: How does this 63% performance match the sentence in line 295?

      The sentence in Line 295 refers to the overall ability of the bats to navigate successfully despite high jamming levels, highlighting the robustness of the strategy under challenging conditions. The figure in Line 320 (63%) quantifies this performance under the most extreme simulated scenario (100 bats / 3 m²), where both spatial and acoustic interferences are maximal. We have rephrased the text in the revised version (lines 324-327).

      (25) Lines 341-345: It seems like this is more likely to be the main takeaway of the paper.

      As noted in the Public Review above, there is substantial literature supporting the assumption that bats can recognize their own echoes and distinguish them from those of conspecifics (e.g., Schnitzler, Bioscience, 2001; Kazial et al., 2001, 2008; Burnett & Masters, 2002; Chiu et al., 2009; Yovel et al., 2009; Beetz & Hechavarría, 2022). Therefore, we consider our assumption of self-recognition to be well-supported, at least under typical conditions. That said, we agree that the impact of echo confusion on performance is significant and highlights a critical challenge in dense environments.

      To our knowledge, this is the first computational model to explicitly simulate both self-recognition and full echo confusion under high-density conditions. We believe that the combination of modeled constraints and the demonstrated robustness of simple sensorimotor strategies, even under worst-case assumptions, is what makes this contribution both novel and meaningful.

      (26) Lines 349-350: What is the aggregation model? What is meant by "integration"?

      We have revised the text to clarify that the “aggregation model” refers to a multi-call clustering process that includes clustering of detections, removal of outliers, and estimation of wall orientation, as described in detail in the revised Methods and Results sections.

      (27) Line 354: Again, why isn't this the assumption we're working under?

      As addressed in our response to Comment 25, our primary model assumes that bats can recognize their own echoes—an assumption supported by substantial empirical evidence. The alternative "full confusion" model was included to explore a worst-case scenario and highlight the behavioral consequences of failing to distinguish self from conspecific echoes. We assume that real bats may experience some degree of echo misidentification; however, our assumption of full confusion represents a worst-case scenario.

      (28) Line 382: "Under the assumption that..." I agree that bats probably can, but if we assume they can differentiate them all, where's the jamming problem?

      The assumption that bats can theoretically distinguish between different signal sources applies after successful detection. However, the jamming problem arises during the detection and localization stages, where acoustic interference can prevent echoes from crossing the detection threshold or distort their timing.

      (29) Lines 386-387: The paper referenced focused on JAR in the context of foraging. What changes were made to the simulation to switch to obstacle avoidance?

      While the simulation framework in Mazar & Yovel (2020) was developed to study jamming avoidance during foraging, the core components—such as the acoustic calculations, receiver model, and echolocation behavior—remain applicable. For the current study, we adapted the simulation extensively to address colony-exit behavior. These modifications include modeling cave walls as acoustic reflectors, implementing a pathfinding algorithm, integrating obstacle-avoidance maneuvers, and adapting the integration window and integration processes. These updates are detailed throughout the Methods section.

      (30) Line 400-402: Something doesn't add up with the statement: each decision relies on an integration window that records estimated locations of detected reflectors from the last five echolocation calls, with the parameter being tested between 1 and 10 calls. Can the authors reword this to make it less confusing?

      We have reworded the sentence to clarify that the default integration window includes five calls, while we systematically tested the effect of using 1 to 10 calls, see lines 486-487.

      (31) Line 393: "30 deg/sec" why was this value chosen?

      The turning rate of 30 deg/sec was manually selected to approximate the curvature of natural foraging flight paths observed in Rhinopoma microphyllum using on-board tags. Moreover, in Mazar & Yovel (2020), we showed that the flight dynamics of simulated bats in a closed room closely matched those of Pipistrellus kuhlii flying in a room of similar dimensions. However, in the current simulation, bats rarely follow a random-walk trajectory due to the structured environment and frequent obstacle detection. As a result, this parameter has no meaningful impact on the simulation outcomes.

      (32) Line 412: "Harmony" --- do you mean harmonic? And what is the empirical evidence that RM bats use the 2nd harmonic compared to the 1st?

      Perhaps showing a spectrogram of a real RM signal would be helpful.

      The typo has been corrected. For reference, see Goldshtein et al. (2025).

      (33) Table 2: Something is incorrect with the table. The first row on the next page is the wrong species name. Also, where are the citations for these parameter values?

      The table header has been corrected in the revised version. The parameter values for flight and echolocation behavior were derived from existing literature and empirical data: Pipistrellus kuhlii parameters were based on Kalko (1995), and Rhinopoma microphyllum parameters were extracted from our own recordings using on-board tags, as described in Goldshtein et al. (2025). We have added the appropriate citations to Table 2.

      (34) Line 442: How was the threshold level chosen?

      The detection threshold in each level is set to the greater of either 7 dB above the noise level (0 dB-SPL) or the maximal received level minus 70 dB, effectively applying a dynamic range of 70 dB.

      (35) Line 445: 100 micros: This is about 3cm. The resolution of PK is about 1cm. For RM it's about 10cm. So, this window is generous for PK, but too strict for RM.

      To keep the model simple and avoid introducing species-specific detection thresholds, we selected a biologically plausible compromise that could reasonably apply to both species. This simplification ensures consistency across simulations while remaining within the known behavioral range.

      (36) Line 448: What is the spectrum of the Gaussian noise, and did it change between PK and RM?

      We used the same white Gaussian noise with a flat spectrum across the relevant frequency range (10–80 kHz) for both species. We have clarified this in the revised text in lines 570-572.

      (37) Line 451: 4 milliseconds is 1.3m. Is this appropriate?

      The 4-millisecond window was selected based on established auditory masking thresholds described in Mazar & Yovel (2020), and supported by Popper and Fay (1995, ch. 2.4.5), Blauert (1997, ch. 3.1), and Mohl and Surlykke (1989). These values provide conservative lower bounds on bats’ ability to cope with masking (Beleyur and Goerlitz, 2019). For simplicity, we used constant thresholds within each window (see lines 574-576).

      (38) Line 452: Citation for the forward and backward masking durations?

      See the response to the previous comment.

      (39) Lines 460-461: This is unclear. How does the bat get directional information? The authors claim to be able to measure direction-of-arrival for each detection, but it is not clear how this is done

      As noted in our response to Reviewer 1 (Comment on Line 457), directional information is not computed via an explicit binaural model. Instead, we assume the bat estimates the direction of arrival with an angular error that depends on the SNR, based on established studies (e.g., Simmons et al., 1983; Popper & Fay, 1995). We have clarified this in the revised text in lines 583-592.

      (40) Line 467: It seems like the authors are modeling pulse-echo ambiguity, at least in this one alternative model, which is good! However the alternative model doesn't get much attention in the paper. Is there a reason for this?

      We would like to clarify that we did not model pulse-echo ambiguity. In our confusion model, all echoes received within the IPI are attributed to the bat’s most recent call. This includes echoes that may in fact originate from conspecific calls, but the model does not assign self-echoes to earlier pulses or span multiple IPIs. Therefore, while the model captures echo confusion, it does not include true pulse-echo ambiguity. We have clarified this point in the revised text in lines 551-553.
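      The attribution rule in the confusion model can be sketched as below. This is an illustrative example under assumptions, not the simulation code: the function name is hypothetical, times are in seconds, and a sound speed of 343 m/s is assumed. Every received echo is paired with the most recent preceding own call, so an echo of a conspecific call simply receives a wrong apparent range.

```python
from bisect import bisect_right

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air


def assign_to_latest_call(call_times, echo_times):
    """Attribute each received echo to the most recent preceding own call.

    call_times must be sorted in ascending order (seconds).
    Returns a list of (echo_time, call_time, apparent_range_m) tuples.
    """
    assigned = []
    for t_echo in echo_times:
        idx = bisect_right(call_times, t_echo) - 1
        if idx < 0:
            continue  # echo arrived before the first call
        t_call = call_times[idx]
        # Two-way travel time converted to a one-way range.
        apparent_range = SPEED_OF_SOUND * (t_echo - t_call) / 2.0
        assigned.append((t_echo, t_call, apparent_range))
    return assigned


calls = [0.0, 0.1]        # own call emission times
echoes = [0.01, 0.105]    # arrival times of all received echoes
for echo in assign_to_latest_call(calls, echoes):
    print(echo)
```

      Because the lookup never reaches back past the latest call, an echo is never assigned to an earlier pulse, which is the distinction from pulse-echo ambiguity drawn in the response.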

      (41) Line 41: "continuous" is more appropriate than "constant".

      Thank you, we have rephrased the text accordingly.

      (42) Line 69: "band width" should be one word.

      Thank you, we have corrected it to “bandwidth”.

      (43) Line 79: "bats" should be in the possessive.

      Thank you, the text has been rephrased.

      (44) Line 128: "convoluted" don't you mean "convolved"?

      We have replaced “convoluted” with the correct term “convolved” in the revised text.

      (45) Please check your references, as there are some incomplete citations and typos.

      Thank you, we have reviewed and corrected all references for completeness and consistency.

      References

      Beetz, M.J. and Hechavarría, J.C. (2022) ‘Neural Processing of Naturalistic Echolocation Signals in Bats’, Frontiers in Neural Circuits, 16, p. 899370. Available at: https://doi.org/10.3389/FNCIR.2022.899370.

      Beleyur, T. and Goerlitz, H.R. (2019) ‘Modeling active sensing reveals echo detection even in large groups of bats’, Proceedings of the National Academy of Sciences of the United States of America, 116(52), pp. 26662–26668. Available at: https://doi.org/10.1073/pnas.1821722116.

      Betke, M. et al. (2008) ‘Thermal Imaging Reveals Significantly Smaller Brazilian Free-Tailed Bat Colonies Than Previously Estimated’, Journal of Mammalogy, 89(1), pp. 18–24. Available at: https://doi.org/10.1644/07-MAMM-A-011.1.

      Blauert, J. (1997) ‘Spatial Hearing: The Psychophysics of Human Sound Localization (rev. ed.)’.

      Boerma, D.B. et al. (2019) ‘Wings as inertial appendages: How bats recover from aerial stumbles’, Journal of Experimental Biology, 222(20). Available at: https://doi.org/10.1242/JEB.204255.

      Boonman, A. et al. (2013) ‘It’s not black or white-on the range of vision and echolocation in echolocating bats’, Frontiers in Physiology, 4, pp. 1–12. Available at: https://doi.org/10.3389/fphys.2013.00248.

      Boonman, A.M., Parsons, S. and Jones, G. (2003) ‘The influence of flight speed on the ranging performance of bats using frequency modulated echolocation pulses’, The Journal of the Acoustical Society of America, 113(1), p. 617. Available at: https://doi.org/10.1121/1.1528175.

      Burnett, S.C. and Masters, W.M. (2002) ‘Identifying Bats Using Computerized Analysis and Artificial Neural Networks’, North American Symposium on Bat Research, 9.

      Chiu, C., Xian, W. and Moss, C.F. (2009) ‘Adaptive echolocation behavior in bats for the analysis of auditory scenes’, Journal of Experimental Biology, 212(9), pp. 1392–1404. Available at: https://doi.org/10.1242/jeb.027045.

      Fujioka, E. et al. (2021) ‘Three-Dimensional Trajectory Construction and Observation of Group Behavior of Wild Bats During Cave Emergence’, Journal of Robotics and Mechatronics, 33(3), pp. 556–563. Available at: https://doi.org/10.20965/jrm.2021.p0556.

      Gillam, E.H. et al. (2010) ‘Echolocation behavior of Brazilian free-tailed bats during dense emergence flights’, Journal of Mammalogy, 91(4), pp. 967–975. Available at: https://doi.org/10.1644/09-MAMM-A-302.1.

      Goldshtein, A. et al. (2025) ‘Onboard recordings reveal how bats maneuver under severe acoustic interference’, Proceedings of the National Academy of Sciences, 122(14), p. e2407810122. Available at: https://doi.org/10.1073/PNAS.2407810122.

      Griffin, D.R., Webster, F.A. and Michael, C.R. (1958) ‘The echolocation of flying insects by bats’, Animal Behaviour, 8(3-4).

      Hagino, T. et al. (2007) ‘Adaptive SONAR sounds by echolocating bats’, International Symposium on Underwater Technology, UT 2007 - International Workshop on Scientific Use of Submarine Cables and Related Technologies 2007, pp. 647–651. Available at: https://doi.org/10.1109/UT.2007.370829.

      Hiryu, S. et al. (2008) ‘Adaptive echolocation sounds of insectivorous bats, Pipistrellus abramus, during foraging flights in the field’, The Journal of the Acoustical Society of America, 124(2), pp. EL51–EL56. Available at: https://doi.org/10.1121/1.2947629.

      Jakobsen, L. et al. (2024) ‘Velocity as an overlooked driver in the echolocation behavior of aerial hawking vespertilionid bats’. Available at: https://doi.org/10.1016/j.cub.2024.12.042.

      Jakobsen, L., Brinkløv, S. and Surlykke, A. (2013) ‘Intensity and directionality of bat echolocation signals’, Frontiers in Physiology, 4, pp. 1–9. Available at: https://doi.org/10.3389/fphys.2013.00089.

      Jakobsen, L. and Surlykke, A. (2010) ‘Vespertilionid bats control the width of their biosonar sound beam dynamically during prey pursuit’, Proceedings of the National Academy of Sciences, 107(31). Available at: https://doi.org/10.1073/pnas.1006630107.

      Kalko, E.K.V. (1995) ‘Insect pursuit, prey capture and echolocation in pipistrelle bats (Microchiroptera)’, Animal Behaviour, 50(4), pp. 861–880.

      Kazial, K.A., Burnett, S.C. and Masters, W.M. (2001) ‘Individual and Group Variation in Echolocation Calls of Big Brown Bats, Eptesicus Fuscus (Chiroptera: Vespertilionidae)’, Journal of Mammalogy, 82(2), pp. 339–351. Available at: https://doi.org/10.1644/1545-1542(2001)082<0339:iagvie>2.0.co;2.

      Kazial, K.A., Kenny, T.L. and Burnett, S.C. (2008) ‘Little brown bats (Myotis lucifugus) recognize individual identity of conspecifics using sonar calls’, Ethology, 114(5), pp. 469–478. Available at: https://doi.org/10.1111/j.1439-0310.2008.01483.x.

      Kick, S.A. (1982) ‘Target-detection by the echolocating bat, Eptesicus fuscus’, Journal of Comparative Physiology A, 145(4), pp. 431–435. Available at: https://doi.org/10.1007/BF00612808.

      Kothari, N.B. et al. (2014) ‘Timing matters: Sonar call groups facilitate target localization in bats’, Frontiers in Physiology, 5. Available at: https://doi.org/10.3389/fphys.2014.00168.

      Mohl, B. and Surlykke, A. (1989) ‘Detection of sonar signals in the presence of pulses of masking noise by the echolocating bat, Eptesicus fuscus’, pp. 119–124.

      Moss, C.F. and Surlykke, A. (2010) ‘Probing the natural scene by echolocation in bats’, Frontiers in Behavioral Neuroscience. Available at: https://doi.org/10.3389/fnbeh.2010.00033.

      Neretti, N. et al. (2003) ‘Time-frequency model for echo-delay resolution in wideband biosonar’, The Journal of the Acoustical Society of America, 113(4), pp. 2137–2145. Available at: https://doi.org/10.1121/1.1554693.

      Popper, A.N. and Fay, R.R. (1995) Hearing by Bats. Springer-Verlag.

      Roy, S. et al. (2019) ‘Extracting interactions between flying bat pairs using model-free methods’, Entropy, 21(1). Available at: https://doi.org/10.3390/e21010042.

      Sabol, B.M. and Hudson, M.K. (1995) ‘Technique using thermal infrared-imaging for estimating populations of gray bats’, Journal of Mammalogy, 76(4). Available at: https://doi.org/10.2307/1382618.

      Saillant, P.A. et al. (1993) ‘A computational model of echo processing and acoustic imaging in frequency-modulated echolocating bats: The spectrogram correlation and transformation receiver’, The Journal of the Acoustical Society of America, 94(5). Available at: https://doi.org/10.1121/1.407353.

      Salles, A., Diebold, C.A. and Moss, C.F. (2020) ‘Echolocating bats accumulate information from acoustic snapshots to predict auditory object motion’, Proceedings of the National Academy of Sciences of the United States of America, 117(46), pp. 29229–29238. Available at: https://doi.org/10.1073/PNAS.2011719117.

      Sanderson, M.I. et al. (2003) ‘Evaluation of an auditory model for echo delay accuracy in wideband biosonar’, The Journal of the Acoustical Society of America, 114(3), pp. 1648–1659. Available at: https://doi.org/10.1121/1.1598195.

      Schnitzler, H.-U. and Kalko, E.K.V. (2001) ‘Echolocation by insect-eating bats’, BioScience, 51(7), p. 557. Available at: https://academic.oup.com/bioscience/article-abstract/51/7/557/268230 (Accessed: 17 March 2025).

      Schnitzler, H.-U. et al. (1987) ‘The echolocation and hunting behavior of the bat, Pipistrellus kuhli’, Journal of Comparative Physiology A, 161(2), pp. 267–274. Available at: https://doi.org/10.1007/BF00615246.

      Simmons, J.A. et al. (1983) ‘Acuity of horizontal angle discrimination by the echolocating bat, Eptesicus fuscus’.

      Simmons, J.A. and Kick, S.A. (1983) ‘Interception of Flying Insects by Bats’, Neuroethology and Behavioral Physiology, pp. 267–279. Available at: https://doi.org/10.1007/978-3-642-69271-0_20.

      Surlykke, A., Ghose, K. and Moss, C.F. (2009) ‘Acoustic scanning of natural scenes by echolocation in the big brown bat, Eptesicus fuscus’, Journal of Experimental Biology, 212(7), pp. 1011–1020. Available at: https://doi.org/10.1242/JEB.024620.

      Theriault, D.H. et al. (no date) ‘Reconstruction and analysis of 3D trajectories of Brazilian free-tailed bats in flight’, cs-web.bu.edu [Preprint]. Available at: https://csweb.bu.edu/faculty/betke/papers/2010-027-3d-bat-trajectories.pdf (Accessed: 4 May 2023).

      Ulanovsky, N. and Moss, C.F. (2008) ‘What the bat’s voice tells the bat’s brain’, Proceedings of the National Academy of Sciences of the United States of America, 105(25), pp. 8491–8498. Available at: https://doi.org/10.1073/pnas.0703550105.

      Vanderelst, D. and Peremans, H. (2018) ‘Modeling bat prey capture in echolocating bats: The feasibility of reactive pursuit’, Journal of Theoretical Biology, 456, pp. 305–314.

      Yovel, Y. et al. (2009) ‘The voice of bats: How greater mouse-eared bats recognize individuals based on their echolocation calls’, PLoS Computational Biology, 5(6). Available at: https://doi.org/10.1371/journal.pcbi.1000400.

      Yovel, Y. and Ulanovsky, N. (2017) ‘Bat Navigation’, The Curated Reference Collection in Neuroscience and Biobehavioral Psychology, pp. 333–345. Available at: https://doi.org/10.1016/B978-0-12-809324-5.21031-6.

    1. Author Response:

      We thank the reviewers and editors for their thoughtful and constructive feedback on our manuscript, “Morphology and ultrastructure of pharyngeal sense organs of Drosophila larvae.” We are pleased that both reviewers found our ultrastructural analysis and 3D reconstructions of the larval pharyngeal sensory system to be of high quality, and we appreciate the recognition of the study’s significance and potential impact on the Drosophila neurobiology field.

      We want to address the concern raised regarding the limited referencing and comparison with previous work on pharyngeal sensory organs, particularly in adult Drosophila and other insect species.

      As noted by the reviewers, our manuscript is concise and focused. We want to clarify that we initially prepared and submitted this study with the intention of it being considered as a Short Report, which comes with limitations on the number of characters and figures that can be included. During the submission process, we were asked by the editors if we would like to submit our work as a full-length Research Advance, which we agreed to.

      That said, we are now happy to expand the discussion in the broader context of related studies — including prior EM and anatomical work — which would enrich the manuscript and provide readers with a deeper comparative perspective.

      We are grateful for the positive assessment of our manuscript and for the opportunity to clarify this point.

      Sincerely,

      Vincent Richter and Andreas S. Thum

    1. Author response:

      We thank the editors and the reviewers for their constructive comments, which have greatly helped us identify key areas to strengthen the manuscript. We acknowledge the validity of the major points raised, and we plan the following revisions:

      Criticality

      As suggested by Reviewer #1, we will carefully examine whether the dynamics we observe are indeed poised near criticality. We will perform additional analyses to assess how structural and dynamic features change when parameters are tuned away from the coil–globule transition, and we will revise the title and text to ensure that our claims are appropriately moderated.

      Role of the homie element

      We agree with Reviewer #2 that the presence of homie elements introduces major modifications to chromosome structure and dynamics. We initially considered that this factor might even explain the paradox described in Gregor’s work. In the first phase of our study, we carried out simulations including homie elements and found that the potential confounding effects are largely resolved if we restrict the analysis to trajectories prior to encounters between the two homie copies. We will include these simulations and expand the discussion accordingly in the revised version.

      Comparison to Hi-C data

      Both reviewers noted a visual discrepancy between experimental and simulated Hi-C maps. We will address this by testing alternative similarity measures (e.g., Pearson correlation, as suggested) and by exploring parameter ranges that may improve the agreement.

      Together, these modifications will strengthen the manuscript, clarify the scope of our conclusions, and directly address the reviewers’ central concerns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary

      In this investigation Kapustin et al. demonstrate that vascular smooth muscle cells (VSMCs) exposed to the extracellular matrix fibronectin stimulates the release of small extracellular vesicles (sEVs). The authors provide experimental evidence that stimulation of the actin cytoskeleton boosts sEV secretion and posit that sEVs harbor both fibronectin and collagen IV protein themselves which also, in turn, alter cell migration parameters. It is well established that fibronectin is associated with increased cell migration and adherence; therefore, this association with VSMCs is not novel.

      The reviewer is correct that FN has been associated with migration and adherence in previous studies. However, we have extended these observations to show that the extracellular fibronectin matrix stimulates small extracellular vesicle (sEV) secretion by modulating the actin cytoskeleton. We also showed that sEVs are trapped in the extracellular matrix and that, by presenting collagen VI, they induce early focal adhesion formation, reduce excessive cellular spreading and guide cell invasion directionality through a 3D matrix. Hence, sEVs mediate cell-matrix cross talk and change cell behaviour in the context of the fibronectin matrix. This is critically important for the vasculature, where regulated VSMC invasion is essential for repair, with its deregulation leading to pathology.

      The authors purport that sEV are largely born of filopodia origin; however, this data is not well executed and seems generally at odds with the presented data.

      Our experimental data showed that CD63+ MVBs are associated with filopodia in fixed and live cells (Fig 2E, 2F and Video S1) and that inhibition of filopodia formation using the formin inhibitor, SMIFH2, reduced sEV secretion on FN (Fig 2B). However, we agree with the reviewer that further studies are required to connect sEV secretion to filopodia. To address this, we have provided further data analysis and toned down our conclusions regarding this point. Changes include:

      (1) Title: Matrix-associated extracellular vesicles modulate smooth muscle cell adhesion and directionality by presenting collagen VI.

      (2) Results, section title: 2. FN-induced sEV secretion is modulated by Arp2/3 and formin-dependent actin cytoskeleton remodelling

      (3) Results, page 6 Line 27-44 and conclusion page 7, Ln 3 “Interestingly, CD63+ MVBs can be observed in filopodia-like structures suggesting that sEV secretion can also occur spatially via cellular protrusion-like filopodia but more studies are needed to confirm this hypothesis.”

      (4) Discussion, page 12, line 19. “Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      Similarly, the effect of sEVs on parameters of cell migration has almost no magnitude of effect, making mechanism exploration somewhat nebulous.

      VSMCs are mesenchymal-type cells with a low migration rate, and we agree that the changes in motility are not of great magnitude even for the positive controls, suggesting that this is a complex, multifactorial process for VSMCs. In our experiments we collected data from >5000 individual cells to measure the average speed and found that the fibronectin matrix on its own increased VSMC speed from ~0.61 μm/min to ~0.68 μm/min (a ~12% increase), which was statistically significant (Fig 5A). Addition of a sEV inhibitor caused a modest but significant decrease in cellular speed. Interestingly, addition of ECM-associated sEVs did not influence cell speed in 2D or 3D assays. However, in a 3D model we observed a 22% change in cell directionality (Fig 5G) and a 235% change in cell alignment index (FMI, Fig 5H), which we believe is very strong evidence that VSMC-derived sEVs are involved in the regulation of VSMC invasion directionality. These data are also in agreement with sEV effects in tumour cells (Sung et al., 2015), although this previous study did not identify the factor driving the directionality, and we think our collagen VI data significantly extend these previous observations.

      Results, page 9: “Hence, ECM-associated sEVs have modest influence on VSMC speed but influence VSMC invasion directionality.”
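      The ~12% speed increase quoted in the response above follows directly from the two mean speeds it reports. As a quick arithmetic check (the function name is illustrative and not taken from the manuscript):

```python
def percent_change(baseline: float, treated: float) -> float:
    """Relative change of `treated` versus `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (treated - baseline) / baseline * 100.0


# Mean VSMC speeds quoted in the response (um/min): without vs with fibronectin.
speed_increase = percent_change(0.61, 0.68)
print(f"{speed_increase:.1f}%")  # prints "11.5%", i.e. roughly 12%
```

      The same calculation cannot be repeated here for the 22% and 235% directionality figures, because the response does not give their underlying values.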

      Lastly, the proposed mechanism of VSMCs responding to, and depositing, ECM proteins via sEVs was not rigorously executed; again, making the conclusions challenging for the reader to interpret.

      We appreciate the reviewer’s comment regarding the mechanistic aspects of VSMCs responding to and depositing ECM proteins via sEVs. In our revised manuscript, we have expanded the data demonstrating that sEVs can be retained within the extracellular matrix (see Figs 3A, 3B, S3A, S3B). Additionally, we show that collagen VI is present on the surface of sEVs, where it may modulate cell adhesion and influence the directionality of cell invasion (Fig 7E). Our results further indicate that both fibronectin (FN) and collagen VI can be recycled through multivesicular bodies (see Figs S3C, S3D, S3E–S3G). However, we acknowledge that the precise mechanisms governing the selective loading of ECM proteins onto sEVs, as well as the specific contributions of sEVs to overall ECM organization, remain to be fully elucidated and warrant further investigation. Based on our current evidence, we propose that collagen VI–loaded sEVs act primarily in a signaling capacity by modulating focal adhesion formation but are not directly involved in ECM structural remodeling.

      Results, page 7: To quantify ECM-trapped sEVs we applied a modified protocol for the sequential extraction of extracellular proteins using salt buffer (0.5M NaCl) to release sEVs which are loosely-attached to ECM via ionic interactions, followed by 4M guanidine HCl buffer (GuHCl) treatment to solubilize strongly-bound sEVs (Fig S3A) [42]. We quantified total sEV and characterised the sEV tetraspanin profile in conditioned media, and the 0.5M NaCl and GuHCl fractions using ExoView. The total particle count showed that EVs are both loosely bound and strongly trapped within the ECM. sEV tetraspanin profiling showed differences between these 3 EV populations. There was close similarity between the conditioned media and the 0.5M NaCl fraction, with high abundance of CD63+/CD81+ sEVs as well as CD63+/CD81+/CD9+ in both fractions (Fig S3A). In contrast, the GuHCl fraction was particularly enriched with CD63+ and CD63+/CD81+ sEVs with very low abundance of CD9+ EVs (Fig S3A). The abundance of CD63+/CD81+ sEVs was confirmed independently by a CD63+ bead capture assay in the media and loosely bound fractions (Fig S3B).

      Results, page 7: We previously found that the serum protein prothrombin binds to the sEV surface both in the media and MVB lumen, showing that it is recycled in sEVs and catalyses thrombogenesis when on the sEV surface [43]. So we investigated whether FN can also be associated with the sEV surface, where it can be directly involved in sEV-cell cross-talk [43]. We treated serum-deprived primary human aortic VSMCs with FN-Alexa568 and found that it was endocytosed and subsequently delivered to early and late endosomes together with fetuin A, another abundant serum protein that is a recycled sEV cargo and elevated in plaques (Figs S3C and S3D). CD63 visualisation with a different fluorophore (Alexa488) confirmed FN colocalization with CD63+ MVBs (Fig S3E). Next, we stained non-serum deprived VSMC cultured in normal growth media (RPMI supplemented with 20% FBS) with an anti-FN antibody and observed colocalization of CD63 and serum-derived FN. Co-localisation was reduced, likely due to competitive bulk protein uptake by non-deprived cells (Fig S3F). Notably, when we compared FN distribution in sparsely growing VSMCs versus confluent cells we found that FN intracellular spots, as well as colocalization with CD63, completely disappeared in the confluent state (Fig S3F and S3G). This correlated with nearly complete loss of CD63+/CD81+ sEV secretion by the confluent cells, indicating that confluence abrogates intracellular FN trafficking as well as sEV secretion by VSMCs (Fig S3H). Finally, FN could be co-purified with sEVs from VSMC conditioned media (Fig S3I) and detected on the surface of sEVs by flow cytometry, confirming its loading and secretion via sEVs (Fig 3C).

      Results, page 10: Collagen VI was the most abundant protein in VSMC-derived sEVs (Fig 7B, Table S7) and was previously implicated in the interaction with the proteoglycan NG2 [53] and suppression of cell spreading on FN [54]. To confirm the presence of collagen VI in ECM-associated sEVs we analysed sEVs extracted from the 3D matrix using 0.5M NaCl treatment and showed that both collagen VI and FN are present (Fig 7D). Next, we analysed the distribution of collagen VI using dot-blot. Alix staining was bright only upon permeabilization of sEV, indicating that it is preferentially a luminal protein (Fig 7E). On the contrary, CD63 staining was similar in both conditions, showing that it is a surface protein (Fig 7E). Interestingly, collagen VI staining revealed that 40% of the protein is located on the outside surface with 60% in the sEV lumen (Fig 7E).

      Discussion, page 12: “In fact, we observed that an extensive secretion of sEVs effectively ceased protrusion activity; also VSMCs acquired a rounded morphology when “hovering” over the FN matrix decorated with sEVs (data not shown). Hence, it will be interesting in future studies to investigate whether sEVs can stimulate Rho activity by presenting adhesion modulators, particularly collagen VI, on their surface, thereby guiding cell directionality during invasion.”

      Discussion, page 14: “In summary, cooperative activation of integrin signalling and F-actin cytoskeleton pathways results in the secretion of sEVs which associate with the ECM and play a signalling role by controlling FA formation and cell-ECM crosstalk. Further studies are needed to test these mechanisms across various cell types and ECM matrices.”

      Strengths

      The authors provide a comprehensive battery of cytoskeletal experiments to test how fibronectin and sEVs impact both sEV release and vascular smooth muscle cell migratory activation.

      We appreciate this comment reflecting our efforts to apply a range of orthogonal methods to show the role of the integrin/actin cytoskeleton in ECM-stimulated sEV secretion.

      Weaknesses

      Unfortunately, this article suffers from many weaknesses. First, the rigor of the experimental approach is low, which calls into question the merit of the conclusions. In this vein, there is a lack of proper controls or inclusion of experiments addressing alternative explanations for the phenotype or lack thereof.

      We acknowledge this comment and agree that there was not sufficient evidence to conclude that sEV secretion occurs via filopodia despite the microscopy/inhibitory data, so this claim has now been excluded from the study. However, we believe that our experimental data do clearly show that FN stimulates the secretion of collagen VI-loaded sEVs, which are trapped by the ECM and have the capacity to modulate VSMC adhesion and invasion directionality. To support this, we have now extended the dataset in the revised version:

      (1) In addition to the use of inhibitors and live cell analysis we have added quantitative data confirming that a large proportion of CD63+ endosomes are associated with F-actin/cortactin tails and this colocalization is increased upon the inhibition of sEV secretion with 3-OMS (Fig  2D, Fig S2B).

      (2) We developed a method to extract ECM-associated sEVs and quantified/characterized these using ExoView Assays further confirming significant sEV entrapment by the ECM (Figs 3B, S3A, S3B).    

      (3) We extended the controls to confirm FN delivery to CD63+ endosomes and showed that FN recycling is stopped upon reaching cell confluence (Figs S3F, S3G and Fig S3H).

      (4) We included more intensive characterisation of human atherosclerotic plaque morphology (H&E, Masson’s trichrome staining, Orcein, elastin fibers staining) to confirm predominant accumulation of sEV in the neointima (Figs S4A, S4B and S4C). We also excluded an endothelial origin for the  CD81+ sEVs (Fig 4G).

      (5) We added individual cell tracks to the 2D migration analysis to confirm the statistical significance and concluded that ECM-associated sEVs regulate cell invasion directionality but not cell speed (Figs 5A and 5B).

      (6) We showed surface localisation of collagen VI on sEVs confirming that it can activate signalling pathways leading to early FA formation on the FN matrix  (Figs 7D and 7E).

      (7) We included alternative explanations for some of our data in the discussion.      

      Reviewer #2 (Public Review):

      Extracellular vesicles have recently gained significant attention across a wide variety of fields, and they have therefore been implicated in numerous physiological and pathophysiological processes. When such a discovery and an explosion of interest occur in science, there is often much excitement and hope for answers to mechanisms that have remained elusive and poorly understood. Unfortunately, there is an equal amount of hype and overstatement that may also be put forth in the name of "impact", but this temptation must be avoided so that scientists and the broader public are not misled by overreaching interpretations and statements that lack rigorous and fully convincing evidence.

      Thank you for your comment. We agree that investigating sEVs is particularly challenging due to their heterogeneity and nano-size, as well as complex biogenesis mechanisms. ECM-associated sEVs are a very new direction for the EV field but one that is particularly relevant to the vasculature, where cells must invade through a thick ECM and where the accumulation of ECM-bound EVs is a unique and documented phenomenon. To further strengthen our conclusions, we have included new data to support our statements but also excluded statements regarding filopodia as the origin of sEVs, which are out of the scope of our study and need to be investigated further.

      The study presented by Kapustin et al. is certainly intriguing and timely, and it offers an interesting working hypothesis for the fields of extracellular vesicles and vascular biology to consider. The authors do a reasonable job at detecting these small extracellular vesicles, though some aspects of data presentation are missing such as full Western blots with accompanying size markers for the viewer to more fully appreciate that data and comparisons being made (see Figures 1 and 7).

      We agree with the reviewer and have now included molecular weight markers (Fig 1F, 7C, 7D, S3I, S4E) and provided all original western blot scans (uncropped and unedited) to the eLife editor. 

      Much of the imaging data from cell-based experiments is strong and conducted with many cutting-edge tools and approaches. That said, the static images and the dynamic imaging fall short of being fully convincing that the small extracellular vesicles found in the neighboring extracellular matrix are indeed being deposited there via the smooth muscle cell filopodia. Many of the lines of evidence presented suggest that this could occur, but alternative hypotheses also exist that were not fully ruled out, such as the ECM-deposited vesicles were secreted more from the soma and/or the lamellipodia that are also emitted and retracted from the cells. In particular, the authors show very nice dynamic imaging (Supplementary Figure S2A and Supplemental Video S1) that is interpreted as "extracellular vesicles being released from the cell" and these are seen as "bursts" of fluorescent signal; however, none of these appear to occur in filopodia as they appear within the cell proper (a "burst" of signal vs. a more intense "streak" of signal), which would be a stronger and more consistent observation predicted by the working model proposed by the authors.

      Our live and fixed cell microscopy data, as well as inhibitor analysis, showed that sEV secretion can be associated with the filopodia. However, we agree with the reviewer that the data generated using the CD63-pHluorin marker clearly indicate that the majority of sEVs are secreted from the cell soma toward the ECM.

      To reflect this, we have added further changes:

      (1) Title: Matrix-associated extracellular vesicles modulate smooth muscle cell adhesion and directionality by presenting collagen VI.

      (2) Results, section title: 2. FN-induced sEV secretion is modulated by Arp2/3 and formin-dependent actin cytoskeleton remodelling

      (3) Results, page 6, Ln 27-36: “Formins and the Arp2/3 complex play a crucial role in the formation of filopodia, a cellular protrusion required for sensing the extracellular environment and cell-ECM interactions [36]. To test whether MVBs can be delivered to filopodia, we stained VSMCs for Myosin-10 (Myo10) [37]. We observed no difference between total filopodia number per cell on plastic or FN matrices (n=18±8 and n=14±3, respectively); however, the presence of endogenous CD63+ MVBs along the Myo10-positive filopodia was observed in both conditions (Fig 2E, arrows). Filopodia have been implicated in sEV capture and delivery to endocytosis “hot-spots” [38], so next we examined the directionality of CD63+ MVB movement in filopodia by overexpressing Myo10-GFP and CD63-RFP in live VSMCs. Importantly, we observed anterograde MVB transport toward the filopodia tip (Fig 2F and Supplementary Video S2) indicative of MVB secretion.”

      (4) Results, page 6, Ln 37-44: “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane [39]. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells [18] (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia (Fig S2C and Supplemental Video S1).”

      (5) Results, page 7 Ln 3 “Interestingly, CD63+ MVBs can be observed in filopodia-like structures suggesting that sEV secretion can also occur spatially via cellular protrusion-like filopodia but more studies are needed to confirm this hypothesis.”

      (6) Discussion, page 12, line 19. “Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      Imaging of related human samples is certainly a strength of the paper, and the authors are commended for attempting to connect the findings from their cell culture experiments to an important clinical scenario. However, the marker selected for marking extracellular vesicles is CD81, which has been described as present on the endothelium of atherosclerotic plaques with a proposed role in the recruitment of monocytes into diseased arteries (Rohlena et al. Cardiovasc Res 2009). More data should address this potentially confounding interpretation of the signals presented in images within Figure 4.

      We thank the reviewer for this insightful comment that the sEV marker CD81 can originate from endothelial cells, in agreement with Rohlena et al., 2009. To address this, we investigated the spatial overlap between CD81 and the endothelial marker, CD31. We observed very strong CD81 staining in the intact endothelial cell (intima) layer and occasional CD31-positive cells in the neointima. Importantly, quantification of colocalization confirmed that 80% of CD81 in the neointima does not overlap with CD31, excluding an endothelial origin of these sEVs (Fig 4G). Moreover, we included complete morphological characterisation of the atherosclerotic plaques, confirming that CD81 sEVs were primarily observed in the neointima, where VSMCs constitute the cellular majority (Fig S4A, S4B, S4C and S4D).
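      The response does not state how the colocalization was quantified. One common way to express a figure such as "80% of CD81 does not overlap with CD31" is the fraction of marker-positive pixels that fall outside a second marker's mask. A minimal sketch under that assumption, using toy binary masks rather than real image data (all names are hypothetical):

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # 1 = pixel above threshold, 0 = background


def fraction_outside(marker: Mask, other: Mask) -> float:
    """Fraction of `marker`-positive pixels that are not positive in `other`."""
    if len(marker) != len(other):
        raise ValueError("masks must have the same number of rows")
    positive = 0
    outside = 0
    for marker_row, other_row in zip(marker, other):
        if len(marker_row) != len(other_row):
            raise ValueError("mask rows must have the same length")
        for m, o in zip(marker_row, other_row):
            if m:
                positive += 1
                if not o:
                    outside += 1
    if positive == 0:
        raise ValueError("marker mask has no positive pixels")
    return outside / positive


# Toy masks (not experimental data): five CD81-positive pixels,
# one of which is also CD31-positive.
cd81_mask = [[1, 1, 1, 0],
             [1, 1, 0, 0]]
cd31_mask = [[1, 0, 0, 0],
             [0, 0, 0, 1]]
print(fraction_outside(cd81_mask, cd31_mask))  # 0.8
```

      In practice the result depends strongly on how each channel is thresholded, so the thresholding rule would need to be reported alongside any such fraction.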

      On a conceptual level, the idea that the small extracellular vesicles contain Type VI Collagen, and this element of their cargo is modulating smooth muscle cell migration, is an intriguing aspect of the authors' working model. Nevertheless, the evidence supporting this potential mechanism does not quite fit together as presented. It is not entirely clear how the collagen VI within the vesicles is somehow accessed by the smooth muscle cell filopodia during migration. Are the vesicles lysed open once on the extracellular matrix? If so, what is the proposed mechanism for that to occur? If not, how are the adhesion molecules on the smooth muscle cell surface engaging the collagen VI fibers that are contained within the vesicles? This aspect of the model does not quite fit together with the proposed mechanism and may be an interesting speculative interpretation, warranting further investigation, but it should not be considered a strong conclusion with sufficient convincing data supporting this idea.

      We thank the reviewer for their insightful comments regarding the mechanism by which collagen VI associated with sEVs could modulate smooth muscle cell adhesion and migration. To clarify, our new data suggest that collagen VI is predominantly present on the surface of the sEVs, as evidenced by Fig 7E. This surface localization strongly implies that collagen VI can be directly accessed by cell surface adhesion receptors, without the need for vesicle lysis or opening. While we cannot entirely rule out all alternative mechanisms, we consider vesicle rupture or lysis within the extracellular matrix to be a highly unlikely route for collagen VI exposure, given the known stability of sEVs under physiological conditions. We have added these points to clarify:

      (1) Results, page 10, Ln 45: “To confirm the presence of collagen VI in ECM-associated sEVs we analysed sEVs extracted from the 3D matrix using 0.5M NaCl treatment and showed that both collagen VI and FN are present (Fig 7D). Next, we analysed the distribution of collagen VI using dot-blot. Alix staining was bright only upon permeabilization of sEV, indicating that it is preferentially a luminal protein (Fig 7E). On the contrary, CD63 staining was similar in both conditions, showing that it is a surface protein (Fig 7E). Interestingly, collagen VI staining revealed that 40% of the protein is located on the outside surface with 60% in the sEV lumen (Fig 7E).”

      (2) Discussion, page 13, Ln 2: “Hence, it will be interesting in future studies to investigate whether sEVs can stimulate Rho activity by presenting adhesion modulators, particularly collagen VI, on their surface, thereby guiding cell directionality during invasion.”

      (3) Discussion, page 14, Ln 30: In addition to collagen VI, the unique adhesion cluster in VSMC-derived sEVs also includes EGF-like repeat and discoidin I-like domain-containing protein (EDIL3), transforming growth factor-beta-induced protein ig-h3 (TGFBI) and the lectin galactoside-binding soluble 3 binding protein (LGALS3BP), and these proteins are also directly implicated in activation of integrin signalling and cellular invasiveness [85-87]. Although we found that collagen VI plays the key role in sEV-induced early formation of FAs in VSMCs, it is tempting to speculate that the high sEV efficacy in stimulating FA formation is driven by cooperative action of this unique adhesion complex on the sEV surface, and targeting this novel sEV-dependent mechanism of VSMC invasion may open up new therapeutic opportunities to modulate atherosclerotic plaque development or even to prevent undesired VSMC motility in restenosis.

      (4) Abstract Figure

      On a technical level, some of the statistical analysis is not readily understood from the data presented. It is very much appreciated that the authors show many of the graphs with technical and biological replicate values in addition to the means and standard deviations (though this is not clearly stated in all figure legends). However, in figures such as Figure 5, there are bars shown and indicated to be different by statistical comparison (see panel B in Figure 5). It is not clear how the values for Group 1 (no FN, no 3-OMS, no sEV) are statistically different (denoted by three asterisks but no p value provided in the legend) than Group 3 (no FN, 3-OMS added, no sEV), when their means and standard deviations appear almost identical. If this is an oversight, this needs to be corrected. If this is truly the outcome, further explanation is warranted. A higher level of transparency in such instances would certainly go a long way in helping address the current crisis of mistrust within the scientific community and at the interface with society at-large.

      We thank the reviewer for their careful reading and important comments on the statistical analysis. We acknowledge that the technical and biological replicate data were not clearly reported in all figure legends and that the statistical approach for Figures 5A and 5B required clarification. In response, we have made several changes for greater transparency and rigor:

      First, we have now explicitly included the numbers of biological replicates (N) and technical replicates (n) in all relevant figure legends for Figures 1–7. In addition, the number of individual cell tracks is now annotated for the migration/invasion analyses, along with the mean values for each dataset.

      Upon review, we found that the original statistical analyses for Figures 5A and 5B were conducted using pooled averaged data. To address this, we have repeated the statistical tests using pooled individual cell track data, applying the Kruskal–Wallis test with Dunn’s multiple comparison correction. This more stringent approach revealed revised p-values, which are now indicated in Figures 5A and 5B.
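      To illustrate the omnibus step of the analysis described above, the following is a minimal pure-Python sketch of the Kruskal–Wallis H statistic applied to per-track values. It is not the authors' analysis code: the speeds are synthetic, the function names are hypothetical, the closed-form p-value holds only for exactly three groups (two degrees of freedom), and Dunn's post hoc comparisons are not shown.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = [v for group in groups for v in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    tie_counts: dict[float, int] = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in tie_counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def p_value_three_groups(h: float) -> float:
    """Chi-square survival function for 2 degrees of freedom (three groups)."""
    return math.exp(-h / 2)


# Synthetic per-track speeds (um/min) for three hypothetical conditions.
tracks = [[0.55, 0.58, 0.60], [0.62, 0.66, 0.70], [0.71, 0.74, 0.80]]
h_stat = kruskal_wallis_h(tracks)
print(round(h_stat, 3), round(p_value_three_groups(h_stat), 4))
```

      Running the test on individual tracks rather than on per-experiment averages gives a much larger sample per group, so small shifts between conditions can reach significance; the chi-square approximation itself is rough for groups as small as in this toy example.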

      With these corrections, we reconfirm our major findings: In the 2D model, fibronectin (FN) coating promotes VSMC velocity, while inhibition of sEV secretion with 3-OMS leads to reduced cell speed (Fig. 5A). Addition of sEVs to the ECM had no effect on VSMC speed at baseline but did rescue cell speed and distance in the presence of 3-OMS, consistent with EVs acting primarily on invasion directionality rather than speed in both 2D and 3D models (Fig. 5A, 5D). Furthermore, sEVs continue to significantly impact VSMC invasion directionality (Figs. 5G, 5H), in agreement with previous reports in tumor cells (Sung et al., 2015).
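      The response does not define the speed and directionality metrics it reports. A minimal sketch assuming commonly used definitions (speed as path length over elapsed time, directionality as net displacement over path length, and forward migration index as displacement along a chosen axis over path length); the track coordinates are invented and every name is hypothetical:

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def track_metrics(track: Sequence[Point], dt_min: float) -> dict[str, float]:
    """Speed, directionality and y-axis FMI for one cell track.

    `track` holds (x, y) positions in um sampled every `dt_min` minutes.
    """
    if len(track) < 2:
        raise ValueError("a track needs at least two positions")
    path_length = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    if path_length == 0:
        raise ValueError("the cell did not move")
    net_displacement = math.dist(track[0], track[-1])
    elapsed = dt_min * (len(track) - 1)
    return {
        "speed": path_length / elapsed,                    # um/min
        "directionality": net_displacement / path_length,  # 0..1
        "fmi_y": (track[-1][1] - track[0][1]) / path_length,  # -1..1
    }


# Invented track: two steps of 5 um and 6 um, sampled every 10 min.
example = track_metrics([(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)], dt_min=10.0)
print(example)
```

      Under these definitions a cell can change its directionality or FMI without any change in speed, which is the distinction the response draws between the two kinds of effect.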

      In summary, we have implemented the following revisions:

      (1) Figures 5A and 5B: Individual cell track data are now shown, and statistical analyses have been repeated using the Kruskal–Wallis test with Dunn’s multiple comparisons.

      (2) Figure legends and results sections: Numbers of biological and technical replicates, as well as individual data points, are now clearly stated.

      Results, page 9, line 14: The text has been updated to clarify the statistical approach and major findings as described above.

      We hope that these changes address the reviewer’s concerns and improve the transparency and reproducibility of our data presentation.

      Reviewer #1 (Recommendations For The Authors):

      We are very thankful for the comprehensive review and comments which helped to improve our data.

      Figure 1.

      The authors clearly show that FN stimulation (immobilized or cell-derived) promotes sEV secretion via canonical integrin pathways. FN is a promigratory substrate, hence its extensive use as a cell adhesion aid; thus one could assume that simply plating on FN induces a pro-migratory phenotype (later data supports this notion). Does the addition of growth factors also increase sEV release? An endogenous function of FN is siloing of various GFs during clot formation. Also, FAK and SRC networks intersect with canonical RTK signaling in terms of promoting Rac1, CDC42 and other migration mediators. The reason I believe this is important is because the data could be interpreted in two ways: 1) FN induces pro-migration signaling and then sEVs are released, or vice versa, FN induces sEV release and migration is initiated. GF supplementation in the absence of FN would clarify this relationship.

      We thank the reviewer for this insightful comment regarding the possible role of growth factors (GFs) and the mechanistic relationship between FN stimulation, sEV secretion, and cell migration. We agree that FN is a well-established promoter of cell migration, and it is important to distinguish whether FN directly induces a pro-migratory phenotype or does so via sEV-mediated signaling.

      Our data show that FN stimulation markedly increases VSMC motility, as reflected by enhanced cell speed (Fig. 5A), an increased number of focal adhesions (Fig. 6E), and facilitated centripetal movement of FAs (Fig. 6F). Interestingly, ECM-associated sEVs appear to play a complementary but distinct role: they do not significantly affect cell migration speed (Fig. 5A) but instead guide cell invasion directionality (Figs. 5G, 5H), reduce the number of FAs per cell (Fig. 6E), and promote early peripheral FA formation (Fig. 6F). In light of these findings, we have updated our graphical abstract to reflect the unique cross-talk mediated by sEVs between VSMCs and the ECM.

      Regarding the influence of growth factors, we acknowledge that FN can bind and present different GFs, which could also contribute to changes in sEV secretion. Although our inhibition studies and integrin-blocking antibody results support a primary role for β1 integrin activation and actin assembly in triggering sEV secretion, we cannot entirely exclude the possibility that FN-bound growth factors play a role in this process. We have now incorporated this point into the discussion to address the reviewer’s suggestion.

      Discussion, page 14, Ln 7 “Although our small inhibitors and integrin modulating antibody data clearly indicate that β1 activation triggers sEV secretion via activation of actin assembly we cannot fully rule out that FN may also be modulating growth factor activity which in turn contributes to sEV secretion by VSMCs<sup>23</sup>. Excessive collagen and elastin matrix breakdown in atheroma has been tightly linked to acute coronary events hence it will be interesting to study the possible link between sEV secretion and plaque stability as sEV-dependent invasion is also likely to influence the necessary ECM degradation induced by invading cells<sup>96</sup>.”

      Figure 2.<br /> • The authors provide no evidence (or references) that SMIFH2 or CK666 halts filopodia extensions.

      Thank you for this important note. We have included the corresponding references:

      Results, page 5: “So next we tested the contribution of Arp2/3 and formins by using the small molecule inhibitors, CK666 and SMIFH2, respectively[31, 32].”

      • Is there an increase in filopodia density when plated on FN vs plastic? Similarly, if there are more filopodia present is that associated with more sEV? Please provide evidence in this regard.

      We agree that connecting the number of filopodia with the level of sEV secretion may provide an important clue as to whether sEV secretion is driven by FN-induced filopodia formation. However, Myosin10 staining to quantify filopodia (Fig 2E) showed no difference between VSMCs plated on plastic versus FN matrix. Therefore, we agree with the reviewer that the filopodia contribution to sEV secretion needs to be investigated further. This is reflected in the following changes:

      (1) Results, page 6, Ln 29 “We observed no difference between total filopodia number per cell on plastic or FN matrices (n=18±8 and n=14±3, respectively) however the presence of endogenous CD63+ MVBs along the Myo10-positive filopodia were observed in both conditions (Fig 2E, arrows).”

      (2) Results, page 6, Ln 37 “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane[39]. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells[18] (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia (Fig S2C and Supplemental Video S1).”

      (3) Discussion, page 12, Ln 15: “Focal complexes either disassemble or mature into the elongated centripetally located FAs[48]. In turn, these mature FAs anchor the ECM to actin stress fibres and the traction force generated by actomyosin-mediated contractility pulls the FAs rearward and the cell body forward[12, 13]. Here we report that β1 integrin activation triggers sEV release followed by sEV entrapment by the ECM. Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      As hinted above, this data could be interpreted in the light of generally inhibiting cell migration to blunt sEV shedding. Does cell confluence affect sEV release? If cells are cultured to 100% confluency this would limit filopodia formation regardless of ECM type. If sEV secretion remains elevated on FN in this culture condition it would suggest a lack of dependency on filopodia.

      We thank the reviewer for this thoughtful suggestion regarding the influence of cell confluence on sEV release and filopodia formation. To directly address this hypothesis, we performed additional experiments comparing VSMCs cultured at low and high confluency. As described in the revised Results (page 7, line 39), we found that high cellular confluency reduced FN recycling, as indicated by the marked decrease in intracellular FN-positive spots and loss of colocalization with CD63 (Figs S3F, S3G). Importantly, this was accompanied by a significant reduction in CD63+/CD81+ sEV secretion by confluent cells (Fig S3H). These results suggest that VSMC confluence, which suppresses filopodia formation, nearly abolishes both intracellular FN trafficking and sEV secretion, even in the presence of FN. Thus, under our experimental conditions, sEV secretion by VSMCs appears to be closely linked to dynamic cell–matrix interactions and is dramatically reduced when these processes are limited by confluence:

      (1) Results, page 7, Ln 39: “Notably, when we compared FN distribution in sparsely growing VSMCs versus confluent cells we found that FN intracellular spots, as well as colocalization with CD63, completely disappeared in the confluent state (Fig S3F and S3G). This correlated with nearly complete loss of CD63+/CD81+ sEV secretion by the confluent cells indicating that confluence abrogates intracellular FN trafficking as well as sEV secretion by VSMCs (Fig S3H).”

      • Inhibition of branched actin polymerization has been shown to reduce both exocytic and endocytic activity. Thus, it is hard to interpret the results of Fig. 2B as anything more than a generalized effect of losing actin.

      We thank the reviewer for this important point regarding the broad cellular functions of branched actin polymerization, and agree that generalized actin loss can influence both exocytic and endocytic pathways. To address this, we performed additional experiments and analyses to better define the relationship between branched actin structures and sEV-related processes in VSMCs.

      As described in the revised Results (page 6), we overexpressed ARPC2-GFP (an Arp2/3 subunit) together with F-tractin-RFP in VSMCs and carried out live-cell imaging. This approach revealed that Arp2/3 and F-actin organize into lamellipodial scaffolds at the cell cortex, as expected (Fig. S2A; Supplementary Video S2). Additionally, and more unexpectedly, we observed numerous Arp2/3– and F-actin–positive dynamic spots within the VSMC cytoplasm. These structures resemble actin comet tails seen in other systems, previously implicated in endosomal propulsion (Fig. S2A, arrow; Supplementary Video S2).

      Quantitative analysis confirmed that a substantial fraction of these dynamic F-actin/cortactin spots colocalized with CD63+ endosomes (Fig. 2D), and that these structures are indeed branched actin tails based on cortactin immunostaining. Furthermore, inhibition of SMPD3 (with 3-OMS) induced enlarged cortactin/F-actin/CD63+ complexes, morphologically similar to invadopodia (Fig. 2D, arrowheads), supporting a functional link between actin branching and MVB dynamics.

      To quantify the association, we calculated Manders’ colocalization coefficients for F-actin tails and CD63+ endosomal structures in fixed VSMCs, observing that ~50% of F-actin tails were associated with ~13% of endosomes. Upon 3-OMS treatment, this overlap increased further (Fig. S2B).
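
      As an illustration of the quantification described above, the following is a minimal sketch of thresholded Manders' coefficients, not the pipeline used for Fig. S2B. It assumes one common definition: tM1 is the fraction of above-threshold channel A intensity (here F-actin) found in pixels where channel B (here CD63) is also above threshold, and tM2 is the converse. The function name, the flat pixel lists and the threshold values are hypothetical.

```python
def thresholded_manders(ch_a, ch_b, thr_a, thr_b):
    """Return (tM1, tM2) for two equally sized lists of pixel intensities.

    Assumed definition (conventions differ between tools):
      tM1 = sum of A over pixels with A > thr_a and B > thr_b
            divided by the sum of A over pixels with A > thr_a
      tM2 = the same with the roles of A and B swapped
    """
    if len(ch_a) != len(ch_b):
        raise ValueError("channels must contain the same number of pixels")

    total_a = total_b = coloc_a = coloc_b = 0.0
    for a, b in zip(ch_a, ch_b):
        a_on, b_on = a > thr_a, b > thr_b
        if a_on:
            total_a += a
        if b_on:
            total_b += b
        if a_on and b_on:
            coloc_a += a
            coloc_b += b

    tm1 = coloc_a / total_a if total_a else 0.0
    tm2 = coloc_b / total_b if total_b else 0.0
    return tm1, tm2


# Hypothetical toy image: six pixels, two channels.
actin = [0, 10, 10, 0, 0, 0]
cd63 = [0, 20, 0, 20, 20, 20]
tm1, tm2 = thresholded_manders(actin, cd63, thr_a=5, thr_b=5)
```

      Because the two coefficients are normalised to different channels, they can diverge: a large share of actin tails can overlap CD63 while only a small share of CD63 signal overlaps actin, which is the pattern of tM1 and tM2 values reported here.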

      Finally, using live-cell imaging (Fig 2C; Supplementary Video S4), we directly observed CD63+ MVBs being propelled through the cytoplasm by Arp2/3-driven actin tails, suggesting a mechanistic role for branched actin assembly in MVB intracellular transport, rather than a generalized effect of actin disruption alone.

      We believe these combined data reinforce a more specific mechanistic role for Arp2/3-mediated branched actin in MVB/endosome transport and, consequently, in sEV secretion in VSMCs—over and above an indirect effect of global actin loss. We hope these additional experiments and quantitative analyses address the reviewer’s concern and clarify the functional relevance of branched actin structures to sEV trafficking:

      (1) Results, page 6, Ln 3 “As regulators of branched actin assembly, the Arp2/3 complex and cortactin are thought to contribute to sEV secretion in tumour cells by mediating MVB intracellular transport and plasma membrane docking[28, 33]. Therefore, we overexpressed the Arp2/3 subunit, ARPC2-GFP and the F-actin marker, F-tractin-RFP in VSMCs and performed live-cell imaging. As expected, Arp2/3 and F-actin bundles formed a distinct lamellipodia scaffold in the cellular cortex (Fig S2A and Supplementary Video S2). Unexpectedly, we also observed numerous  Arp2/3/F-actin positive spots moving  through the VSMC cytoplasm that resembled previously described endosome actin tails observed in Xenopus eggs[33] and parasite infected cells where actin comet tails propel parasites via filopodia to neighbouring cells[34, 35] (Fig S2A, arrow, and Supplementary Video S2). Analysis of the intracellular distribution of Arp2/3 and CD63-positive endosomes in VSMCs showed CD63-MVB propulsion by the F-actin tail in live cells (Fig 2C and Supplementary Video S4).”

      (2) Results, New data Fig 2D, page 6, Ln 14. “we observed numerous F-actin spots in fixed VSMCs that were positive both for F-actin and cortactin indicating that these are branched-actin tails (Fig 2D). Moreover, cortactin/F-actin spots colocalised with CD63+ endosomes and addition of the SMPD3 inhibitor, 3-OMS, induced the appearance of enlarged doughnut-like cortactin/F-actin/CD63 complexes resembling invadopodia-like structures similar to those observed in tumour cells (Fig 2D, arrowheads)[18].”

      (3) Results, New data Fig S2B, page 6, Ln 19 “To quantify CD63 overlap with the actin tail-like structures, we extracted round-shaped actin structures and calculated the thresholded Manders colocalization coefficient (Fig S2B).  We observed overlap between F-actin tails and CD63 as well as close proximity of these markers in fixed VSMCs (Fig S2B). Approximately 50% of the F-actin tails were associated with 13% of all endosomes (tM1=0.44±0.23 and tM2= 0.13±0.06, respectively, N=3). Addition of 3-OMS enhanced this overlap further (tM1=0.75±0.18 and tM2=0.25±0.09) suggesting that Arp2/3-driven branched F-actin tails are involved in CD63+ MVB intracellular transport in VSMCs”

      • In video 1 the author states (lines 8-9; pg6) "intense CD63 staining along filopodia" Although, there is some fluorescence (not strong) in these structures, there was no visible exocytic activity. This data is more suggestive that sEVs (marked by CD63) are not associated with filopodia. The following conclusion statement the authors make is overreaching given this result.

      We thank the reviewer for this careful observation and agree that the previous conclusion regarding sEV release from filopodia was overstated. In response, we have revised both the Results and Discussion sections to more accurately reflect the data:

      (1) Results, page 6, Ln 37 “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane[39]. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells[18] (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia (Fig S2C and Supplemental Video S1).”

      (2) Discussion, page 12, Ln 19 “Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      • Fig 2D and video 2 are wholly unconvincing with regard to sEV secretion sites. The authors could use their CD63-pHluorin construct to count exocytic events in the filopodia vs the whole cell. Given the movie, I have a suspicion this would not be significant. The authors could also stain for CD63 in non-permeabilized cells to capture and count exocytic events at the plasma membrane as well as their location between groups.

      We thank the reviewer for these constructive suggestions and their critical assessment of our current data regarding the sites of sEV secretion. We agree that our CD63-pHluorin approach clearly indicates sEV secretion events in the soma at the cell–ECM interface, while we did not observe comparable events in filopodia. Accordingly, we have clarified these points in the revised manuscript.

      (1) Results, page 6, Ln 37 “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane[39]. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells[18] (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia (Fig S2C and Supplemental Video S1).”

      (2) Discussion, page 12, Ln 19 “Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      • Fig. 2E and video 4. Again, the conclusions drawn from this data are very strained. First, no co-localization quantification is presented on the proportion of CD63 vesicles with actin. Once again, the movie, if anything, convinces the reader that 95-99% of all CD63 vesicles are not associated with actin; therefore, this is an unlikely mechanism of transport.

      We thank the reviewer for this valuable comment and for highlighting the need for quantitative co-localization analysis. In response, we developed a method to systematically quantify F-actin and CD63 co-localization in fixed VSMCs, as now presented in new Figures 2D and S2B. We acknowledge that the majority of CD63+ endosomes are not associated with F-actin, consistent with the reviewer’s interpretation. However, our quantitative data now show that a specific subpopulation of MVBs appears to utilize this actin-based mechanism for transport. We believe this addresses the concern and more accurately reflects the prevalence and significance of the mechanism described.

      (1) Results, page 6, Ln 19. “To quantify CD63 overlap with the actin tail-like structures, we extracted round-shaped actin structures and calculated the thresholded Manders colocalization coefficient (Fig S2B). We observed overlap between F-actin tails and CD63 as well as close proximity of these markers in fixed VSMCs (Fig S2B). Approximately 50% of the F-actin tails were associated with 13% of all endosomes (tM1=0.44±0.23 and tM2=0.13±0.06, respectively, N=3). Addition of 3-OMS enhanced this overlap further (tM1=0.75±0.18 and tM2=0.25±0.09) suggesting that Arp2/3-driven branched F-actin tails are involved in CD63+ MVB intracellular transport in VSMCs.”

      • Are there perturbations that increase filopodia numbers? A gain of function experiment would be valuable here.

      We thank the reviewer for this important suggestion regarding the potential value of gain-of-function experiments to clarify filopodia’s contribution to sEV release. In agreement with the reviewer’s scepticism, we have removed statements linking filopodia to sEV release from both the title and abstract to avoid overinterpretation. At present, our understanding of filopodia biology and the lack of robust tools to selectively and substantially increase filopodia numbers in VSMCs prevent us from directly addressing this question through gain-of-function assays. We acknowledge that future studies using established methods—such as overexpression of filopodia-inducing proteins (e.g., mDia2 or fascin)—could provide insight into whether an increased number of filopodia affects sEV release. However, such experiments are beyond the scope of the current manuscript. We have made the following changes to clarify these points:

      (1) Results, page 6, Ln 37 “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane[39]. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells[18] (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia (Fig S2C and Supplemental Video S1).”

      (2) Discussion, page 12, Ln 19 “Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      Figure 3<br /> • Fig 3A. The CD63 staining is strongly associated with the entire plasma membrane. How are the authors distinguishing between normal membrane shedding and bona fide sEVs based on this staining alone? This is insufficient, as all membrane structures are seemingly positive. Additionally, on scrutinizing the provided images, there are very few sEVs. For the "sEV secretion, fold change" graphs in previous figures, could the authors provide absolute values, or an indication of what these values are in absolute terms?

      We thank the reviewer for raising this important point regarding the specificity of CD63 staining and the need to distinguish bona fide sEVs from membrane fragments or general membrane shedding. We agree that CD63 staining alone at the plasma membrane or in the extracellular matrix is not sufficient to unequivocally identify sEVs. To address this, we employed several complementary approaches to rigorously characterize ECM-associated sEVs:

      First, using high-resolution iSIM imaging, we confirmed the association of CD63-positive particles specifically with the FN-rich matrix, and demonstrated that SMPD3 knockdown significantly reduced the number of CD63+ particles in the matrix (Fig. 3B; revised from Fig. S3A).

      Second, by incubating FN matrices with purified and fluorescently labeled sEVs, we directly observed efficient entrapment of these labeled sEVs within the matrices (Fig. 3E), confirming that sEVs can interact with and be retained by the ECM.

      Third, we developed and applied a sequential extraction protocol using mild salt buffer (0.5M NaCl) and strong denaturant (4M guanidine HCl) to selectively extract ECM-associated sEVs based on the strength of their association (see new Figs. S3A and S3B). Extracted vesicles were then characterized by ExoView analysis, which demonstrated a tetraspanin profile (CD63+/CD81+/CD9+) closely matching that of sEVs from conditioned media, providing evidence that these particles are true sEVs and not merely membrane debris. We also found that the more weakly bound (NaCl-extracted) fraction closely resembles media-derived sEVs, while the strongly bound (GuHCl-extracted) fraction is more enriched in CD63+ and CD63+/CD81+ sEVs but contains very few CD9+ vesicles, further supporting distinct extracellular vesicle subpopulations within the ECM.

      In addition, the abundance of CD63+/CD81+ sEVs in both media and ECM-derived fractions was independently validated by CD63 bead-capture assay (Fig. S3B).

      We hope these clarifications and the expanded data set address the reviewer’s concerns about sEV identification and quantification in the extracellular matrix:

      (1) Results, page 7, Ln 16. “To quantify ECM-trapped sEVs we applied a modified protocol for the sequential extraction of extracellular proteins using salt buffer (0.5M NaCl) to release sEVs which are loosely-attached to ECM via ionic interactions, followed by 4M guanidine HCl buffer (GuHCl) treatment to solubilize strongly-bound sEVs (Fig S3A)[42]. We quantified total sEV and characterised the sEV tetraspanin profile in conditioned media, and the 0.5M NaCl and GuHCl fractions using ExoView. The total particle count showed that EVs are both loosely bound and strongly trapped within the ECM. sEV tetraspanin profiling showed differences between these 3 EV populations. While there was close similarity between the conditioned media and the 0.5M NaCl fraction with high abundance of CD63+/CD81+ sEVs as well as CD63+/CD81+/CD9+ in both fractions (Fig S3A). In contrast, the GuHCl fraction was particularly enriched with CD63+ and CD63+/CD81+ sEVs with very low abundance of CD9+ EVs (Fig S3A). The abundance of CD63+/CD81+ sEVs was confirmed independently by a CD63+ bead capture assay in the media and loosely bound fractions (Fig S3B).”

      • A control for Fig 3B would be helpful to parse out random uptake of extracellular debris versus targeted sEV internalization. It would be helpful if the authors added particles of similar size to that of the sEVs to test whether these structures are endocytosed/micropinocytosed at similar levels.

      We thank the reviewer for this useful suggestion regarding the need for better controls to distinguish specific sEV uptake from nonspecific internalization of extracellular debris or similarly sized particles. As a comparison, in our study we analyzed the uptake of both sEVs and serum proteins such as fibronectin and fetuin-A (Figs S3C and S3D), and observed similar patterns of intracellular trafficking. However, we acknowledge that inert nanoparticles or beads of a similar size to sEVs could serve as potential controls to assess nonspecific micropinocytosis or endocytosis.

      It is important to note, however, that the uptake of sEVs is strongly influenced by their surface protein composition and the so-called “protein corona.” Recent work from Prof. Khuloud T. Al-Jamal’s group underscores that exosome uptake mechanisms may be highly specific (Liam-Or et al., 2024), and studies from Mattias Belting’s lab have also shown the importance of heparan sulfate proteoglycans in exosome endocytosis (Cerezo-Magana et al., 2021). As a result, uptake comparisons with inert particles or beads may not fully recapitulate the specificity of sEV internalization, and distinct nanoparticle classes may rely on different uptake pathways.

      Figure 4<br /> • Fig. 4E,F,G. How are the authors determining the neointima and media compartments without ancillary staining for basement membrane or endothelial markers? Anatomic specific markers need to be incorporated here for the reader to evaluate the specificity of the FN and CD81 staining. It is also hard to understand the severity of the atherosclerotic lesion without a companion H&E cross section.

      We thank the reviewer for highlighting the need for more rigorous characterization of atherosclerotic lesion architecture and anatomical compartments in our study. In response, we have incorporated additional histological analyses and now provide ancillary staining and companion images to enable clear identification of the neointima and medial compartments, as well as to assess lesion severity (see new Figs S4A–S4D):

      (1) Results, page 8, Ln 28. “To test if FN associates with sEV markers in atherosclerosis, we investigated the spatial association of FN with sEV markers using the sEV-specific marker CD81. Staining of atherosclerotic plaques with haematoxylin and eosin revealed well-defined regions with the neointima as well as tunica media layers formed by phenotypically transitioned or contractile VSMCs, respectively (Fig S4A). Masson's trichrome staining of atherosclerotic plaques showed abundant haemorrhages in the neointima, and sporadic haemorrhages in the tunica media (Fig S4B). Staining of atherosclerotic plaques with orcein indicated weak connective tissue staining in the atheroma with a confluent extracellular lipid core, and strong specific staining at the tunica media containing elastic fibres which correlated well with the intact elastin fibrils in the tunica media (Figs S4C and S4D). Using this clear morphological demarcation, we found that FN accumulated both in the neointima and the tunica media where it was significantly colocalised with the sEV marker, CD81 (Fig. 4D, 4E and 4F). Notably CD81 and FN colocalization was particularly prominent in cell-free, matrix-rich plaque regions (Figs. 4E and 4F).”

      • Figs S4C, S4D: proper controls are not provided. Again, a non-FN internalization control as well as a 4 °C cold-block negative control are required to interpret these data.

      We thank the reviewer for this valuable suggestion. To enhance the rigor of our internalization assays, we have now included several additional controls using alternative treatments, fluorophore combinations, and internalization conditions:

      a) We performed FN-Alexa568 uptake assays, followed by immunostaining for CD63 with a distinct fluorophore (Alexa488), to confirm the colocalization of internalized FN with CD63+ endosomal compartments in VSMCs (new Fig. S3E).

      b) We also stained VSMCs, cultured under normal growth conditions, with an anti-FN antibody to visualize intracellular serum-derived FN and again observed colocalization with CD63 (new Figs. S3F and S3G). Notably, in cells grown to confluence, we observed a complete loss of intracellular FN staining and FN/CD63 colocalization, suggesting that FN recycling is prominent in sparse, motile cells, but not in confluent populations.

      These additional controls strengthen our conclusions regarding FN internalization pathways and the conditions under which FN trafficking to the endosomal system occurs:

      (1) Results, page 7, Ln 31 “We treated serum-deprived primary human aortic VSMCs with FN-Alexa568 and found that it was endocytosed and subsequently delivered to early and late endosomes together with fetuin A, another abundant serum protein that is a recycled sEV cargo and elevated in plaques (Figs S3C and S3D). CD63 visualisation with a different fluorophore (Alexa488) confirmed FN colocalization with CD63+ MVBs (Fig S3E). Next, we stained non-serum deprived VSMC cultured in normal growth media (RPMI supplemented with 20% FBS) with an anti-FN antibody and observed colocalization of CD63 and serum-derived FN. Co-localisation was reduced likely due to competitive bulk protein uptake by non-deprived cells (Fig S3F). Notably, when we compared FN distribution in sparsely growing VSMCs versus confluent cells we found that FN intracellular spots, as well as colocalization with CD63, completely disappeared in the confluent state (Fig S3F and S3G).”

      • Can the authors please provide live and fixed imaging of FN and CD63-mediated filopodial secretion to amply support their conclusions?

      We have observed CD63+ MVBs in both fixed (Fig 2E) and live VSMCs (Fig 2F), yet we agree that further studies are required to establish the contribution of filopodia to sEV secretion. Therefore, we have made the following changes:

      (1) Results, page 6, Ln 37 “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane[39]. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells[18] (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia (Fig S2C and Supplemental Video S1).”

      (2) Discussion, page 12, Ln 19 “Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      Figure 5

      • Fig. 5A,B. The authors claim that sEV supplementation enhances VSMC migration speed and distance. The provided graphs show only a marginal increase in speed with sEV addition (A) but, concerningly, there is a four-star significant difference between the FN condition compared with FN+sEV (B) while the means appear the same. How are these conditions statistically different? The statistics seem off for these comparisons.

      We thank the reviewer for highlighting concerns regarding the statistical analysis in Figures 5A and 5B. In response, we have carefully re-examined our data and statistical approach to ensure accuracy and transparency.

      First, we have now included all individual cell migration tracks in the data representation for these figures. The statistical tests were repeated using the Kruskal–Wallis test with Dunn’s multiple comparison correction across all groups. This more stringent analysis confirmed our key findings: fibronectin (FN) stimulates VSMC migration speed, while inhibition of sEV secretion (with 3-OMS) reduces cellular speed (Fig. 5A). Addition of exogenous ECM-associated sEVs modestly restored cell speed in the presence of 3-OMS, but had no effect on baseline migration speed in 2D or 3D models (Figs. 5A, 5D).
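
      For readers unfamiliar with this analysis, the following is a minimal sketch of a Kruskal–Wallis test followed by Dunn's pairwise comparisons on per-cell values. It is not the authors' analysis script. The group names and values are hypothetical, the Bonferroni adjustment is an assumption (the response does not state which adjustment was applied), and the function returns the H statistic without its p-value to keep the example within the standard library.

```python
from itertools import combinations
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_dunn(groups):
    """groups: dict mapping condition name -> list of per-cell values.

    Returns (H, adjusted), where H is the tie-corrected Kruskal-Wallis
    statistic and adjusted maps each pair of conditions to a
    Bonferroni-adjusted two-sided p-value from Dunn's z statistic.
    """
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    # Split the pooled ranks back into their groups.
    size, mean_rank, weighted_sq, start = {}, {}, 0.0, 0
    for name in names:
        n = len(groups[name])
        group_ranks = ranks[start:start + n]
        size[name] = n
        mean_rank[name] = sum(group_ranks) / n
        weighted_sq += sum(group_ranks) ** 2 / n
        start += n

    # Tie term: sum of (t^3 - t) over each set of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_sum = sum(t ** 3 - t for t in counts.values())

    h = 12 / (n_total * (n_total + 1)) * weighted_sq - 3 * (n_total + 1)
    h /= 1 - tie_sum / (n_total ** 3 - n_total)

    pairs = list(combinations(names, 2))
    base_var = n_total * (n_total + 1) / 12 - tie_sum / (12 * (n_total - 1))
    adjusted = {}
    for a, b in pairs:
        z = (mean_rank[a] - mean_rank[b]) / (base_var * (1 / size[a] + 1 / size[b])) ** 0.5
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        adjusted[(a, b)] = min(1.0, p * len(pairs))
    return h, adjusted


# Hypothetical per-cell speeds (arbitrary units), three cells per condition.
speeds = {"plastic": [1, 2, 3], "FN": [4, 5, 6], "FN+sEV": [7, 8, 9]}
h_stat, p_adj = kruskal_dunn(speeds)
```

      Both tests operate on ranks of individual observations, so each cell track contributes one data point and no normality assumption is required.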

      Regarding the four-star significance observed in the original Fig. 5B, the previous result reflected an analysis based on pooled group averages, which may have overstated marginal differences. The revised analysis, based on individual cell tracks, does not support a substantial difference between FN and FN+sEV groups. The revised p-values and comparisons are now provided directly on the figures and described in the figure legends. We also clearly report the numbers of biological replicates, technical replicates, and individual data points for every condition.

      Further, the modest effect of ECM-associated sEVs on speed is consistent with our observation that sEVs influence invasion directionality rather than baseline migration velocity, in agreement with previous findings in tumor models (Sung et al., 2015).
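
      The distinction between speed and directionality can be made concrete with a small sketch. It uses the common directionality ratio (net displacement divided by total path length) as an assumed metric; the response does not specify how directionality was scored in Figs. 5G and 5H, and the function name and track coordinates below are hypothetical.

```python
import math


def track_metrics(points, dt):
    """Return (mean_speed, directionality_ratio) for one cell track.

    points: list of (x, y) positions sampled at a fixed interval dt.
    The directionality ratio is net displacement / total path length,
    so it is 1.0 for a perfectly straight track and near 0 for a
    track that ends close to where it started.
    """
    steps = [math.dist(p, q) for p, q in zip(points, points[1:])]
    path_length = sum(steps)
    net_displacement = math.dist(points[0], points[-1])
    mean_speed = path_length / (dt * len(steps))
    ratio = net_displacement / path_length if path_length > 0 else 0.0
    return mean_speed, ratio


# Two hypothetical tracks with identical step lengths but different paths.
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
back_and_forth = [(0, 0), (1, 0), (0, 0), (1, 0)]
speed_a, ratio_a = track_metrics(straight, dt=1.0)
speed_b, ratio_b = track_metrics(back_and_forth, dt=1.0)
```

      The two tracks have the same mean speed but very different directionality ratios, which is how a treatment can change the direction of movement without changing velocity.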

      The manuscript has been revised accordingly, with updates in:

      (1) Figures 5A and 5B: Individual cell track data are now shown, and statistical analyses have been repeated using the Kruskal–Wallis test with Dunn’s multiple comparisons.

      (2) Figure legends and results sections: Numbers of biological and technical replicates, as well as individual data points, are now clearly stated.

      (3) Results, page 9, line 14: “FN as a cargo in sEVs promotes FA formation in tumour cells and increases cell speed[14, 15]. As we found that FN is loaded into VSMC-derived sEVs we hypothesized that ECM-entrapped sEVs can enhance cell migration by increasing cell adhesion and FA formation in the context of a FN-rich ECM. Therefore, we tested the effect of sEV deposition onto the FN matrix on VSMC migration in 2D and 3D models. We found that FN coating promoted VSMC velocity and inhibition of bulk sEV secretion with 3-OMS reduced VSMC speed in a 2D single-cell migration model (Figs. 5A, 5B) in agreement with previous studies using tumour cells[14, 15]. However, addition of sEVs to the ECM had no effect on VSMC speed at baseline but rescued cell speed and distance in the presence of the sEV secretion inhibitor, 3-OMS suggesting the EVs are not primarily regulating cell speed (Figs 5A and 5B).”

      (4) Results, page 9, Ln 29: “Hence, ECM-associated sEVs have modest influence on VSMC speed but influence VSMC invasion directionality.”
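      As a rough illustration of the omnibus step behind the Kruskal–Wallis analysis mentioned in item (1), the sketch below computes the tie-corrected H statistic from per-cell values in pure Python. The function names are hypothetical and the code is not taken from the manuscript; it omits the p-value and Dunn's post-hoc comparisons, which a statistics package would normally supply.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of groups of measurements."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction
```

      Each inner list would hold one condition's per-cell speeds (for example FN alone versus FN plus sEVs); H is then compared against a chi-squared distribution with (number of groups − 1) degrees of freedom.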

      We hope that these changes address the reviewer’s concerns and improve the transparency and reproducibility of our data presentation.

      • Fig d-h. Generally, the magnitude of the difference between the presented conditions are biologically insignificant. Several of the graphs show a four-star difference with means that appear equivalent with overlapping error bars. Do the authors conclude that a 0.1%, or less, effect between groups is biologically meaningful?

      We thank the reviewer for drawing attention to the apparent mismatch between statistical significance and biological relevance in Figures 5d–h. In response, we have reanalyzed the data using individual cell tracks and more stringent non-parametric statistical tests, as described above. This reanalysis confirmed that the magnitude of differences in migration speed and related parameters between the groups is minimal and not biologically meaningful. Thus, we no longer claim that sEVs significantly affect VSMC migration speed under these conditions in either 2D or 3D assays. Our revised manuscript now accurately reflects this finding in both the Results and Discussion sections, and the updated figures and legends clarify the true extent of any differences observed.

      Figure 6

      • Generally, the authors' logic for looking into adhesion, focal adhesion and traction forces is hard to follow. If there are sEV-mediated migration differences, then there would inexorably be focal adhesion alterations. However, the data indicates few differences brought on by sEVs, which speaks to the lack of migration differences presented in Fig. 5. Overall, the sEV migration phenotype has so little of an effect that to then search for a mechanism seems destined not to turn up anything significant.

      We thank the reviewer for highlighting the importance of connecting the observed phenotypic effects of sEVs to the investigation of adhesion and focal adhesion mechanisms. While our revised analysis confirms that sEVs have little to no effect on VSMC migration speed or distance in 2D and 3D models, we did observe a robust effect of sEVs on the directionality of cell invasion (Figs. 5G and 5H). This prompted us to look more closely at pathways involved in cell guidance rather than bulk cell motility.

      Our proteomic comparison between larger EVs (10K fraction) and sEVs (100K fraction) revealed a unique adhesion complex present specifically on the sEVs—comprising collagen VI, TGFBI, LGALS3BP, and EDIL3 (Figs. 7A–C)—each of which has previously been implicated in integrin signaling, cell adhesion, or invasion. Functional blocking and knockdown studies further identified collagen VI as a key mediator in the regulation of cell adhesion and invasion directionality influenced by sEVs (Figs. 7F and 7I).

      In response to this mechanistic insight, we have modified the graphical abstract and discussion to clarify our approach:

      • We now explicitly state that our focus has shifted from analyzing baseline migration speed to mechanisms guiding invasion directionality, in line with our key phenotypic findings.

      • We highlight that the unique adhesion cluster identified on sEVs, including collagen VI and its cooperative partners, provides a strong mechanistic rationale for examining focal adhesion dynamics and ECM interactions, even in the absence of changes in migration velocity.

      • Discussion excerpts (pages 13–14) have been updated to reflect this rationale and to summarize the potential significance of these findings for vascular biology and disease.

      We hope this clarifies the logic underlying our approach and justifies the mechanistic studies performed in this context:

      (1) Discussion, page 13, Ln 2  “Hence, it will be interesting in future studies to investigate whether sEVs can stimulate Rho activity by presenting adhesion modulators—particularly collagen VI—on their surface, thereby guiding cell directionality during invasion.”

      (2) Discussion, page 13, Ln 30: “In addition to collagen VI the unique adhesion cluster in VSMC-derived sEVs also includes EGF-like repeat and discoidin I-like domain-containing protein (EDIL3), transforming growth factor-beta-induced protein ig-h3 (TGFBI) and the lectin galactoside-binding soluble 3 binding protein (LGALS3BP) and these proteins are also directly implicated in activation of integrin signalling and cellular invasiveness [85-87]. Although we found that collagen VI plays the key role in sEV-induced early formation of FAs in VSMCs, it is tempting to speculate that the high sEV efficacy in stimulating FA formation is driven by cooperative action of this unique adhesion complex on the sEV surface and targeting this novel sEV-dependent mechanism of VSMC invasion may open up new therapeutic opportunities to modulate atherosclerotic plaque development or even to prevent undesired VSMC motility in restenosis.”

      (3) Discussion, page 14, Ln 14: “In summary, cooperative activation of integrin signalling and F-actin cytoskeleton pathways results in the secretion of sEVs which associate with the ECM and play a signalling role by controlling FA formation and cell-ECM crosstalk. Further studies are needed to test these mechanisms across various cell types and ECM matrices.”

      Figure 7

      • The authors need to provide additional evidence Col IV is harbored in sEVs and not a contaminant of sEV isolation as VSMCs secrete a copious amount of this in culture. For instance, IHC of isolated sEVs stained for CD63 and Col IV as well as single cell staining of the same sort.

      We thank the reviewer for this important comment regarding the specificity of collagen VI detection in sEVs. To ensure that collagen VI is associated with bona fide sEVs—rather than being a contaminant resulting from high extracellular abundance—we performed a comparative analysis of vesicles isolated from the same conditioned media. Both proteomic mass spectrometry and western blotting revealed that collagen VI was exclusively present in the small EV (100K pellet) fraction and not in the larger EVs (10K pellet), as shown in Figs. 7B and 7C. Collagen VI was further identified in sEVs extracted from the ECM using our salt/guanidine protocol (new Fig. 7D).

      Reviewer #2 (Recommendations For The Authors):

      The authors have presented a nice collection of data with strong approaches to address their hypotheses. Nevertheless, an additional section within the Discussion would be welcome in addressing the potential limitations and important caveats to be considered alongside their study. These caveats and limitations could be reshaped by additional data supporting the ideas that: (1) small extracellular vesicles can be directly observed during their secretion from filopodia, (2) CD81 labeling in tissue can be interpreted clearly as extracellular vesicles and not the cell surface of other cell types (co-staining with an endothelial cell marker such as PECAM-1 perhaps), and (3) collagen VI within the vesicles is somehow accessed by adhesion molecules on the cell surface of migrating cells.

      We thank the reviewer for these important suggestions and we have now added further studies and modified our conclusions to reflect the data more accurately:

      (1) Results, page 6, Ln 37: “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane [39]. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells [18] (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia (Fig S2C and Supplemental Video S1).”

      (2) Discussion, page 12, Ln 18: “Here we report that β1 integrin activation triggers sEV release followed by sEV entrapment by the ECM. Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      We quantified the colocalization of CD81 and CD31 to exclude an endothelial cell origin of sEVs, extended the characterisation of the atherosclerotic matrix, and highlighted the limitations to interpretation, i.e., regarding CD81 localisation in the ECM:

      (1) Results, page 8, Ln 43: “An enhanced expression of CD81 by endothelial cells in early atheroma has been previously reported, so to study the contribution of CD81+ sEVs derived from endothelial cells we investigated the localisation of CD31 and CD81 [45]. In agreement with a previous study, we found that the majority of CD31 colocalises with CD81 (thresholded Manders' split colocalization coefficient 0.54±0.11, N=6), indicating that endothelial cells express CD81 (Fig 4G) [45]. However, only a minor fraction of total CD81 colocalised with CD31 (thresholded Manders' split colocalization coefficient 0.24±0.06, N=6), confirming that the majority of CD81 in the neointima is originating from the most abundant VSMCs.”

      (2) Results, page 8, Ln 28: “To test if FN associates with sEV markers in atherosclerosis, we investigated the spatial association of FN with sEV markers using the sEV-specific marker CD81. Staining of atherosclerotic plaques with haematoxylin and eosin revealed well-defined regions with the neointima as well as tunica media layers formed by phenotypically transitioned or contractile VSMCs, respectively (Fig S4A). Masson's trichrome staining of atherosclerotic plaques showed abundant haemorrhages in the neointima, and sporadic haemorrhages in the tunica media (Fig S4B). Staining of atherosclerotic plaques with orcein indicated weak connective tissue staining in the atheroma with a confluent extracellular lipid core, and strong specific staining at the tunica media containing elastic fibres which correlated well with the intact elastin fibrils in the tunica media (Figs S4C and S4D). Using this clear morphological demarcation, we found that FN accumulated both in the neointima and the tunica media where it was significantly colocalised with the sEV marker, CD81 (Fig. 4D, 4E and 4F). Notably CD81 and FN colocalization was particularly prominent in cell-free, matrix-rich plaque regions (Figs. 4E and 4F).”
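      To make the asymmetry between the two CD31/CD81 coefficients quoted in item (1) easier to follow, here is a minimal sketch of thresholded Manders split coefficients on two flattened intensity channels. It assumes one common convention (each coefficient is the above-threshold intensity of a channel in co-occurring pixels divided by that channel's total above-threshold intensity); the function name, thresholds and pixel values are hypothetical and do not reproduce the authors' image-analysis pipeline.

```python
def manders_split(ch1, ch2, thr1, thr2):
    """Thresholded Manders split coefficients (M1, M2).

    ch1, ch2: equal-length flat sequences of pixel intensities.
    thr1, thr2: per-channel thresholds; pixels at or below are ignored.
    M1 = fraction of above-threshold ch1 signal found where ch2 is also
    above threshold; M2 is the same with the channels swapped.
    """
    if len(ch1) != len(ch2):
        raise ValueError("channels must have the same number of pixels")

    total1 = sum(a for a in ch1 if a > thr1)
    total2 = sum(b for b in ch2 if b > thr2)
    coloc1 = sum(a for a, b in zip(ch1, ch2) if a > thr1 and b > thr2)
    coloc2 = sum(b for a, b in zip(ch1, ch2) if a > thr1 and b > thr2)

    m1 = coloc1 / total1 if total1 else 0.0
    m2 = coloc2 / total2 if total2 else 0.0
    return m1, m2


# Toy pixels only (not study data): channel 1 is widespread, channel 2 sparse.
m1, m2 = manders_split([0, 10, 20, 30], [5, 0, 20, 0], thr1=1, thr2=1)
```

      In this toy case most of channel 2 overlaps channel 1 while only a third of channel 1 overlaps channel 2, the same kind of asymmetry as a marker expressed by a minority cell type but also abundant elsewhere.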

      We showed that collagen VI is presented on the surface of sEVs:

      (1) Results, page 10, Ln 43: “Collagen VI was the most abundant protein in VSMC-derived sEVs (Fig 7B, Table S7) and was previously implicated in the interaction with the proteoglycan NG2 [53] and suppression of cell spreading on FN [54]. To confirm the presence of collagen VI in ECM-associated sEVs we analysed sEVs extracted from the 3D matrix using 0.5M NaCl treatment and showed that both collagen VI and FN are present (Fig 7D). Next, we analysed the distribution of collagen VI using dot-blot. Alix staining was bright only upon permeabilization of sEVs, indicating that it is preferentially a luminal protein (Fig 7E). On the contrary, CD63 staining was similar in both conditions, showing that it is a surface protein (Fig 7E). Interestingly, collagen VI staining revealed that 40% of the protein is located on the outside surface with 60% in the sEV lumen (Fig 7E).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) Given the mechanical nature of the device and the propensity for mice to urinate on things, I also wonder how frequently the device breaks/needs to be repaired. Perhaps some details regarding the cost and reliability of the device would be helpful to include, as these are the two things that could make researchers hesitant to adopt immediately.

      We thank the reviewer for their astute observations. We also noted the problem of mouse waste and incorporated this concern into the redesign we mention in the text.

      “Mouse waste getting on mechanical parts was found to be a major concern for the initial version of the device. As part of the redesign, the linear stages were moved out from under the mice to avoid this problem. Despite this problem, the original version of the device has not had any of its stages break down yet. A common problem though was that stimulus tips would blunt or break if they hit the mesh of the mesh table, requiring replacement. This has been solved in the latest version through a new feature where the mesh is detected via the force sensor, prompting immediate stimulus withdrawal, avoiding damage.”

      In regards to cost and adoption, we have added this sentence to the final line of the discussion:

      “To promote wide adoption of this device across as many labs as possible, a company, Tactorum Inc., has been formed.”

      (2) The only major technical concern, which is easy to address, is whether the device generates ultrasonic sounds that rodents can hear when idle or operational, across the ultrasonic frequencies that are of biological relevance (20-110 kHz). These sounds are generally alarm vocalizations and can create stress in animals, and/or serve as cues of an impending stimulus (if indeed they are produced by the device).

      The reviewer brings up an interesting question. The ARM is fairly quiet, but some of the noise it emits does fall within the 20-110 kHz range, although it otherwise bears no qualitative resemblance to a mouse vocalization. Based on this, we tested whether the noise produced by the ARM causes stress in naïve mice.

      “A concern was raised that the noise of the ARM may cause stress in the mice tested. To test this, the open field test was performed with naïve mice (n=10) 2 feet from the ARM while the ARM either sat silent or ran through its habituation program, producing noise. The mouse's center point movement was then tracked in relation to the chamber, its edges, and center. No significant differences were found in distance traveled, center entrances, time in center, and latency to center entrance based on a two-tailed Student’s t-test (Figure S1D-G). Based on this, neither stress nor locomotion differences were detected by this test, indicating the ARM does not induce an increased stress state due to its noise, even in non-habituated mice.”

      (3) This sentence in the intro may be inaccurate: "or the recent emergence of a therapeutic targeting voltage-gated sodium channels, that block pain in both rodents and humans such as VX-548 for NaV1.8 (Jones 2023)" Despite extensive searching, I have been unable to find a reference showing that VX-548 is antinociceptive in rodents (rats or mice). As for why this is the case, I do not know. One speculation: this drug may be selective for the human Nav1.8 channel (but again, I have found no references comparing specificity on human vs rodent Nav1.8 channels). To not mislead the field into thinking VX-548 works for rodents and humans, please remove "both rodents and" from the sentence above (unless you find a reference supporting VX-548 as being effective in pain assays with rodents. There is a PK/PD paper with rodents, but that only looks at drug metabolism, not efficacy with pain assays).

      We agree with the reviewer and have removed mention of the new Nav1.8 therapeutic also working in rodents.

      (4) In the intro paragraph where variability in measuring mechanical stimuli is described, there is a new reference from the Stucky lab that further supports the need for an automated way to measure allodynia, as they also found variability between experimenters. This would be a relevant reference to include: Rodriguez Garcia (2024) PMID: 38314814.

      We thank the reviewer for this relevant citation and have updated the text to incorporate it:

      “Recent studies utilizing the manual high-speed analysis of withdrawal behavior developed by Abdus-Saboor et al. 2019 have reproduced this sizable experimenter effect using the new technique (Rodríguez García 2024).”

      (5) "a simple sin wave motion": should be "sine", correct throughout (multiple instances of "sin")

      Corrections made where relevant.

      Reviewer #2 (Public review):

      (1) ARM seems like a fantastic system that could be widely adopted, but no details are given on how a lab could build ARM, thus its usefulness is limited.

      The reviewer raises a good point. Unfortunately, the authors are constrained by university policies around patent law. That said, efforts are being made to make the ARM widely available to interested researchers. As mentioned above in response to Reviewer 1’s comments, we end the discussion section with this sentence:

      “To promote wide adoption of this device across as many labs as possible, a company, Tactorum Inc., has been formed.”

      (2) The ARM system appears to stop short of hitting the desired forces that von Frey filaments are calibrated toward (Figure 2). This may affect the interpretation of results.

      The reviewer makes an important observation. We amended the text to give more clarity on the maximum forces delivered and to comment on causes beyond the delivery mechanism. It should be noted that a newly purchased set of von Frey filaments was used.

      “With the same 1.4 and 2 g von Frey filaments Researcher 1 delivered max average forces of 1.5 g and 2.7 g, and Researcher 2 1.35 g and 2.4 g. The ARM delivered average max forces closest to the targeted forces, with 1.36 g and 1.9 g (Figure 2C). Some of the error observed could be due to the error rate (+/- 0.05 g) in the force gauge and the von Frey set used.”

      (3) The authors mention that ARM generates minimal noise; however, if those sounds are paired with stimulus presentation, they could still prompt a withdrawal response. Including some 'catch' trials in an experiment could test for this.

      The reviewer makes a very useful suggestion that we incorporated into our carrageenan experiments. This new data can be found in Supplemental Figure 3F.

      “For the carrageenan model, three replicates of the force ramp stimulus were delivered to each paw, and catch trials were performed every 3rd trial to test whether the mice would respond to the noise of the ARM alone. During catch trials, the stimulus was delivered to the open air behind the mouse, and any movement within 5 seconds of stimulus delivery was counted as a response. These trials found a 96% response rate in true trials, with only a 7% rate in catch trials, indicating responses were not being driven by device noise.”
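      The catch-trial bookkeeping described above can be sketched as follows. This is an illustrative reading, not the authors' analysis code: it assumes trials are numbered from 1, that every trial whose number is divisible by 3 is a catch trial, and that a trial is marked as a response if any movement occurred within 5 seconds of stimulus delivery. All names and the example values are hypothetical.

```python
def is_catch_trial(trial_number, every=3):
    """Assumed schedule: every `every`-th trial (1-based) is a catch trial."""
    return trial_number % every == 0


def response_rates(trials, every=3):
    """Return response rates for true and catch trials.

    trials: iterable of (trial_number, responded) pairs, where `responded`
    is True if the mouse moved within the scoring window.
    """
    counts = {"true": [0, 0], "catch": [0, 0]}  # [responses, total]
    for number, responded in trials:
        key = "catch" if is_catch_trial(number, every) else "true"
        counts[key][0] += int(responded)
        counts[key][1] += 1
    return {
        key: (hits / total if total else float("nan"))
        for key, (hits, total) in counts.items()
    }


# Made-up outcomes for six trials; trials 3 and 6 are catch trials.
rates = response_rates(
    [(1, True), (2, True), (3, False), (4, True), (5, False), (6, True)]
)
```

      A large gap between the two rates, as in the 96% versus 7% reported in the quoted text, is what indicates that withdrawals follow paw contact rather than device noise.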

      (4) The experimental design in Figure 2 is unclear- did each experimenter have their own cohort of 10 mice, or was a single cohort of mice shared? If shared, there's some concern about repeat testing.

      Further clarification was added to avoid confusion on the methods used here.

      “Separate cohorts of 10 mice were used for ARM and manual delivery, with a week given between each researcher to avoid sensitization.”

      (5) In Figure 5 and S4, the order of the legends does not match the order of the graphs. This can be particularly confusing as the color scheme is not colorblind-friendly. Please consider revising the presentation of these figures.

      Corrections made where relevant.

      Reviewer #3 (Public review):

      (1) Limited details are provided for statistical tests and inappropriate claims are cited for individual tests. For example, in Figure 2, differences between researchers at specific forces are reported to be supported by a 2-way ANOVA; these differences should be derived from a post-hoc test that was completed only if the independent variable effects (or interaction effect) were found to be significant in the 2-way ANOVA. In other instances, statistical test details are not provided at all (e.g., Figures 3B, 3C, Figure 4, Figure 6G).

      We would like to thank the reviewer for pointing out the lack of clarity in the text on these statistical methods. We have added further details across the manuscript and show them below to address this concern.

      “Both manual delivery and the ARM produced significant paw withdrawal percentage curves, a standard traditional measurement of mechanical sensitivity in the field (von Frey 1896, Dixon 1980, Chaplan 1994) (Figure 2E), with a 2-way ANOVA and a posthoc Tukey test detecting significant increases in comparing the 3 lower force VFHs (0.02g, 0.07g, 0.16g) to the 2 highest force VFHs (1g, 1.4g). This demonstrates that the ARM delivers results comparable to highly experienced researchers. However, a 2-way ANOVA and a posthoc Tukey test found that Researcher 2 elicited a significantly higher (p=0.0008) paw withdrawal frequency than Researcher 1 (Figure S2A), which corresponded with Researcher 2’s higher VFH application time as measured by the force sensor (Figure 2B).”

      “Adjustments were then made to the PAWS software to automate the measurement of withdrawal latency based on pose tracking data of the withdrawal response and the trajectory of the stimulus delivery encoded into the ARM. Testing of C57/BL6J (n=15) at baseline found significant decreases in withdrawal latency for pinprick compared to cotton swab stimuli delivered in identical ways by the ARM (Figure 3B) based on a two-tailed Student’s t-test.”

      “Mice injected with carrageenan (n=15) showed elevated shaking behavior (p=0.0385, 2-way ANOVA and a posthoc Tukey test) in response to pinprick stimuli in comparison to measurements at baseline (Figure 3C).”

      “Remote habituated mice showed a significant decrease (p=0.0217, 2-way ANOVA) in time to rest over the 3 days (Figure 4B), but no significant differences for any single day. The number of turns was measured for each group during the first 10 minutes of day 1 to act as a baseline, and then from 20 to 30 minutes for each day. Turn counts were then compared as a percentage of the baseline count for each group. This period was chosen as it is the period when experiments start after the day of habituation on experimental days. It was found that remote-habituated mice showed significantly less turning on day 2 compared to mice habituated with a researcher present (p=0.024, 2-way ANOVA posthoc Tukey test), and that only the remote-habituated mice showed significantly decreased turning behavior on day 3 compared to day 1 (p=0.0234, 2-way ANOVA posthoc Tukey test) (Figure 4C).”

      “Sex-dependent differences were found in reflexive and affective behavioral components of the mouse withdrawal response when a researcher was present versus not for both reactions to innocuous and noxious stimuli. A 2-way ANOVA and a posthoc Tukey test found that cotton swab stimuli elicited increased male mouse reflexive paw withdrawal features, including max paw height (p=0.0413) and max paw velocity (Y-axis) (p=0.0424) when Researcher 1 was present compared to when no researcher was present (Figure 4E-F). Pinprick stimuli (Figure 4H-I) on the other hand led to increased max paw height (p=0.0436) and max paw velocity (Y-axis) (p=0.0406) in male mice compared to female mice when Researcher 1 was present.

      Analysis of the shaking behavior elicited by cotton swab and pinprick stimuli found no significant differences in shaking behavior duration (Figure 4SA-B) but found sex-dependent differences in paw distance traveled after the initial withdrawal, including during shaking and guarding behaviors. For cotton swab (Figure 4G), male mice showed significantly increased paw distance traveled compared to female mice when Researcher 2 was present (p=0.0468, 2-way ANOVA posthoc Tukey test) but not when Researcher 1 was present or no researcher was present. Pinprick stimuli also elicited sex-based increases in paw distance traveled (Figure 4J) in male mice when Researcher 2 was present compared to both male mice when no researcher was present (p=0.0149, 2-way ANOVA posthoc Tukey test) and female mice when Researcher 1 was present (p=0.0038, 2-way ANOVA posthoc Tukey test).”

      (2) In the current manuscript, the effects of the experimenter's presence on both habituation time and aspects of the withdrawal reflex are minimal for Researcher 2 and non-existent for Researcher 1. This is surprising given that Researcher 2 is female; the effect of experimenter presence was previously documented for male experimenters, as the authors appropriately point out (Sorge et al. PMID: 24776635). In general, this argument could be strengthened (or perhaps negated) if more than N=2 experimenters were included in this assessment.

      The reviewer makes an important point regarding this data and the need for further experiments. We designed a new set of experiments to examine the effect of male and female researchers overall. It should be noted that this is rather noisy data given it was collected by three sets of male and female researchers over 3 weeks. That said, a significant difference was found between mouse sexes when a male researcher was present. This is consistent with previous data but, as we discuss, it does not invalidate our earlier data, as researcher gender appears to be only one of the factors at work in researcher presence effects on mouse behavior, so individuals may have greater or lesser effects than their gender group overall. Our new results can be found in Figure 4K.

      “These results indicate that researcher presence at baseline can lead to significant differences in reflexive and affective pain behavior. In this case, male mice showed increased behavioral responses to both touch and pain behavior depending on whether the researcher was present. This led to sex differences in the affective and reflexive component of the withdrawal response when a researcher is present, which disappears when no researcher is present, or a different researcher is present. For this set of researchers, the female researcher elicited the greater behavioral effect. This appeared at first to contradict previous findings (Sorge 2024, Sorge 2014), but it was hypothesized that the effect of an individual researcher could easily vary compared to their larger gender group. To test this, 6 new researchers, half male and half female, were recruited and a new cohort of mice (n=15 male, n=15 female) was tested in each of their presence over the course of 3 weeks, controlling for circadian rhythms (Figure 4K). The newly added force ramp stimulus type was used for these experiments, with three replicates per trial, to efficiently measure mechanical threshold in a manner comparable to previous work. It was found that female mice showed significantly decreased mechanical threshold compared to male mice (p=0.034, Šídák's multiple comparisons test and Student’s t-test) when a male researcher was present. This did not occur when a female researcher or no researcher was present. In the latter case a slight trend towards this effect was observed, but it was not significant (p=0.21), and may be the result of a single male researcher being responsible for handling and setting up the mice for all experiments.”

      “These findings indicate that sex-dependent differences in evoked pain behavior can appear and disappear based on which researcher/s are in the room. There is a trend towards male researchers overall having a greater effect, but individuals may have a greater or lesser effect on mouse behavior, independent of the gender or sex. This presents a confound that must be considered in the analysis of sex differences in pain and touch behavior which may explain some of the variation in findings from different researchers. Together, these results suggest that remote stimulus delivery may be the best way to eliminate variation caused by experimenter presence while making it easier to compare with data from researchers in your lab and others.”

      (3) The in vivo BLA calcium imaging data feel out of place in this manuscript. Is the point of Figure 6 to illustrate how the ARM can be coupled to Inscopix (or other external inputs) software? If yes, the following should be addressed: why do the up-regulated and down-regulated cell activities start increasing/decreasing before the "event" (i.e., stimulus application) in Figure 6F? Why are the paw withdrawal latencies and paw distance travelled values in Figures 6I and 6J respectively so much faster/shorter than those illustrated in Figure 5 where the same approach was used?

      Thanks to the reviewer for bringing up this concern. We have included further text discussing this behavioral data and how it compares to previous work in this study.

      “Paw height and paw velocity were found to be consistent with data from figures 4E-I (male researcher and male mice) and 5C (stimulus intensity 2.5 and 4.5) for similar data, with slightly elevated measures of paw distance traveled and decreased paw withdrawal latency for the pinprick stimulus. This was likely caused by sensitization due to multiple stimulus deliveries over the course of the experiment: because of logistical constraints, 30 stimulus trials were delivered per session, versus the maximum of 3 performed during previous experiments.”

      “This data indicates that the ARM is an effective tool for efficiently correlating in vivo imaging data with evoked behavioral data, including sub-second behavior. One limitation is that the neural response appears to begin slightly before stimulus impact (Figure 6F, 6SB). This was likely caused by a combination of the imprecise nature of ARM v1 paw contact detection and slight delays in the paw contact signal reaching the Inscopix device due to flaws in the software and hardware used. Improvements made as part of ARM v2 have been shown to eliminate this delay in in vivo fiber photometry data recorded for new projects using the device.”

      (4) Another advance of this manuscript is the integration of a 500 fps camera (as opposed to a 2000 fps camera) in the PAWS platform. To convince readers that the use of this more accessible camera yields similar data, a comparison of the results for cotton swabs and pinprick should be completed between the 500 fps and 2000 fps cameras. In other words, repeat Supplementary Figure 3 with the 2000 fps camera and compare those results to the data currently illustrated in this figure.

      The reviewer makes a good point about the need for direct comparison between 500 fps and 2000 fps data. To address this, we added data from the same mice, recorded 2 weeks earlier with a comparable setup. These new results can be found in Supplemental Figure 3.

      “Changes were made to PAWS to make it compatible with framerates lower than 2000 fps. This was tested using a 0.4 MP, 522 FPS, Sony IMX287 camera recording at 500 fps, and compared with data recorded at 2000 fps by the previously used Photron FASTCAM (Figure 3SC-F). The camera paired with PAWS was found to be sufficient to separate between cotton swab and pinprick withdrawal responses, suggesting it may be a useful tool for labs that cannot invest in a more expensive device. PAWS features measured from 500 fps video data were not significantly different from the 2000 fps data based on a 2-way ANOVA.”

      (5) In Figure 2F, the authors demonstrate that a von Frey experiment can be completed much faster with the ARM vs. manually. I don't disagree with that fact - the data clearly show this. I do, however, wonder if the framing of this feature is perhaps too positive; many labs wait > 30 s between von Frey filament applications to prevent receptive field sensitization. The fact that an entire set of ten filaments can be applied in < 50 s (< 3 s between filaments given that each filament is applied for 2 s), while impressive, may never be a feature that is used in a real experiment.

      The reviewer makes an important point about how different researchers perform these tests and the relevant timings. We have moderated the framing of these results to address this concern.

      “Further, we found that the ARM decreased the time needed to apply a stimulus 10 times to a mouse paw by 50.9% compared to manual delivery (Figure 2F). This effect size may decrease for researchers who leave longer delays between stimulus delivery, but the device should still speed up experiments by reducing aiming time and allowing researchers to quickly switch to a new mouse while waiting for the first.”

      (6) Why are different affective aspects of the hindpaw withdrawal shown in different figures? For example, the number of paw shakes is shown in Figure 3C, whereas paw shaking duration is shown in Figure 5D. It would be helpful - and strengthen the argument for either of these measures as being a reproducible, reliable measure of pain - if the same measure was used throughout.

      Thanks to the reviewer for pointing out this discrepancy. We have adjusted the figures and text to only use the Number of Paw Shakes for better consistency (Figure 5D and Figure 5-figure supplement 1C).

      (7) Is the distance the paw traveled an effective feature of the paw withdrawal (Figure 5E)? Please provide a reference that supports this statement.

      A relevant citation and discussion of this metric based on previous studies has been added.

      “Mice injected with carrageenan (n=15) showed elevated shaking behavior (p=0.0385) in response to pinprick stimuli in comparison to measurements at baseline (Figure 3C). This aligned with previous findings where PAWS has detected elevations in shaking and/or guarding behavior, examples of affective pain behavior, and post-peak paw distance traveled, which correlates with these behaviors in carrageenan pain models and has been found to be a good measure of them in past studies (Bohic et al. 2023).”

      (8) Dedek et al. (PMID: 37992707) recently developed a similar robot that can also be used to deliver mechanical stimuli. The authors acknowledge this device's ability to deliver optogenetic and thermal stimuli but fail to mention that this device can deliver mechanical stimuli in a similar manner to the device described in this paper, even without experimenter targeting. Additional discussion of the Dedek et al. device is warranted.

      We would like to thank the reviewer for identifying this omission. Discussion of this point, as well as further discussion of Dedek et al.’s automation prototyping work, has been added.

      “Previous attempts at automating mechanical stimulus delivery, including the electronic von Frey (Martinov 2013) and dynamic plantar aesthesiometer (Nirogi 2012), have focused on eliminating variability in stimulus delivery. In contrast to the ARM, both of these devices rely upon a researcher being present to aim or deliver the stimulus, can only deliver vFH-like touch stimuli, and only measure withdrawal latency/force threshold. Additionally, progress has been made in automating stimulus assays by creating devices with the goal of delivering precise optogenetic and thermal stimuli to the mouse’s hind paw (Dedek 2023, Schorscher-Petchu 2021). The Prescott team went further and incorporated a component into their design to allow for mechanical stimulation, but this piece appears to be limited to a single filament type that can only deliver a force ramp. As a result, these devices and those previously discussed lack the customization for delivering distinct modalities of mechanosensation that the ARM allows for. Moreover, in its current form, the automated aiming of some of these devices may not provide the same resolution or reliability as the ARM in targeting defined targets (Figure 1C), such as regions of the mouse paw that might be sensitized during chronic pain states. Due to the nature of machine learning pose estimation, substantial work, beyond the capacity of a single academic lab, in standardizing the mouse environment and building a robust model based on an extensive and diverse training data set will be necessary for automated aiming to match the reliability or flexibility of manual aiming. That said, we believe this work, along with that of the other groups mentioned, has laid the groundwork from which a new standard for evoked somatosensory behavior experiments in rodents will be built.”

      (9) Page 2: von Frey's reference year should be 1896, not 1986.

      This typo has been fixed; we thank the reviewer for noting it.

      “For more than 50 years, these stimuli have primarily been the von Frey hair (vFH) filaments that are delivered to the mouse paw from an experimenter below the rodent aiming, poking, and subsequently recording a paw lift (von Frey 1896, Dixon 1980, Chaplan 1994).”

      (10) Page 2: Zumbusch et al. 2024 also demonstrated that experimenter identification can impact mechanical thresholds, not just thermal thresholds.

      Text has been updated in order to note this important point.

      “A meta-analysis of thermal and mechanical sensitivity testing (Chesler 2002, Zumbusch 2024) found that the experimenter has a greater effect on results than the mouse genotype, making data from different individual experimenters difficult to merge.”

      (11) Page 2: One does not "deliver pain in the periphery". Noxious stimuli or injury can be delivered to the periphery, but by definition, pain is a sensation that requires a central nervous system.

      Text has been updated for improved accuracy.

      “Combining approaches to deliver painful stimuli with techniques mapping behavior and brain activity could provide important insights into brain-body connectivity that drives the sensory encoding of pain.”

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Ferreiro et al. present a method to simulate protein sequence evolution under a birth-death model where sequence evolution is guided by structural constraints on protein stability. The authors then use this model to explore the predictability of sequence evolution in several viral proteins. In principle, this work is of great interest to molecular evolution and phylodynamics, which has struggled to couple non-neutral models of sequence evolution to phylodynamic models like birth-death processes. Unfortunately, though, the model shows little improvement over neutral models in predicting protein sequence evolution, although it can predict protein stability better than models assuming neutral evolution. It appears that more work is needed to determine exactly what aspects of protein sequence evolution are predictable under such non-neutral phylogenetic models. 

      We thank the reviewer for the positive comments about our work. We agree that further work is needed in the field of substitution models of molecular evolution to enable more accurate predictions of specific amino acid sequences in evolutionary processes.

      Major concerns: 

      (1) The authors have clarified the mapping between birth-death model parameters and fitness, but how fitness is modeled still appears somewhat problematic. The authors assume the death rate = 1 - birth rate. So a variant with a birth rate b = 1 would have a death rate d = 0 and so would be immortal and never die, which does not seem plausible. Also I'm not sure that this would "allow a constant global (birth-death) rate" as stated in line 172, as selection would still act to increase the population mean growth rate r = b - d. It seems more reasonable to assume that protein stability affects only either the birth or death rate and assume the other rate is constant, as in the Neher 2014 model. 

      The model proposed by Neher, et al. (2014), which incorporates a death rate (d) higher than 0 for any variant, was implemented and applied in the present method. In general, this model did not yield results different from those obtained using the model that assumes d = 1 – b, suggesting that this aspect may not be crucial for the study system. Next, the imposition of arbitrary death events based on an arbitrary death rate could be a point of concern. Regarding the original model, a variant with d = 0 can experience a decrease in fitness through the mutation process. In an evolutionary process, each variant is subject to mutation, and Markov models allow for the incorporation of mutations that decrease fitness (albeit with lower probability than beneficial ones, but they can still occur). All this information is included in the manuscript.

      (2) It is difficult to evaluate the predictive performance of protein sequence evolution. This is in part due to the fact that performance is compared in terms of percent divergence, which is difficult to compare across viral proteins and datasets. Some protein sequences would be expected to diverge more because they are evolving over longer time scales, under higher substitution rates or under weaker purifying selection. It might therefore help to normalize the divergence between predicted and observed sequences by the expected or empirically observed amount of divergence seen over the timescale of prediction. 

      The study protein datasets showed different levels of sequence divergence over their evolutionary times, as indicated for each dataset in the manuscript. For some metrics, we evaluated the accuracy (or error) of the predictions through direct comparisons between real and predicted protein variants using percentages to facilitate interpretation: 0% indicates a perfect prediction (no error), while 100% indicates a completely incorrect prediction (total error). Regarding normalization of these evaluations, we respectfully disagree with the suggestion because diverse factors can affect divergence (not only the substitution rate, but also the sample size and structural features of the protein that may affect stability when accommodating different sequences, among others), which complicates defining a consistent and meaningful normalization criterion. Given that the manuscript provides detailed information for each dataset, we believe that presenting the prediction accuracy through direct comparisons between real and predicted protein variants, expressed as percentages, is the clearest approach.
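
      The percentage scale described above can be illustrated with a minimal sketch. It assumes two aligned sequences of equal length and a simple per-site mismatch count; the function name `percent_difference` is hypothetical, and the metrics actually used in the manuscript may be defined differently.

```python
def percent_difference(real: str, predicted: str) -> float:
    """Percentage of aligned sites at which two sequences differ.

    0.0 means the predicted sequence matches the real one at every site;
    100.0 means it differs at every site. Illustrative only.
    """
    if len(real) != len(predicted):
        raise ValueError("sequences must be aligned to the same length")
    if not real:
        raise ValueError("sequences must not be empty")
    mismatches = sum(a != b for a, b in zip(real, predicted))
    return 100.0 * mismatches / len(real)


# Toy four-residue example: one of four sites differs.
print(percent_difference("MKVL", "MKIL"))
```

      Because the score is a raw proportion of mismatched sites, it does not account for how much divergence is expected over the forecasting interval, which is the point of disagreement between the reviewer and the authors.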

      (3) Predictability may also vary significantly across different sites in a protein. For example, mutations at many sites may have little impact on structural stability (in which case we would expect poor predictive performance) while even conservative changes at other sites may disrupt folding. I therefore feel that there remains much work to be done here in terms of figuring out where and when sequence evolution might be predictable under these types of models, and when sequence evolution might just be fundamentally unpredictable due to the high entropy of sequence space. 

      We agree with this reflection. Mutations can have different effects on folding stability, which are accounted for by the model presented in this study. However, accurately predicting the exact sequences of protein variants with similar stability remains difficult with current structurally constrained substitution models, and therefore, further work is needed in this regard. This aspect is indicated in the manuscript.

      We want to thank the reviewer again for taking the time to revise our work and for the insightful and helpful comments.

      Reviewer #2 (Public review): 

      In this study, the authors aim to forecast the evolution of viral proteins by simulating sequence changes under a constraint of folding stability. The central idea is that proteins must retain a certain level of structural stability (quantified by folding free energy, ΔG) to remain functional, and that this constraint can shape and restrict the space of viable evolutionary trajectories. The authors integrate a birth-death population model with a structurally constrained substitution (SCS) model and apply this simulation framework to several viral proteins from HIV-1, SARS-CoV-2, and influenza.

      The motivation to incorporate biophysical constraints into evolutionary models is scientifically sound, and the general approach aligns with a growing interest in bridging molecular evolution and structural biology. The authors focus on proteins where immune pressure is limited and stability is likely to be a dominant constraint, which is conceptually appropriate. The method generates sequence variants that preserve folding stability, suggesting that stability-based filtering may capture certain evolutionary patterns. 

      Correct. We thank the reviewer for the positive comments about our study.

      However, the study does not substantiate its central claim of forecasting. The model does not predict future sequences with measurable accuracy, nor does it reproduce observed evolutionary paths. Validation is limited to endpoint comparisons in a few datasets. While KL divergence is used to compare amino acid distributions, this analysis is only applied to a single protein (HIV-1 MA), and there is no assessment of mutation-level predictive accuracy or quantification of how well simulated sequences recapitulate real evolutionary paths. No comparison is made to real intermediate variants available from extensive viral sequencing datasets which gather thousands of sequences with detailed collection date annotation (SARS-CoV-2, Influenza, RSV). 

      There are several points in this comment.

      The presented method accurately predicts folding stability of forecasted variants, as shown through comparisons between real and predicted protein variants. However, as the reviewer correctly indicates, predicting the exact amino acid sequences remains challenging. This limitation is discussed in detail in the manuscript, where we also suggest that further improvements in substitution models of protein evolution are needed to better capture the evolutionary signatures of amino acid change at the sequence level, even between amino acids with similar physicochemical properties. Regarding the time points used for validation, the studied influenza NS1 dataset included two validation points. A key limitation in increasing the number of time points is the scarcity of datasets derived from monitoring protein evolution with sufficient molecular diversity between samples collected at consecutive time points (i.e., at least more than five polymorphic amino acid sites). 

      As described in the manuscript, calculating Kullback-Leibler (KL) divergence requires more than one sequence per studied time point. However, most datasets in the literature include only a single sequence per time point, typically a consensus sequence derived from bulk population sequencing. Generating multiple sequences per time point is experimentally more demanding, often requiring advanced methods such as single-virus sequencing or amplification of sublineages in viral subpopulations, as was done for the first dataset used in the study (Arenas, et al. 2016), which enabled the calculation of KL divergence. The extent to which the simulated sequences resemble real evolution is evaluated in the method validation. As noted, intermediate time point validation was performed using the influenza NS1 protein dataset. Although, as the reviewer indicates, thousands of viral sequences are available, these are usually consensus sequences from bulk sequencing. Indeed, many viral variants mainly differ through synonymous mutations, where the number of accumulated nonsynonymous mutations is small. For example, from the original Wuhan strain to the Omicron variant, the SARS-CoV-2 proteins Mpro and PLpro accumulated only 10 and 22 amino acid changes, respectively.
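
      The requirement for several sequences per time point can be seen in a minimal sketch of a per-site KL divergence. This is an illustration under stated assumptions, not the manuscript's implementation: the function names, the 20-letter alphabet, the pseudocount of 1 (added so that no frequency is zero), and the averaging over sites are all choices made here for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(sequences, site, pseudocount=1.0):
    """Amino acid frequencies at one alignment column, with a pseudocount.

    Characters outside the 20 standard amino acids (gaps, X) are ignored.
    """
    counts = Counter(s[site] for s in sequences if s[site] in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}


def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over AMINO_ACIDS."""
    return sum(p[aa] * math.log(p[aa] / q[aa]) for aa in AMINO_ACIDS)


def mean_site_kl(observed, predicted, pseudocount=1.0):
    """Average per-site KL divergence between two aligned sequence samples."""
    length = len(observed[0])
    total = 0.0
    for site in range(length):
        p = site_frequencies(observed, site, pseudocount)
        q = site_frequencies(predicted, site, pseudocount)
        total += kl_divergence(p, q)
    return total / length


# Toy samples of three sequences each (hypothetical data).
observed = ["MKV", "MKV", "MRV"]
predicted = ["MKV", "MKI", "MRV"]
print(mean_site_kl(observed, predicted))
```

      With a single consensus sequence per time point, each column contributes one count, so the estimated frequencies are dominated by the pseudocount and say little about the population.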

      Analyzing intermediate variants of concern (e.g., Gamma or Delta) would reduce this number, affecting the statistics. In addition, many available viral sequences are not consecutive in evolutionary terms (one dataset does not represent the direct origin of another dataset at a subsequent time point), which further limits their applicability in this study. There are few data from monitored protein evolution with consecutive samples. The most suitable studies usually involve in vitro virus evolution, but the data from these studies often show low genetic variability between samples collected at different time points. Finally, it is important to note that the presented method can only be applied to proteins with known 3D structures, as it relies on selection based on folding stability. Proteins without a known 3D structure cannot be analyzed using this approach. Future work could incorporate additional selection constraints, which may improve the accuracy of predictions. These considerations and limitations are indicated in the manuscript.

      The selection of proteins is narrow and the rationale for including or excluding specific proteins is not clearly justified. 

      The viral proteins included in the study were selected based on two main criteria, general interest and data availability. In particular, we included proteins from viruses that affect humans and for which data from monitored protein evolution, with sufficient molecular diversity between consecutive time points, is available. These aspects are indicated in the manuscript.

      The analyzed datasets are also under-characterized: we are not given insight into how variable the sequences are or how surprising the simulated sequences might be relative to natural diversity. Furthermore, the use of consensus sequences to represent timepoints is problematic, particularly in the context of viral evolution, where divergent subclades often coexist - a consensus sequence may not accurately reflect the underlying population structure. 

      The manuscript indicates the sequence identity among protein datasets of different time points, along with other technical details. Next, the evaluation based on comparisons between simulated and real sequences reflects how surprising the simulated sequences might be relative to natural diversity, provided that the real dataset is representative. We believe that the diverse real datasets studied are useful to evaluate the accuracy of the method in predicting different molecular patterns. Regarding the use of consensus sequences, we agree that they provide an approximation. However, as previously indicated, most of the available data from monitored protein evolution consist of consensus sequences obtained through bulk sequencing. Additionally, analyzing every individual viral sequence within a viral population, which is typically large, would be ideal but computationally intractable.

      The fitness function used in the main simulations is based on absolute ΔG and rewards increased stability without testing whether real evolutionary trajectories tend to maintain, increase, or reduce folding stability over time for the particular systems (proteins) that are studied. While a variant of the model does attempt to center selection around empirical ΔG values, this more biologically plausible version is underutilized and not well validated.

      The applied fitness function, based on absolute ΔG, is well established in the field (Sella and Hirsh 2005; Goldstein 2013). The present study independently predicts ΔG for the real and simulated protein variants at each sampling point. This ΔG prediction accounts not only for negative design, informed by empirical data, but also for positive design based on the study data (Arenas, et al. 2013; Minning, et al. 2013), thereby enabling the detection of variation in folding stability among protein variants. These aspects are indicated in the manuscript. Therefore, in our view, the study provides a proper comparison of real and predicted evolutionary trajectories in terms of folding stability.
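
      To make the idea of a fitness based on absolute ΔG concrete, the sketch below uses one form that is common in the stability-based fitness literature: the equilibrium fraction of folded protein in a two-state model. This is an assumption made for illustration; the response does not state the exact functional form used in the manuscript, and the function name and the kT value (roughly 0.593 kcal/mol near 25 °C) are choices made here.

```python
import math

# Approximate thermal energy near 25 °C, in kcal/mol (assumed for illustration).
KT_KCAL_PER_MOL = 0.593


def fraction_folded(delta_g: float, kt: float = KT_KCAL_PER_MOL) -> float:
    """Two-state fraction of folded protein for a folding free energy delta_g.

    More negative delta_g (a more stable fold) gives a value closer to 1.
    Intended for realistic delta_g values of a few tens of kcal/mol at most.
    """
    return 1.0 / (1.0 + math.exp(delta_g / kt))


for dg in (-5.0, -1.0, 0.0, 2.0):
    print(dg, round(fraction_folded(dg), 4))
```

      Under this form, fitness rises monotonically with stability and saturates near 1, so it rewards increased stability without an optimum, which is the property the reviewer questions.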

      Ultimately, the model constrains sequence evolution to stability-compatible trajectories but does not forecast which of these trajectories are likely to occur. It is better understood as a filter of biophysically plausible outcomes than as a predictive tool. The distinction between constraint-based plausibility and sequence-level forecasting should be made clearer. Despite these limitations, the work may be of interest to researchers developing simulation frameworks or exploring the role of protein stability in viral evolution, and it raises interesting questions about how biophysical constraints shape sequence space over time. 

      The presented method estimates the fitness of each protein variant, which can reflect the relative survival capacity of the variant. Therefore, despite the error due to evolutionary constraints not considered by the method, it indicates which variants are more likely to become fixed over time. In our view, the method does not merely filter plausible variants; rather, it generates predictions of variant survival through predicted fitness based on folding stability and simulations of protein evolution under structurally constrained substitution models integrated with birth-death population genetics approaches. The use of simulation-based approaches for prediction is well established in population genetics. For example, approaches such as approximate Bayesian computation (Beaumont, et al. 2002) rely on this strategy, and it has also been applied in other studies of forecasting evolution (e.g., Neher, et al. 2014). We believe that the distinction between forecasting folding stability and amino acid sequence is clearly shown in the manuscript, including the main text and the figures.

      Reviewer #2 (Recommendations for the authors): 

      I thank the authors for addressing the question about template switching; their clarification was helpful. However, the core concerns I raised remain unresolved: the claim that the method is useful for forecasting is not substantiated. In order to support the paper's central claims or to prove its usefulness, several key improvements could be incorporated: 

      (1) Systematic analysis of more proteins: 

      The manuscript would be significantly strengthened by a systematic evaluation of model performance across a broader set of viral proteins, beyond the examples currently shown. Many human influenza and SARS-CoV-2 proteins have well-characterized structures or high-quality homology templates, making them suitable candidates. In the light of limited success of the method, presenting the model's behavior across a more comprehensive protein set, including those with varying structural constraints and immune pressures, would help assess generalizability and clarify the specific conditions under which the model is applicable. 

      Following a comment from the reviewer in a previous revision of the study, we included the analysis of an influenza NS1 protein dataset that contains two evaluation time points. Next, to validate the prediction method, it is necessary to have monitored protein sequences collected at a minimum of two consecutive time points, with sufficient divergence between them to capture evolutionary signatures that allow for proper evaluation. Additionally, many data involve sequences that are not consecutive in evolutionary terms (one dataset is not a direct ancestor of another dataset existing at a later time point), which precludes their use in this study. Few data from monitored protein evolution with reliable consecutive (ancestor-descendant) samples exist. The most suitable studies often involve in vitro virus evolution, but they usually show low genetic variability between samples collected at different time points. Although thousands of sequences are available for some viruses, they are usually consensus sequences from bulk sequencing and often show a low number of nonsynonymous mutations in the studied protein-coding gene between time points. For example, from the original Wuhan strain to the Omicron variant, the SARS-CoV-2 proteins Mpro and PLpro accumulated only 10 and 22 amino acid changes, respectively. Analyzing intermediate variants of concern (e.g., Gamma or Delta) would reduce this number, affecting the statistics. Thus, in practice, we found a scarcity of data derived from monitoring protein evolution, with reliable ancestor and corresponding descendant data at consecutive time points and with sufficient molecular diversity between them (i.e., more than five polymorphic amino acid sites). 
      In all, we believe that the diverse viral protein datasets used in the present study, along with the multiple analyzed datasets collected from monitored HIV-1 populations present in different patients, provide a representative application of the method, since similar patterns were generally obtained from the analysis of the different datasets.

      (2) Present clear data statistics: For each analyzed dataset, the authors should provide basic information about the number of unique sequences, levels of variability, and evolutionary divergence between start and end sequences. This would contextualize the forecasting task and clarify whether the simulations are non-trivial. In particular, it should be shown that the consensus sequence is indeed representative of the viral population at a given time point. In viral evolution we frequently observe co-circulation of subclades and the consensus sequence is then not representative. 

      For each dataset analyzed, the manuscript provides the sequence identity between samples at the study time points (which also informs about sequence variability), sample sizes, representative protein structure, and other technical details. The study assumes that consensus sequences, typically generated by bulk sequencing, are representative of the viral population. Next, samples at different time points should involve ancestor-descendant relationships, which is a requirement and one of the limitations to find appropriate data for this study, as noted in our previous response.

      (3) Explore other metrics for population level sequence comparison: 

      In the light of the possible existence of subclades, mentioned above, the currently used metrics for sequence comparison may underestimate the performance of the simulations. It would be sufficient to see some overlap of the simulated clades and the observed clades. 

      We found this to be a good idea. However, in practice, we believe that the criteria used to define subclades could introduce biases into the results. For some metrics, we evaluated the accuracy of the predictions through direct comparisons between all real and predicted protein variants, using percentages to facilitate interpretation. We believe that using subclades could potentially reduce the current prediction errors, but this would complicate the interpretation of the results, as they would be influenced by the subjective criteria used to define the subclades.

      Currently, the manuscript presents a plausible filtering framework rather than a predictive model. Without these additional analyses, the main claims remain only partially supported. 

      Please see our reply to the comment of the reviewer just before the section titled “Recommendations for the authors”.

      Response to some rebuttal statements: 

      (1) "Sequence comparisons based on the KL divergence require, at the studied time point, an observed distribution of amino acid frequencies among sites and an estimated distribution of amino acid frequencies among sites. In the study datasets, this is only the case for the HIV-1 MA dataset, which belongs to a previous study from one of us and collaborators where we obtained at least 20 independent sequences at each sampling point (Arenas, et al. 2016)" 

      The available Influenza and SARS-CoV-2 data gather isolates annotated with exact collection dates, providing rich datasets for such analysis. 

      The available influenza and SARS-CoV-2 sequences are typically derived from bulk sequencing and, therefore, they are consensus sequences. As a result, they cannot be used to calculate KL divergence. Additionally, many of the indicated sequences from databases have not been shown to be consecutive in evolutionary terms (one dataset is not a direct ancestor of another dataset existing at a later time point), which precludes their use in this study. The most suitable studies often involve in vitro virus evolution, but they usually show low genetic variability between samples collected at different time points.

      (2) "Regarding extending the analysis to other time points (other variants of concern), we kindly disagree because Omicron is the variant of concern with the highest genetic distance to the Wuhan variant, and a high genetic distance is required to properly evaluate the prediction method." 

      There have been many more variants of concern subsequent to Omicron which circulated in 2021. 

      A key aspect is the accumulation of diversity in the study proteins across different time points. The SARS-CoV-2 proteins Mpro and PLpro accumulated only 10 and 22 amino acid changes from the original Wuhan variant to Omicron, respectively.

      Analyzing intermediate variants of concern (e.g., Gamma or Delta) or those closely related to Omicron would reduce the number of accumulated mutations even further.   

      We want to thank the reviewer again for taking the time to revise our work and for the insightful and helpful comments.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Ferreiro et al. present a method to simulate protein sequence evolution under a birth-death model where sequence evolution is constrained by structural constraints on protein stability. The authors then use this model to explore the predictability of sequence evolution in several viral structural proteins. In principle, this work is of great interest to molecular evolution and phylodynamics, which have struggled to couple non-neutral models of sequence evolution to phylodynamic models like birth-death. Unfortunately, though, the model shows little improvement over neutral models in predicting protein evolution, and this ultimately appears to be due to fundamental conceptual problems with how fitness is modeled and linked to the phylodynamic birth-death model. 

      We thank the reviewer for the positive comments about our work.

      Regarding predictive power, the study showed good accuracy in predicting the real folding stability of forecasted protein variants under a selection model, but not under a neutral model. Next, predicting the exact sequences was more challenging. In this revised version, where we added additional real data, we found that the accuracy of this prediction can vary among proteins (e.g., the SCS model was more accurate than the neutral model in predicting sequences of the influenza NS1 protein at different time points). Still, we consider that further efforts are required in the field of substitution models of molecular evolution. For example, amino acids with similar physicochemical properties can result in predictions with appropriate folding stability but a different specific sequence. The development of accurate substitution models of molecular evolution is an active area of research with ongoing progress. Next, forecasting the folding stability of future real proteins is fundamental for properly forecasting protein evolution, given the essential role of folding stability in protein function and its variety of applications. Regarding the conceptual concerns related to fitness modeling, we clarify them in detail in our responses to the specific comments below.

      Major concerns:

      (1) Fitness model: All lineages have the same growth rate r = b-d because the authors assume b+d=1. But under a birth-death model, the growth r is equivalent to fitness, so this is essentially assuming all lineages have the same absolute fitness since increases in reproductive fitness (b) will simply trade off with decreases in survival (d). Thus, even if the SCS model constrains sequence evolution, the birth-death model does not really allow for non-neutral evolution such that mutations can feed back and alter the structure of the phylogeny. 

      We thank the reviewer for this comment, which aims to improve the realism of our model. In the model presented (a second model, derived from the reviewer's proposal, has now been implemented in the framework and applied to the study data; see below), the fitness predicted for a protein variant is used to obtain the birth rate of that variant. Protein variants with high fitness therefore have high birth rates, leading to more birth events overall, while protein variants with low fitness have low birth rates and correspondingly high death rates, leading to more extinction events overall, which has biological meaning for the study system. The statement “All lineages have the same growth rate r = b-d” is incorrect for our model because b and d vary among lineages according to fitness. For example, one lineage might have b=0.9, d=0.1, r=0.8, while another could have b=0.6, d=0.4, r=0.2. For the same reason, the statement “this is essentially assuming all lineages have the same absolute fitness” is incorrect. If all lineages had the same fitness, the folding stability of the forecasted protein variants would be similar under any model, which is not the case, as shown in the results. This parameterization is meaningful for protein evolution because the fitness of a protein variant can affect its survival (birth or extinction) without necessarily affecting its rate of evolution. While a faster growth rate can sometimes be associated with higher fitness, a variant with high fitness does not necessarily accumulate substitutions at a faster rate. 

      Regarding the phylogenetic structure, the model presented considers variable birth and death events across lineages according to the fitness of the corresponding protein variants, and this affects the derived phylogeny (i.e., protein variants selected against can go extinct while others with high fitness can produce descendants). We are not sure about the meaning of the term “mutations can feed back” in the context of our system. Note that we use Markov models of evolution, which are well established in the field (despite their limitations), and that substitutions are fixed mutations, which could still be reverted later if selected by the substitution model (Yang 2006). Altogether, we find that the presented birth-death model is technically correct and appropriate for modeling our biological system. Its integration with structurally constrained substitution (SCS) models of protein evolution as Markov models follows general approaches of molecular evolution in population genetics (Yang 2006; Carvajal-Rodriguez 2010; Arenas 2012; Hoban, et al. 2012). We have now provided a more detailed description of the models in the manuscript.

      Apart from these clarifications about the birth-death model used, we could understand the point of the reviewer and following the suggestion we have now incorporated an additional birth-death model that accounts for variable global birth-death rate among lineages. Specifically, we followed the model proposed by Neher et al (2014), where the death rate is considered as 1 and the birth rate is modeled as 1 + fitness. In this model, the global birth-death rate can vary among lineages. We implemented this model into the computer framework and applied it to the data used for the evaluation of the models. The results indicated that, in general, this model yields similar predictive accuracy compared to the previous birth-death model. Thus, accounting for variability in the global birth-death rate does not appear to play a major role in the studied systems of protein evolution. We have now presented this additional birth-death model and its results in the manuscript.
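
      The two rate parameterizations discussed above can be contrasted with a minimal sketch. It is illustrative only and is not taken from the ProteinEvolver2 code: the function names are hypothetical, and setting the birth rate equal to a fitness value in [0, 1] in the first model is an assumption made here for simplicity.

```python
def constant_total_rates(fitness):
    """First model (b + d = 1): the death rate is 1 - b.

    Assumption for illustration: the birth rate equals a fitness in [0, 1].
    """
    if not 0.0 <= fitness <= 1.0:
        raise ValueError("fitness must lie in [0, 1]")
    birth = fitness
    death = 1.0 - birth
    return birth, death


def variable_total_rates(fitness):
    """Second model, after Neher et al. (2014): d = 1 and b = 1 + fitness."""
    return 1.0 + fitness, 1.0


def summarize(birth, death):
    """Return the rates, their difference r = b - d, and their sum b + d."""
    return {"b": birth, "d": death, "r": birth - death, "total": birth + death}


if __name__ == "__main__":
    # The two example lineages quoted in the response (b = 0.9 and b = 0.6).
    for fitness in (0.9, 0.6):
        print("b + d = 1  :", summarize(*constant_total_rates(fitness)))
        print("d = 1 model:", summarize(*variable_total_rates(fitness)))
```

      In the first model the sum b + d is fixed while the difference b - d changes with fitness; in the second model both the sum and the difference change with fitness.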

      (2) Predictive performance: Similar performance in predicting amino acid frequencies is observed under both the SCS model and the neutral model. I suspect that this rather disappointing result owes to the fact that the absolute fitness of different viral variants could not actually change during the simulations (see comment #1). 

      As indicated in our previous answer, our study shows good accuracy in predicting the real folding stability of forecasted protein variants under a selection model, but not under a neutral model. Predicting the exact sequences was more challenging, which was not surprising considering previous studies. In particular, inferring specific sequences is considerably challenging even for ancestral molecular reconstruction (Arenas, et al. 2017; Arenas and Bastolla 2020). Indeed, observed sequence diversity is much greater than observed structural diversity (Illergard, et al. 2009; Pascual-Garcia, et al. 2010), and substitutions between amino acids with similar physicochemical properties can yield modeled protein variants with accurate folding stability even when the exact amino acid sequences differ. As indicated, further work is needed in the field of substitution models of molecular evolution. In this revised version, where we included analyses of additional real datasets, we found that the accuracy of sequence prediction can vary among datasets. Notably, the analysis of an influenza NS1 protein dataset, with higher diversity than the other datasets studied, showed that the SCS model was more accurate than the neutral model in predicting sequences across different time points. Datasets with relatively high sequence diversity can contain more evolutionary information, which can improve prediction quality. Apart from that, forecasting the folding stability of future real proteins is an important advance in forecasting protein evolution, given the essential role of folding stability in protein function (Scheiblhofer, et al. 2017; Bloom and Neher 2023) and its variety of applications.

      Next, also as indicated in our previous response, the birth-death model used in this study accounts for variation in fitness among lineages producing variable reproductive success. The additional birth-death model that we have now incorporated, which considers variation of the global birth-death rate among lineages, produced similar prediction accuracy, suggesting a limited role in protein evolution modeling. Molecular evolution parameters, particularly the substitution model, appear to be more critical in this regard. We have now included these aspects in the manuscript.

      (3) Model assessment: It would be interesting to know how much the predictions were informed by the structurally constrained sequence evolution model versus the birth-death model. To explore this, the authors could consider three different models: 1) neutral, 2) SCS, and 3) SCS + BD. Simulations under the SCS model could be performed by simulating molecular evolution along just one hypothetical lineage. Seeing if the SCS + BD model improves over the SCS model alone would be another way of testing whether mutations could actually impact the evolutionary dynamics of lineages in the phylogeny. 

      In the present study, we compared the neutral model + birth-death (BD) with the SCS model + BD. A Markov substitution model with rate matrix Q is applied over an evolutionary time (i.e., a branch length, t), which determines the probability of substitution events during that period [P(t) = exp(Qt)]. This approach is traditionally used in phylogenetics to model the incorporation of substitution events over time. Therefore, to compare the neutral and SCS models in terms of evolutionary inference, an evolutionary time is required, and in this case it is provided by the birth-death process. Thus, cases 1) and 2) cannot be compared without an underlying evolutionary history. Comparisons in terms of likelihood, and other aspects, between models that ignore the protein structure and the implemented SCS models are already available in previous studies based on coalescent simulations or given phylogenetic trees (Arenas, et al. 2013; Arenas, et al. 2015). There, SCS models outperformed models that ignore evolutionary constraints from the protein structure, and those findings are consistent with the results obtained in the present study, where we explored the application of these models to forecasting protein evolution. We would like to emphasize that forecasting the folding stability of future real proteins is a significant finding, because folding stability is fundamental to protein function and has a variety of applications. We have now indicated these aspects in the manuscript.
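
      The relation P(t) = exp(Qt) mentioned above can be illustrated with a small numerical sketch. It uses a toy two-state rate matrix rather than a 20-state amino acid model, and the function name `transition_matrix` and the example rates are assumptions chosen only for illustration. The matrix exponential is computed with a truncated Taylor series combined with scaling and squaring, so that only NumPy is needed.

```python
import numpy as np


def transition_matrix(Q, t, terms=30):
    """Return P(t) = exp(Q t) for a rate matrix Q and a branch length t."""
    A = np.asarray(Q, dtype=float) * t
    # Scale A down so that the truncated Taylor series converges quickly.
    norm = np.linalg.norm(A, ord=np.inf)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = A / (2 ** squarings)
    P = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ A / k  # accumulates A^k / k!
        P = P + term
    # Undo the scaling: exp(A) = (exp(A / 2^s))^(2^s).
    for _ in range(squarings):
        P = P @ P
    return P


if __name__ == "__main__":
    # Toy generator: each row sums to zero, off-diagonal entries are rates.
    Q = np.array([[-1.0, 1.0],
                  [2.0, -2.0]])
    for t in (0.1, 0.5, 2.0):
        P = transition_matrix(Q, t)
        print(f"t = {t}: P(t) =\n{P}")
```

      Each row of P(t) sums to one, and longer branch lengths move the rows toward the stationary distribution of Q, which is why an evolutionary time is needed before substitution models can be compared.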

      (4) Background fitness effects: The model ignores background genetic variation in fitness. I think this is particularly important as the fitness effects of mutations in any one protein may be overshadowed by the fitness effects of mutations elsewhere in the genome. The model also ignores background changes in fitness due to the environment, but I acknowledge that might be beyond the scope of the current work. 

      AU: This comment made us realize that more information about the features of the implemented SCS models should be included in the manuscript. In particular, the implemented SCS models consider a negative design based on the observed residue contacts in nearly all proteins available in the Protein Data Bank (Arenas, et al. 2013; Arenas, et al. 2015). This data is distributed with the framework, and it can be updated to incorporate new structures (further details are provided in the distributed framework documentation and practical examples). Therefore, the prediction of folding stability is a combination of positive design (direct analysis of the target protein) and negative design (consideration of background proteins from a database to improve the predictions), thus incorporating background molecular diversity. We have now indicated this important aspect in the manuscript. Regarding the fitness caused by the environment, we agree with the reviewer. This is a challenge for any method aiming to forecast evolution, as future environmental shifts are inherently unpredictable and may affect the accuracy of the predictions. Although one might attempt to incorporate such effects into the model, doing so risks overparameterization, especially when the additional factors are uncertain or speculative. We have now mentioned this aspect in the manuscript.

      (5) In contrast to the model explored here, recent work on multi-type birth-death processes has considered models where lineages have type-specific birth and/or death rates and therefore also type-specific growth rates and fitness (Stadler and Bonhoeffer, 2013; Kunhert et al., 2017; Barido-Sottani, 2023). Rasmussen & Stadler (eLife, 2019) even consider a multi-type birth-death model where the fitness effects of multiple mutations in a protein or viral genome collectively determine the overall fitness of a lineage. The key difference with this work presented here is that these models allow lineages to have different growth rates and fitness, so these models truly allow for non-neutral evolutionary dynamics. It would appear the authors might need to adopt a similar approach to successfully predict protein evolution. 

      We agree with the reviewer that robust birth-death models have been developed applying statistics and, in many cases, the primary aim of those studies is the development and refinement of the model itself. Regarding the study by Rasmussen and Stadler 2019, it incorporates an external evaluation of mutation events where the fitness used is specific to the proteins investigated in that study, which may pose challenges for users interested in analyzing other proteins. In contrast, our study takes a different approach. We implement a fitness function that can be predicted and evaluated for any type of structural protein (Goldstein 2013), making it broadly applicable. Indeed, in this revised version we added the analysis of additional data from another protein (influenza NS1 protein) with predictions at different time points. In addition, we provide a freely available and well-documented computational framework to facilitate its use. The primary aim of our study is not the development of novel or complex birth-death models. Rather, we aim to explore the integration of a standard birth-death model with SCS models for the purpose of predicting protein evolution. In the context of protein evolution, substitution models are a critical factor (Liberles, et al. 2012; Wilke 2012; Bordner and Mittelmann 2013; Echave, et al. 2016; Arenas, et al. 2017; Echave and Wilke 2017), and the presented combination with a birth-death model constitutes a first approximation upon which future studies can build to better understand this evolutionary system. We have now indicated these considerations in the manuscript.

      Reviewer #2 (Public review): 

      Summary: 

      In this study, "Forecasting protein evolution by integrating birth-death population models with structurally constrained substitution models", David Ferreiro and coauthors present a forward-in-time evolutionary simulation framework that integrates a birth-death population model with a fitness function based on protein folding stability. By incorporating structurally constrained substitution models and estimating fitness from ΔG values using homology-modeled structures, the authors aim to capture biophysically realistic evolutionary dynamics. The approach is implemented in a new version of their open-source software, ProteinEvolver2, and is applied to four viral proteins from HIV-1 and SARS-CoV-2. 

      Overall, the study presents a compelling rationale for using folding stability as a constraint in evolutionary simulations and offers a novel framework and software to explore such dynamics. While the results are promising, particularly for predicting biophysical properties, the current analysis provides only partial evidence for true evolutionary forecasting, especially at the sequence level. The work offers a meaningful conceptual advance and a useful simulation tool, and sets the stage for more extensive validation in future studies.

      We thank the reviewer for the positive comments on our study. Regarding the predictive power, the results showed good accuracy in predicting the folding stability of the forecasted protein variants. In this revised version, where we included analyses of additional real datasets, we found that the accuracy of sequence prediction can vary among datasets. Notably, the analysis of an influenza NS1 protein dataset, with higher diversity than the other datasets studied, showed that the SCS model was more accurate than the neutral model in predicting sequences across different time points. Datasets with relatively high sequence diversity can contain more evolutionary information, which can improve prediction quality. Still, we believe that further efforts are required in the field to improve the accuracy of substitution models of molecular evolution. Altogether, accurately forecasting the folding stability of future real proteins is fundamental for predicting their function and enabling a variety of applications. Also, we implemented the models in a freely available computer framework, with detailed documentation and a variety of practical examples.

      Strengths: 

      The results demonstrate that fitness constraints based on protein stability can prevent the emergence of unrealistic, destabilized variants - a limitation of traditional, neutral substitution models. In particular, the predicted folding stabilities of simulated protein variants closely match those observed in real variants, suggesting that the model captures relevant biophysical constraints. 

      We agree with the reviewer and appreciate the consideration that forecasting the folding stability of future real proteins is a relevant finding. For instance, folding stability is fundamental for protein function and affects several other molecular properties.

      Weaknesses: 

      The predictive scope of the method remains limited. While the model effectively preserves folding stability, its ability to forecast specific sequence content is not well supported. 

      Our study showed good accuracy in predicting the real folding stability of forecasted protein variants under a selection model, but not under a neutral model. Predicting the exact sequences was more challenging, which was not surprising considering previous studies. In particular, inferring specific sequences is considerably challenging even for ancestral molecular reconstruction (Arenas, et al. 2017; Arenas and Bastolla 2020). Indeed, observed sequence diversity is much greater than observed structural diversity (Illergard, et al. 2009; Pascual-Garcia, et al. 2010), and substitutions between amino acids with similar physicochemical properties can yield modeled protein variants with accurate folding stability even when the exact amino acid sequences differ. As indicated, further work is needed in the field of substitution models of molecular evolution. In this revised version, where we included analyses of additional real datasets, we found that the accuracy of sequence prediction can vary among datasets. Notably, the analysis of an influenza NS1 protein dataset, with higher diversity than the other datasets studied, showed that the SCS model was more accurate than the neutral model in predicting sequences across different time points. Datasets with relatively high sequence diversity can contain more evolutionary information, which can improve prediction quality. Apart from that, forecasting the folding stability of future real proteins is an important advance in forecasting protein evolution, given the essential role of folding stability in protein function (Scheiblhofer, et al. 2017; Bloom and Neher 2023) and its variety of applications. We have now expanded these aspects in the manuscript.

      Only one dataset (HIV-1 MA) is evaluated for sequence-level divergence using KL divergence; this analysis is absent for the other proteins. The authors use a consensus Omicron sequence as a representative endpoint for SARS-CoV-2, which overlooks the rich longitudinal sequence data available from GISAID. The use of just one consensus from a single time point is not fully justified, given the extensive temporal and geographical sampling available. Extending the analysis to include multiple timepoints, particularly for SARS-CoV-2, would strengthen the predictive claims. Similarly, applying the model to other well-sampled viral proteins, such as those from influenza or RSV, would broaden its relevance and test its generalizability. 

      The evaluation of forecasting evolution using real datasets is complex due to several conceptual and practical aspects. In contrast to traditional phylogenetic reconstruction of past evolutionary events and ancestral sequences, forecasting evolution often begins with a variant that is evolved forward in time and requires a rough fitness landscape to select among possible future variants (Lässig, et al. 2017). Another concern for validating the method is the need to know the initial variant that gives rise to the corresponding future (forecasted) variants, and it is not always known. Thus, we investigated systems where the initial variant, or a close approximation, is known, such as scenarios of in vitro monitored evolution. In the case of SARS-CoV-2, the Wuhan variant is commonly used as the starting variant of the pandemic. Next, since forecasting evolution is highly dependent on the used model of evolution, unexpected external factors can be dramatic for the predictions. For this reason, systems with minimal external influences provide a more controlled context for evaluating forecasting evolution. For instance, scenarios of in vitro monitored virus evolution avoid some external factors such as host immune responses. Another important aspect is the availability of data at two (i.e., present and future) or more time points along the evolutionary trajectory, with sufficient genetic diversity between them to identify clear evolutionary signatures. Additionally, using consensus sequences can help mitigate effects from unfixed mutations, which should not be modeled by a substitution model of evolution. Altogether, not all datasets are appropriate to properly evaluate or apply forecasting evolution. These aspects are indicated in the manuscript. Sequence comparisons based on the KL divergence require, at the studied time point, an observed distribution of amino acid frequencies among sites and an estimated distribution of amino acid frequencies among sites. 
      In the study datasets, this is only the case for the HIV-1 MA dataset, which belongs to a previous study from one of us and collaborators where we obtained at least 20 independent sequences at each sampling point (Arenas, et al. 2016). This aspect is now more clearly indicated in the manuscript. Regarding the Omicron datasets, we used 384 curated sequences of the Omicron variant of concern to construct the study data and we believe that it is a representative sample. The sequence used for the initial time point was the Wuhan variant (Wu, et al. 2020), which is commonly assumed to be the origin of the pandemic in SARS-CoV-2 studies. As previously indicated, the use of consensus sequences is convenient to avoid variants with unfixed mutations. Regarding extending the analysis to other time points (other variants of concern), we respectfully disagree because Omicron is the variant of concern with the highest genetic distance to the Wuhan variant, and a high genetic distance is required to properly evaluate the prediction method. Indeed, we noted that earlier variants of concern show a small number of fixed mutations in the study proteins, despite the availability of large numbers of sequences in databases such as GISAID. Additionally, we investigated the evolutionary trajectories of HIV-1 protease (PR) in 12 intra-host viral populations with predictions for up to four different time points. Apart from those aspects, following the proposal of the reviewer, we have now incorporated the analysis of an additional dataset of influenza NS1 protein (Bao, et al. 2008), with predictions for two different time points, to further assess the generalizability of the method. We have now included details of this influenza NS1 protein dataset and the predictions derived from it in the manuscript.
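
      The data requirement for the KL divergence comparison can be made concrete with a short sketch: per-site amino acid frequencies are needed for both the observed and the predicted sequences at the same time point, which is why several independent sequences per sampling point are required. The function names, the pseudocount, the direction of the divergence (observed relative to predicted) and the averaging over sites are all assumptions made for illustration and may differ from the procedure used in the manuscript.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(sequences, site, pseudocount=1e-3):
    """Amino acid frequencies at one alignment column (gaps are ignored)."""
    counts = Counter(seq[site] for seq in sequences)
    total = sum(counts.get(a, 0) for a in AMINO_ACIDS)
    total += pseudocount * len(AMINO_ACIDS)
    # The pseudocount keeps every frequency positive, so the log is defined.
    return {a: (counts.get(a, 0) + pseudocount) / total for a in AMINO_ACIDS}


def kl_divergence(p, q):
    """KL divergence D(p || q) between two amino acid distributions."""
    return sum(p[a] * math.log(p[a] / q[a]) for a in AMINO_ACIDS)


def mean_site_kl(observed, predicted, pseudocount=1e-3):
    """Average the per-site KL divergence over all alignment columns."""
    length = len(observed[0])
    values = [
        kl_divergence(
            site_frequencies(observed, i, pseudocount),
            site_frequencies(predicted, i, pseudocount),
        )
        for i in range(length)
    ]
    return sum(values) / length


if __name__ == "__main__":
    observed = ["ACD", "ACD", "ACE"]   # toy aligned sequences, not real data
    predicted = ["ACD", "AGD", "AGE"]
    print(mean_site_kl(observed, predicted))
```

      With a single consensus sequence per time point, each column would contain only one residue, and a frequency-based comparison of this kind would carry little information.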

      It would also be informative to include a retrospective analysis of the evolution of protein stability along known historical trajectories. This would allow the authors to assess whether folding stability is indeed preserved in real-world evolution, as assumed in their model.

      Our present study does not aim to investigate the evolution of folding stability over time, although it provides this information indirectly at the studied time points. Instead, it shows that the folding stability of the forecasted protein variants is similar to that of the corresponding real protein variants for diverse viral proteins, which provides an important evaluation of the prediction method. Folding stability can indeed vary over time in both real and modeled evolutionary scenarios, and our present study is not in conflict with this. Although this is outside the aim of our study, some previous phylogenetic-based studies have reported temporal fluctuations in folding stability for diverse protein data (Arenas, et al. 2017; Olabode, et al. 2017; Arenas and Bastolla 2020; Ferreiro, et al. 2022).

      Finally, a discussion on the impact of structural templates - and whether the fixed template remains valid across divergent sequences - would be valuable. Addressing the possibility of structural remodeling or template switching during evolution would improve confidence in the model's applicability to more divergent evolutionary scenarios.

      This is an important point. For the datasets that required homology modeling (in several cases it was not necessary because the sequence was present in a protein structure of the PDB), the structural templates were selected using SWISS-MODEL, and we applied the best-fitting template. We have now included in a supplementary table details about the fitting of the structural templates. Indeed, our proposal assumes that the protein structure is maintained over the studied evolutionary time, which can be generally reasonable for short timescales where the structure is conserved (Illergard, et al. 2009; Pascual-Garcia, et al. 2010). Over longer evolutionary timescales, structural changes may occur and, in such cases, modeling the evolution of the protein structure would be necessary. To our knowledge, modeling the evolution of the protein structure remains a challenging task that requires substantial methodological developments. Recent advances in artificial intelligence, particularly in protein structure prediction from sequence, may offer promising tools for addressing this challenge. However, we believe that evaluating such approaches in the context of structural evolution would be difficult, especially given the limited availability of real data with known evolutionary trajectories involving structural change. In any case, this is probably an important direction for future research. We have now included this discussion in the manuscript.

      Reviewer #1 (Recommendations for the authors): 

      (1) Abstract: "expectedly, the errors grew up in the prediction of the corresponding sequences" <- Not entirely clear what is meant by "errors grew up" or what the errors grew with.

      This sentence refers to the accuracy of sequence prediction in comparison to that of folding stability prediction. We have now clarified this aspect in the manuscript.

      (2) Lines 162-165: "Alternatively, if the fitness is determined based on the similarity in folding stability between the modeled variant and a real variant, the birth rate is assumed to be 1 minus the root mean square deviation (RMSD) in folding stability." <- What is the biological motivation for using the RMSD? It seems like a more stable variant would always have higher fitness, at least according to Equation 1.

      RMSD is commonly used in molecular biology to compare proteins in terms of structural distance, folding stability, kinetics, and other properties. It offers advantages such as minimizing the influence of small deviations while amplifying larger differences, thereby enhancing the detection of remarkable molecular changes. Additionally, RMSD would facilitate the incorporation of other biophysical parameters, such as structural divergences from a wild-type variant or entropy, which could be informative for fitness in future versions of the method. We have now included this consideration in the manuscript.

      (3) Lines 165-166: "In both cases, the death rate (d) is considered as 1-b to allow a constant global (birth-death) rate" <- This would give a constant R = b / (1-b) over the entire phylogenetic tree. For applications to pathogens like viruses with epidemic dynamics, this is extremely implausible. Is there any need to make such a restrictive assumption? 

      Regarding technical considerations of the model, we refer to our answer to the first public review comment. In addition, a constant global rate of evolution has been observed in numerous genes and proteins of diverse organisms, including viruses (Gojobori, et al. 1990; Leitner and Albert 1999; Shankarappa, et al. 1999; Liu, et al. 2004; Lu, et al. 2018; Zhou, et al. 2019). However, following the comment of the reviewer, and as we indicated in our answer to the first public review comment, we have now implemented and evaluated an additional birth-death model that allows for variation in the global birth-death rate among lineages. We have implemented this additional model in the framework and described it along with its results in the manuscript.

      (4) Lines 187-188: "As a consequence, since b+d=1 at each node, tn is consistent across all nodes, according to (Harmon, 2019)." <- This would also imply that all lineages have a growth rate r = b - d, which under a birth-death model is equivalent to saying all lineages have the same fitness! 

      We clarified this aspect in our answer to the first public review comment. In particular, in the model presented, protein variants with higher fitness have higher birth rates, leading to more birth events, while protein variants with lower fitness have lower birth rates, leading to more extinction events, which has biological meaning for the study system. In our model, b and d can vary among lineages according to the corresponding fitness (e.g., one lineage may have b=0.9, d=0.1, r=0.8, while another may have b=0.6, d=0.4, r=0.2). Since reproductive success varies among lineages in our model, the statement “this is essentially assuming all lineages have the same absolute fitness” is incorrect, although it could be interpreted like that in certain models. Fitness affects reproductive success, but fitness and rate of evolution are different biological concepts (although a faster growth rate can sometimes be associated with higher fitness, a variant with high fitness does not necessarily accumulate substitutions at a higher rate). An example in molecular adaptation studies is the traditional nonsynonymous to synonymous substitution rate ratio (dN/dS), where dN/dS (which informs about selection derived from fitness) can be constant at different rates of evolution (dN and dS). In any case, we thank the reviewer for raising this point, which led us to incorporate an additional birth-death model and inspired some ideas. Thus, following the comment of the reviewer and as indicated in the answer to the first public review comment, we have now implemented and evaluated an additional birth-death model that allows for variation in the global birth-death rate among lineages. The results indicated that this model yields similar predictive accuracy compared to the previous birth-death model. We have now included these aspects, along with the results from the additional model, in the manuscript.

      (5) Line 321-322: "For the case of neutral evolution, all protein variants equally fit and are allowed, leading to only birth events," <- Why would there only be birth events? Lineages can die regardless of their fitness. 

      AU: In the neutral evolution model, all protein variants have the same fitness, resulting in a flat fitness landscape. Since variants are observed, we allowed birth events. However, we assumed the absence of death events because no information independent of fitness is available to support their inclusion and quantification, thereby avoiding the imposition of arbitrary death events based on an arbitrary death rate. We have now provided a justification of this assumption in the manuscript.

      Reviewer #2 (Recommendations for the authors): 

      (1) Clarify the purpose of the alternative fitness mode ("ΔG similarity to a target variant"): 

      The manuscript briefly introduces an alternative fitness function based on the similarity of a simulated protein's folding stability to that of a real protein variant, but does not provide a clear motivation, usage scenario, or results derived from it. 

      The presented model provides two approaches for deriving fitness from predicted folding stability. The simpler approach assumes that a more stable protein variant has higher fitness than a less stable one. The alternative approach assigns high fitness to protein variants whose stability closely matches observed stability, acknowledging that the real observed stability is derived from the real selection process, and this approach considers negative design by contrasting the prediction with real information. For the analyses of real data in this study, we used the second approach, guided by these considerations. We have now clarified this aspect in the manuscript.
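
      The second approach, in which the birth rate is 1 minus the RMSD in folding stability and the death rate is 1 - b, can be sketched as follows. This is a hypothetical illustration, not the ProteinEvolver2 implementation: the function names, the `scale` normalization and the clamping of the birth rate to [0, 1] are assumptions introduced here so that the rates remain valid for any input.

```python
import math


def stability_rmsd(model_dg, target_dg):
    """RMSD between paired folding stabilities (Delta G) of model and target."""
    pairs = list(zip(model_dg, target_dg))
    return math.sqrt(sum((m - t) ** 2 for m, t in pairs) / len(pairs))


def birth_death_from_similarity(model_dg, target_dg, scale=1.0):
    """Birth rate b = 1 - RMSD / scale, clamped to [0, 1]; death rate d = 1 - b.

    `scale` and the clamping are illustrative assumptions, not part of the
    description in the response.
    """
    deviation = stability_rmsd(model_dg, target_dg) / scale
    birth = min(1.0, max(0.0, 1.0 - deviation))
    return birth, 1.0 - birth


if __name__ == "__main__":
    target = [-5.0]  # toy Delta G of the real variant, arbitrary units
    for model in ([-5.0], [-4.5], [0.0]):
        b, d = birth_death_from_similarity(model, target)
        print(f"model dG = {model[0]:5.1f}  ->  b = {b:.2f}, d = {d:.2f}")
```

      Under this scheme a simulated variant whose stability matches the observed one receives the highest birth rate, whether it would otherwise be more or less stable than the target.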

      (2) Report structural template quality and modeling confidence: 

      Since folding stability (ΔG) estimates rely on structural models derived from homology templates, the accuracy of these predictions will be sensitive to the choice and quality of the template structure. I recommend that the authors report, for each protein modeled, the template's sequence identity, coverage, and modeling quality scores. This will help readers assess the confidence in the ΔG estimates and interpret how template quality might impact simulation outcomes. 

      We agree with the reviewer and have now included additional information in a supplementary table regarding sequence identity, modeling quality and coverage of the structural templates for the proteins that required homology modeling. The selection of templates was performed using the well-established framework SWISS-MODEL, and the best-fitting template was chosen. In addition, a large number of protein structures are available in the PDB for the study proteins, which favors the accuracy of the homology modeling. For some datasets, homology modeling was not required, as the modeled sequence was already present in an available protein structure. We have now included this information in the manuscript and in a supplementary table.

      (3) Clarify whether structural remodeling occurs during simulation: 

      It appears that folding stability (ΔG) for all simulated protein variants is computed by mapping them onto a single initial homology model, without remodeling the structure as sequences evolve. If correct, this should be clearly stated, as it assumes that the structural fold remains valid across all simulated variants. A discussion on the potential impact of structural drift would be welcome.

      We agree with the reviewer. As indicated in our answer to a previous comment, our method assumes that the protein structure is maintained over the studied evolutionary time, which is generally acceptable for short timescales where the structure is conserved (Illergard, et al. 2009; Pascual-Garcia, et al. 2010). At longer timescales the protein structure could change, requiring the modeling of structural evolution over the evolutionary time. To our knowledge, modeling the evolution of the protein structure remains a challenging task that requires substantial methodological developments. Recent advances in artificial intelligence, particularly in protein structure prediction from sequence, can be promising tools for addressing this challenge. However, we believe that evaluating such approaches in the context of structural evolution would be difficult, especially given the limited availability of real datasets with known evolutionary trajectories involving structural change. In any case, this is probably an important direction for future research. We have now included this discussion in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Lysosomal damage is commonly found in many diseases including normal aging and age-related disease. However, the transcriptional programs activated by lysosomal damage have not been thoroughly characterized. This study aimed to investigate lysosome damage-induced major transcriptional responses and the underlying signaling basis. The authors have convincingly shown that lysosomal damage activates a ubiquitination-dependent signaling axis involving TAB, TAK1, and IKK, which culminates in the activation of NF-kB and subsequent transcriptional upregulation of pro-inflammatory genes and pro-survival genes. Overall, the major aims of this study were successfully achieved.

      Strengths:

      This study is well-conceived and strictly executed, leading to clear and well-supported conclusions. Through unbiased transcriptomics and proteomics screens, the authors identified NF-kB as a major transcriptional program activated upon lysosome damage. TAK1 activation by lysosome damage-induced ubiquitination was found to be essential for NF-kB activation and MAP kinase signaling. The transcriptional and proteomic changes were shown to be largely driven by TAK1 signaling. Finally, the TAK1-IKK signaling was shown to provide resistance to apoptosis during lysosomal damage response. The main signaling axis of this pathway was convincingly demonstrated.

      Weaknesses:

      One weakness was the claim of K63-linked ubiquitination in lysosomal damage-induced NF-kB activation. While it was clear that K63 ubiquitin chains were present on damaged lysosomes, no evidence was shown in the current study to demonstrate the specific requirement of K63 ubiquitin chains in the signaling axis being studied. Clarifying the roles of K63-linked versus other types of ubiquitin chains in lysosomal damage-induced NF-kB activation may improve the mechanistic insights and overall impact of this study.

      Another weakness was that the main conclusions of this study were all dependent on an artificial lysosomal damage agent. It will be beneficial to confirm key findings in other contexts involving lysosomal damage.

      We would like to thank Reviewer #1 for the positive and constructive comments on our study. Regarding the main concern about the molecular mechanism by which TAB proteins are activated in response to lysosomal damage, we have added experimental results supporting that the lysosomal accumulation of K63 ubiquitin chains serves as a trigger to activate the TAB-TAK1 pathway. We also investigated and discussed the role of LUBAC-mediated M1 ubiquitin chains in NF-kB activation and the effects of other lysosomal-damaging compounds. Please see the response to “Reviewer #3 (Public review): Suggestions:”.

      Reviewer #2 (Public review):

      Summary:

      Endo et al. investigate the novel role of ubiquitin response upon lysosomal damage in activating cellular signaling for cell survival. The authors provide a comprehensive transcriptome and proteome analysis of aging-related cells experiencing lysosomal damage, identifying transcription factors involved in transcriptome and proteome remodeling with a focus on the NF-κB signaling pathway. They further characterized the K63-ubiquitin-TAB-TAK1-NF-κB signaling axis in controlling gene expression, inflammatory responses, and apoptotic processes.

      Strengths:

      In the aging-related model, the authors provide a comprehensive transcriptome and characterize the K63-ubiquitin-TAB-TAK1-NF-κB signaling axis. Through compelling experiments and advanced tools, they elucidate its critical role in controlling gene expression, inflammatory responses, and apoptotic processes.

      Weaknesses:

      The study lacks deeper connections with previous research, particularly:

      • The established role of TAB-TAK1 in AMPK activation during lysosomal damage

      • The potential significance of TBK1 in NF-κB signaling pathways

      We would like to thank Reviewer #2 for the helpful comments on our study. To achieve a more comprehensive understanding of the signaling pathways involved in the lysosomal damage response, we investigated additional related signal mediators, such as TBK1 and LUBAC. The citations related to AMPK have been incorporated.

      Reviewer #3 (Public review):

      Summary:

      The response to lysosomal damage is a fast-moving and timely field. Besides repair and degradation pathways, increasing interest has been focusing on damage-induced signaling. The authors conducted both transcriptomics and proteomics to characterize the cellular response to lysosomal damage. They identify a signaling pathway leading to activation of NFkappaB. Based on this and supported by Western blot and microscopy data, the authors nicely show that TAB2/3 and TAK1 are activated at damaged lysosomes and kick off the pathway to alter gene expression, which induces cytokines and protects from cell death. TAB2/3 activation is proposed to occur through K63 ubiquitin chain formation. Generally, this is a careful and well conducted study that nicely delineates the pathway under lysosomal stress. The "omics" data serves as a valuable resource for the field. More work should be invested into how TAB2/3 are activated at the damaged lysosomes, also to increase novelty in light of previous reports.

      Strengths:

      Generally, this is a careful and well-conducted study that nicely delineates the pathway under lysosomal stress. The "omics" data serves as a valuable resource for the field.

      Weaknesses:

      More work should be invested into how TAB2/3 are activated at the damaged lysosomes, also to increase novelty in light of previous reports. Moreover, different damage types should be tested to probe relevance for different pathophysiological conditions.

      We would like to thank Reviewer #3 for the valuable comments on our study. We have added the experimental results to address two concerns raised by Reviewer #3. Please see the response to “Reviewer #3 (Public review): Suggestions:”.

      Suggestions:

      (1) A recent paper claims that NFkappaB is activated by Otulin/M1 chains upon lysosome damage through TBK1 (PMID: 39744815). In contrast, Endo et al. nicely show that ubiquitylation is needed (shown by TAK-243) for NFkB activation but only have correlative data to link it specifically to K63 chains. On page 15, line 11, the authors even argue a "potential" involvement of K63. This point should be better dealt with. Can the authors specifically block K63 formation? K63R overexpression or swapping would be one way. Is the K63 ligase ITCH involved (PMID: 38503285) or any other NEDD4-like ligase? This could be compared to LUBAC inhibition. Also, the point needs to be dealt with more controversially in the discussion as these are alternative claims (M1 vs K63, TAB vs TBK1).

      It is well-characterized that the NZF domain of TAB proteins preferentially associates with K63-linked ubiquitin chains. Therefore, we performed the add-back experiment using siRNA-resistant TAB2 WT and mutants incapable of binding to K63-linked ubiquitin chains, dNZF and E685A, to elucidate the requirement of K63 ubiquitin chains for TAK1 activation. We investigated whether the add-back of TAB2 mutants rescues the activation of TAK1 in TAB2-depleted cells (Fig. 2E). TAB2 WT, but not dNZF and E685A, rescued TAK1 activation in response to LLOMe, suggesting that the specific interaction of TAB proteins and K63 ubiquitin chains is a key mechanism to activate TAK1. We also found that the treatment of an E1 inhibitor TAK-243 effectively prevented the lysosomal accumulation of K63 ubiquitin chains, but TAB2 was recruited to damaged lysosomes (Fig. S2B). This suggests that the recruitment of TAB proteins to damaged lysosomes is independent of the association with K63 ubiquitin chains. Collectively, it is postulated that TAB proteins require interaction with K63 ubiquitin chains for TAK1 activation, but not for recruitment to damaged lysosomes. We have added the sentences (p9, lines 7-20, and p10, lines 8-10).

      Next, we confirmed that LUBAC functions are essential for NF-kB activation in the lysosomal damage response. RNF31/HOIP is a component of LUBAC that catalyzes M1 ubiquitination. The depletion of RNF31 showed no significant effects on TAK1 activation, but abolished IKK activation (Fig. S4G). It is well-characterized that LUBAC-mediated M1 ubiquitin chains recruit IKK subunits and transduce the signal downstream in the canonical pathway. We assume that K63 ubiquitin chains in damaged lysosomes initially activate TAB-TAK1 and trigger LUBAC-mediated M1 ubiquitination, and subsequently, M1 ubiquitination functions to recruit the IKK complex. Consequently, activated TAK1 phosphorylates IKK subunits in damaged lysosomes, leading to NF-kB activation. We also examined whether TBK1 is involved in the activation of NF-kB. TBK1 was phosphorylated upon LLOMe, and depletion of TAB and TAK1 resulted in a slight reduction of TBK1 phosphorylation (Fig. S4D, E). The treatment of a TBK1 inhibitor BX-795 exhibited little or no effect on TAK1 activation, but abolished phosphorylation of IKK and IkBa (Fig. S4F). These results suggest that TBK1 is required for the activation of NF-kB. We have added the sentences (p13, line 13-p14, line 10).

      As mentioned by Reviewer #3, it is important to identify the E3 ligase responsible for K63 ubiquitination in the lysosomal damage response. We have been aiming to identify such E3 ligase(s). However, depletions of ITCH and other E3 ligases that have been tested exhibited little or no effect on K63 ubiquitination and TAK1 activation. We would like to explore E3 ligase(s) in a future study.

      (2) It would be interesting to know what the trigger is that induces the pathway. Lipid perturbation by LLOMe is a good model, but does activation also occur with GPN (osmotic swelling) or lipid peroxidation (oxidative stress) that may be more broadly relevant in a pathophysiological way? Moreover, what damage threshold is needed? Does loss of protons suffice? Can activation be induced with a Ca2+ agonist in the absence of damage?

      To further clarify the initial trigger that induces TAB-TAK1 activation coupled with lysosomal damage, we examined other damage sources, GPN and DC661, which induce hyperosmotic stress and lipid peroxidation in lysosomes, respectively, thereby resulting in lysosomal membrane damage. Under our experimental conditions, the treatment of these compounds did not result in significant accumulation of Gal-3, indicating a reduced level of lysosomal membrane permeabilization compared with LLOMe (Fig. S2C, D), and no or little TAK1 activation was observed (Fig. S2E). TAB proteins require their association with K63 ubiquitin chains for TAK1 activation. It is therefore postulated that the severe lysosomal membrane permeabilization that triggers the formation and cytosolic exposure of K63 ubiquitin chains may be a determinant of TAB-TAK1 activation. In our future work, we would like to examine broad stimulation of lysosomal damage and further elucidate the initial mechanism of TAB-TAK1 activation. We have added the sentences (p9, line 21-p10, line 7).

      (3) The authors nicely define JNK and p38 activation. This should be emphasized more, possibly also in the abstract, as it may contribute to the claim of increased survival fitness.

      We further tested whether the inhibition of JNK affects the anti-apoptotic effect (Fig. S5B). The inhibition of JNK resulted in an increase in the cleaved caspase-3. This suggests that the anti-apoptotic action in the lysosomal damage response requires JNK as well as IKK. We have added the sentences in results to emphasize the pivotal role of stress-induced MAPKs (p15, lines 7-11).

      Reviewer #1 (Recommendations for the authors):

      (1) Although the ubiquitination-TAB-TAK1-IKK axis was previously characterized in other contexts, specific evidence supporting lysosomal recruitment of these components by ubiquitination during lysosome damage would be beneficial.

      We found that the treatment of an E1 inhibitor TAK-243 abolished the lysosomal accumulation of K63 ubiquitin chains, but TAB2 and TAK1 were recruited to damaged lysosomes (Fig. S2B). This suggests that the recruitment of TAB proteins to damaged lysosomes is independent of the association with K63-linked ubiquitin chains. Next, we investigated whether the add-back of TAB2 mutants incapable of binding K63 ubiquitin chains rescues the activation of TAK1 in TAB2-depleted cells (Fig. 2E). K63 ubiquitin binding of TAB2 was essential for TAK1 activation in response to LLOMe. Taken together, it is suggested that TAB proteins require their interaction with K63 ubiquitin chains for TAK1 activation, but not for recruitment to damaged lysosomes. We have added the sentences (p9, lines 7-20, and p10, lines 8-10). Please also see the response to “Reviewer #3 (Public review): Suggestions:”.

      (2) The activation of p38 and JNK by lysosomal damage does not fit well into the main conclusions of the paper, since IKK knockdown was sufficient to block cellular resistance to apoptosis (caspase cleavage in Fig. 5f). Are p38 and JNK also important for cell survival during lysosomal damage?

      We found that the inhibition of JNK resulted in an increase in the cleaved caspase-3, suggesting that the anti-apoptotic action in the lysosomal damage response requires both IKK and JNK (Fig. S5B). We have added the sentences (p15, lines 7-11).

      (3) Cell death tests are recommended to support the conclusions related to apoptosis.

      As suggested by Reviewer #1, we performed the cell death assay using propidium iodide (PI) and confirmed that HeLa cells co-treated with LLOMe and TAK-243 or HS-276 exhibited increased cell death (Fig. 5E). This indicates a direct correlation between the degree of caspase-3 cleavage and cell death, possibly apoptosis.

      (4) Page 8, line 19-21, gal3 is not exposed upon lysosomal damage. It is recruited from the cytosol by the exposed beta-galactoside-containing glycans on lysosomal membrane proteins.

      We have corrected the corresponding sentence (p7, lines 17-20).

      (5) Carefully checking grammar throughout the text is recommended. Below are a few examples:

      a) Page 4, line 10, remove "that".

      b) "K63 ubiquitin" shall be replaced with "K63 ubiquitination" or "K63 ubiquitin chains".

      c) Page 8, line 9, "remain" should be "remains".

      We have carefully checked the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Despite the novelty and significance of these findings in advancing the field, several technical and experimental limitations require further clarification:

      We have responded to each comment. Please see below.

      The manuscript should introduce or discuss previous research showing that TAB-TAK1 facilitates AMPK activation during lysosomal damage and TAK1's increased association with damaged lysosomes (PMID: 31995728).

      We have added the reference (PMID: 31995728) and the sentences (p17, lines 15-20).

      Figure 2A: The differential LAMP1 staining intensity between control and LLOMe-treated cells needs explanation. The weaker LAMP1 signal in control and puncta changes, especially during 5-minute LLOMe treatment, require detailed clarification.

      We have added the explanation (p8, lines 17-21).

      Recent literature (PMID: 34585663) reports TBK1 activation during lysosomal damage. The authors should investigate or discuss whether TBK1 potentially contributes to NF-κB signaling in this context.

      We experimentally investigated whether TBK1 is involved in the TAB-TAK1 pathway. We confirmed that TBK1 was activated upon LLOMe (Fig. S4D). Depletions of TAB and TAK1 exhibited a modest decrease in TBK1 phosphorylation (Fig. S4E). The inhibition of TBK1 by BX-795 did not affect TAK1 activation, but abolished phosphorylation of IKK and IkBa (Fig. S4F). This suggests that TBK1 is required for NF-kB activation. We have added the reference (PMID: 34585663) and the sentences (p13, lines 13-21, p14, lines 8-10, and p18, lines 15-20).

      The introduction of lysosomal damage response lacks comprehensive mechanistic information. For example, while ESCRT is discussed, other critical mechanisms such as lipid transfer and stress granule formation in lysosomal repair should be incorporated. Moreover, mTOR and AMPK signaling pathways undergo significant changes upon lysosomal damage.

      We have added the sentences (p3, lines 16-18, and p3, line 21-p4, line 1).

      The statement "lysosomal permeabilization causes the dissociation of mTORC1 from lysosomes" should explicitly reference PMID: 29625033.

      We have added the suggested reference (PMID: 29625033, p4, line 19).

      The claim that "The elimination of damaged lysosomes through lysophagy requires a period of more than half a day" needs a specific publication citation.

      We have added the reference (PMID: 23921551) to claim the time-scale of lysosomal clearance (p4, line 21).

      Figure 1G: The label "WO after 2h" lacks explanation in the figure legend and requires detailed interpretation.

      To simplify the figures, we have deleted the label “WO after 2 h” (Fig. 1G, 3F, 5D, F-J, S4G, S5A). Instead, we have added the explanation in the figure legends (Fig. 1G).

      Reviewer #3 (Recommendations for the authors):

      (1) page 8, line 13: it is recommended to phrase colocalisation "at" damaged lysosomes rather than "in" damaged lysosomes as the resolution does not allow the claim of influx into lysosomes.

      We have corrected the word (p8, line 17).

      (2) page 11, line 22: why is "whereas" used to link two events driven by the same mechanism.

      We have corrected the word (p13, line 8).

    1. Author response:

      We thank the reviewers for their thoughtful and thorough consideration of the work. We appreciate the positive reception they give the work, and plan to address several of the comments with further experiments. To outline that work (and ensure that we are on the right track to addressing those concerns), we summarize the core concerns that prompt new experiments:

      (1) Does the YFP tag on the ACRs interfere with simultaneous GCaMP imaging of RubyACR-expressing cells and could bleaching of the YFP complicate interpretation of the experiments here?

      We will test whether 920 nm (2p) and 650 nm (1p) excitation cause YFP bleaching that interferes with interpretation of inhibitory calcium (i.e. GCaMP) signals. Because the YFP tag enhances opsin sensitivity, we prioritized these tagged RubyACRs for initial characterization. FLAG-tagged ACRs are in progress, but will take time to fully characterize. Considering that the RubyACR-EYFP versions work very well, and in many cases people will want the YFP tag, either for visualizing expression or to maximize sensitivity, we feel the current work is a valuable contribution on its own. Indeed, several labs have already requested these lines.

      (2) Are the ACRs activated by two-photon illumination?

      We will examine GCaMP signals at increasing 2p intensities to determine whether imaging unintentionally activates RubyACRs, as well as whether 2p illumination could be used for intentional opsin activation.

      (3) How toxic is the expression of these opsins?

      We will update the quantification of toxicity in Table 1 to include all the drivers we used in this study. In fact, the toxicity we observed was primarily with the vGlut driver, which was why that was the only information in the table. The other drivers we used did not appreciably reduce survival rate, but showing only the one case where toxicity had a big effect understandably left a strong but inaccurate impression that toxicity was a big pitfall. We note that the widely used CSChrimson has similar % survival to the RubyACRs when expressed with these vGlut drivers.

      We also plan to examine whether ACR expression leads to cell-autonomous perturbations. We will determine whether expression leads to some frequency of neuronal cell death, and we will evaluate whether any morphological effects occur.

      We will also clarify in the Discussion that potential toxicity may be driver-specific (as it is here) and should be evaluated case-by-case by investigators using the tool.

      (4) Use functional imaging to confirm inhibition of the neurons used only for behavioral experiments (pIP10 & PPL1-γ1pedc)

      We will perform these imaging experiments. One caveat is that inhibition may not be readily detectable with GCaMP, as the resting calcium levels in pIP10 and PPL1-γ1pedc neurons may already be quite low. This differs from the non-spiking Mi1 neurons, where inhibition was clearly observed with GCaMP. For this reason, we consider the behavioral results stronger evidence of efficacy, but we agree that imaging could provide useful supporting evidence, recognizing that a negative result would be difficult to interpret.

      (5) Confirm that the GtACR1 will inhibit locomotion in the flybowl when activated with green light, its spectral peak.

      We will perform this benchmark experiment. Please note that our intention with this study was to find an effective red-light activated opto-inhibitor because these wavelengths are much less perturbing to behavior. In that respect, regardless of GtACR1’s performance with green light, the RubyACRs clearly provide important new tools for Drosophila behavioral neuroscience.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Review of the manuscript titled "Mycobacterial Metallophosphatase MmpE acts as a nucleomodulin to regulate host gene expression and promotes intracellular survival".

      The study provides an insightful characterization of the mycobacterial secreted effector protein MmpE, which translocates to the host nucleus and exhibits phosphatase activity. The study characterizes the nuclear localization signal sequences and residues critical for the phosphatase activity, both of which are required for intracellular survival.

      Strengths:

      (1) The study addresses the role of nucleomodulins, an understudied aspect in mycobacterial infections.

      (2) The authors employ a combination of biochemical and computational analyses along with in vitro and in vivo validations to characterize the role of MmpE.

      Weaknesses:

      (1) While the study establishes that the phosphatase activity of MmpE operates independently of its NLS, there is a clear gap in understanding how this phosphatase activity supports mycobacterial infection. The investigation lacks experimental data on specific substrates of MmpE or pathways influenced by this virulence factor.

      We thank the reviewer for this insightful comment and agree that identification of the substrate of MmpE is important to fully understand its role in mycobacterial infection.

      MmpE is a putative purple acid phosphatase (PAP) and a member of the metallophosphoesterase (MPE) superfamily. Enzymes in this family are known for their catalytic promiscuity and broad substrate specificity, acting on phosphomonoesters, phosphodiesters, and phosphotriesters (Matange et al., Biochem J., 2015). In bacteria, several characterized MPEs have been shown to hydrolyze substrates such as cyclic nucleotides (e.g., cAMP) (Keppetipola et al., J Biol Chem, 2008; Shenoy et al., J Mol Biol, 2007), nucleotide derivatives (e.g., AMP, UDP-glucose) (Innokentev et al., mBio, 2025), and pyrophosphate-containing compounds (e.g., Ap4A, UDP-DAGn) (Matange et al., Biochem J., 2015). Although the binding motif of MmpE has been identified, determining its physiological substrates remains challenging due to the low abundance and instability of potential metabolites, as well as the limited sensitivity and coverage of current metabolomic technologies in mycobacteria.

      (2) The study does not explore whether the phosphatase activity of MmpE is dependent on the NLS within macrophages, which would provide critical insights into its biological relevance in host cells. Conducting experiments with double knockout/mutant strains and comparing their intracellular survival with single mutants could elucidate these dependencies and further validate the significance of MmpE's dual functions.

      We thank the reviewer for the comment. In our study, we demonstrate that both the nuclear localization and phosphatase activity of MmpE are required for full virulence (Figure 3D–E). Importantly, deletion of the NLS motifs did not impair MmpE’s phosphatase activity in vitro (Figure 2F), indicating that its enzymatic function is structurally independent of its nuclear localization. These findings suggest that MmpE functions as a bifunctional protein, with distinct and non-overlapping roles for its nuclear trafficking and phosphatase activity. We have expanded on this point in the Discussion section “MmpE Functions as a Bifunctional Protein with Nuclear Localization and Phosphatase Activity”.

      (3) The study does not provide direct experimental validation of the MmpE deletion on lysosomal trafficking of the bacteria.

      We thank the reviewer for the comment. The role of Rv2577/MmpE in phagosome maturation has been demonstrated in M. tuberculosis, where its deletion increases colocalization with lysosomal markers such as LAMP-2 and LAMP-3 (Forrellad et al., Front Microbiol, 2020). In our study, we found that mmpE deletion in M. bovis BCG led to upregulation of lysosomal genes, including TFEB, LAMP1, LAMP2, and v-ATPase subunits, compared to the wild-type strain. These results suggest that MmpE may regulate lysosomal trafficking by interfering with phagosome–lysosome fusion.

      To further validate MmpE’s role in phagosome maturation, we will perform fluorescence colocalization assays in THP-1 macrophages infected with BCG/wt, ∆mmpE, complemented, and NLS-mutant strains. Co-staining with LAMP1 and LysoTracker will allow us to assess whether the ∆mmpE mutant is more efficiently trafficked to lysosomes.

      (4) The role of MmpE as a mycobacterial effector would be more relevant using virulent mycobacterial strains such as H37Rv.

      We thank the reviewer for the comment. Previously, the role of Rv2577/MmpE as a virulence factor has been demonstrated in M. tuberculosis CDC 1551, where its deletion significantly reduced bacterial replication in mouse lungs at 30 days post-infection (Forrellad et al., Front Microbiol, 2020). However, that study did not explore the underlying mechanism of MmpE function. In our work, we found that MmpE enhances M. bovis BCG survival in both macrophages (THP-1 and RAW264.7) and mice (Figure 2A-B, Figure 6A), consistent with its proposed role in virulence. To investigate the molecular mechanism by which MmpE promotes intracellular survival, we used M. bovis BCG as a biosafe surrogate, a model that is widely accepted for studying mycobacterial pathogenesis (Wang et al., Nat Immunol, 2025; Wang et al., Nat Commun, 2017; Péan et al., Nat Commun, 2017).

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors have characterized Rv2577 as a Fe3+/Zn2+-dependent metallophosphatase and a nucleomodulin protein. The authors have also identified His348 and Asn359 as critical residues for Fe3+ coordination. The authors show that the protein encodes two nuclear localization signals. Using C-terminal Flag expression constructs, the authors have shown that the MmpE protein is secretory. The authors have prepared genetic deletion strains and show that MmpE is essential for intracellular survival of M. bovis BCG in THP-1 macrophages, RAW264.7 macrophages, and a mouse model of infection. The authors have also performed RNA-seq analysis to compare the transcriptional profiles of macrophages infected with wild-type and MmpE mutant strains. The relative levels of ~175 transcripts were altered in MmpE mutant-infected macrophages and the majority of these were associated with various immune and inflammatory signalling pathways. Using these deletion strains, the authors proposed that MmpE inhibits inflammatory gene expression by binding to the promoter region of a vitamin D receptor. The authors also showed that MmpE arrests phagosome maturation by regulating the expression of several lysosome-associated genes such as TFEB, LAMP1, LAMP2, etc. These findings reveal a sophisticated mechanism by which a bacterial effector protein manipulates gene transcription and promotes intracellular survival.

      Strength:

      The authors have used a combination of cell biology, microbiology, and transcriptomics to elucidate the mechanisms by which Rv2577 contributes to intracellular survival.

      Weakness:

      The authors should thoroughly check the mice data and show individual replicate values in bar graphs.

      We appreciate the reviewer's advice. We will update the relevant mouse data in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript titled "Mycobacterial Metallophosphatase MmpE Acts as a Nucleomodulin to Regulate Host Gene Expression and Promote Intracellular Survival", Chen et al describe biochemical characterisation, localisation and potential functions of the gene using a genetic approach in M. bovis BCG and perform macrophage and mice infections to understand the roles of this potentially secreted protein in the host cell nucleus. The findings demonstrate the role of a secreted phosphatase of M. bovis BCG in shaping the transcriptional profile of infected macrophages, potentially through nuclear localisation and direct binding to transcriptional start sites, thereby regulating the inflammatory response to infection.

      Strengths:

      The authors demonstrate using a transient transfection method that MmpE when expressed as a GFP-tagged protein in HEK293T cells, exhibits nuclear localisation. The authors identify two NLS motifs that together are required for nuclear localisation of the protein. A deletion of the gene in M. bovis BCG results in poorer survival compared to the wild-type parent strain, which is also killed by macrophages. Relative to the WT strain-infected macrophages, macrophages infected with the ∆mmpE strain exhibited differential gene expression. Overexpression of the gene in HEK293T led to occupancy of the transcription start site of several genes, including the Vitamin D Receptor. Expression of VDR in THP1 macrophages was lower in the case of ∆mmpE infection compared to WT infection. This data supports the utility of the overexpression system in identifying potential target loci of MmpE using the HEK293T transfection model. The authors also demonstrate that the protein is a phosphatase, and the phosphatase activity of the protein is partially required for bacterial survival but not for the regulation of the VDR gene expression.

      Weaknesses:

      (1)   While the motifs can most certainly behave as NLSs, the overexpression of a mycobacterial protein in HEK293T cells can also result in artefacts of nuclear localisation. This is not unprecedented. Therefore, to prove that the protein is indeed secreted from BCG, and is able to elicit transcriptional changes during infection, I recommend that the authors (i) establish that the protein is indeed secreted into the host cell nucleus, and (ii) the NLS mutation prevents its localisation to the nucleus without disrupting its secretion.

      We appreciate the reviewer's advice and will include the relevant experiments in the revised manuscript. The localization of WT MmpE and NLS-mutated MmpE will be tested in BCG-infected macrophages.

      Demonstration that the protein is secreted: Supplementary Figure 3 - Immunoblotting should be performed for a cytosolic protein, also to rule out detection of proteins from lysis of dead cells. Also, for detecting proteins in the secreted fraction, it would be better to use Sauton's media without detergent, and grow the cultures without agitation or with gentle agitation. The method used by the authors is not a recommended protocol for obtaining the secreted fraction of mycobacteria.

      We agree with the reviewer and we will further validate the secretion of MmpE using the tested protocol.

      Demonstration that the protein localises to the host cell nucleus upon infection: Perform an infection followed by immunofluorescence to demonstrate that the endogenous protein of BCG can translocate to the host cell nucleus. This should be done for an NLS1-2 mutant expressing cell also.

      We will add this experiment in the revised manuscript.

      (2) In the RNA-seq analysis, the directionality of change of each of the reported pathways is not apparent in the way the data have been presented. For example, are genes in the cytokine-cytokine receptor interaction or TNF signalling pathway expressed more, or less in the ∆mmpE strain?

      We thank the reviewer for pointing this out and fully agree that conventional KEGG pathway enrichment diagrams do not convey the directionality of individual gene expression changes within each pathway. While KEGG enrichment analysis identifies pathways that are statistically overrepresented among differentially expressed genes, it does not indicate whether individual genes within those pathways are upregulated or downregulated.

      To address this, we re-analyzed the expression trends of DEGs within each significantly enriched KEGG pathway. The results show that key immune-related pathways, including cytokine–cytokine receptor interaction, TNF signaling, NF-κB signaling, and chemokine signaling, are collectively upregulated in THP-1 macrophages infected with the ∆mmpE strain compared to those infected with the wild-type BCG strain. The full list of DEGs will be provided in the supplementary materials. The complete RNA-seq dataset has been deposited in the GEO database, and the accession number will be included in the revised manuscript.
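
      The kind of per-pathway directionality summary described above can be sketched as follows. This is only an illustration under stated assumptions: the gene symbols, pathway names, and log2 fold changes are hypothetical placeholders (not data from the study), and a positive value is taken to mean higher expression in ∆mmpE-infected than in WT-infected macrophages.

```python
# Hypothetical differential-expression results (∆mmpE vs. WT infection).
# Gene names and log2 fold changes are illustrative placeholders only.
deg_log2fc = {
    "GENE_A": 1.8,
    "GENE_B": 2.3,
    "GENE_C": -1.2,
    "GENE_D": 1.1,
    "GENE_E": -0.9,
}

# Hypothetical pathway membership (stand-in for an enrichment result).
pathway_genes = {
    "pathway_1": ["GENE_A", "GENE_B", "GENE_C"],
    "pathway_2": ["GENE_B", "GENE_D"],
    "pathway_3": ["GENE_C", "GENE_E"],
}


def summarize_direction(pathway_genes, deg_log2fc):
    """Count up- and downregulated DEGs in each pathway and report the mean log2FC."""
    summary = {}
    for pathway, genes in pathway_genes.items():
        # Keep only pathway members that are actually in the DEG list.
        values = [deg_log2fc[g] for g in genes if g in deg_log2fc]
        up = sum(1 for v in values if v > 0)
        down = sum(1 for v in values if v < 0)
        mean = sum(values) / len(values) if values else 0.0
        summary[pathway] = {"up": up, "down": down, "mean_log2fc": mean}
    return summary


direction_summary = summarize_direction(pathway_genes, deg_log2fc)
for pathway, stats in direction_summary.items():
    print(pathway, stats)
```

      Reporting the up and down counts alongside the enrichment statistic lets a reader see at a glance whether a pathway is predominantly induced or repressed, which an enrichment plot alone does not convey.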

      (3) Several of these pathways are affected as a result of infection, while others are not induced by BCG infection. For example, BCG infection does not, on its own, produce changes in IL1β levels. As the authors did not compare the uninfected macrophages as a control, it is difficult to interpret whether ∆mmpE induced higher expression than the WT strain, or simply did not induce a gene while the WT strain suppressed expression of a gene. This is particularly important because the strain is attenuated. Does the attenuation have anything to do with the ability of the protein to induce lysosomal pathway genes? Does induction of this pathway lead to attenuation of the strain? Similarly, for pathways that seem to be downregulated in the ∆mmpE strain compared to the WT strain, these might have been induced upon infection with the WT strain but not sufficiently by the ∆mmpE strain due to its attenuation/lower bacterial burden.

      We thank the reviewer for the comment. We will update qRT-PCR data with the uninfected macrophages as a control in the revised manuscript.

      The wild-type Mycobacterium bovis BCG strain retains the ability to inhibit phagosome maturation (Branzk et al., Nat Immunol, 2014; Weng et al., Nat Commun, 2022). Forrellad et al. previously identified Rv2577/MmpE as a virulence factor in M. tuberculosis and showed that disruption of the mmpE gene impairs the ability of M. tuberculosis to arrest phagosome maturation (Forrellad et al., Front Microbiol, 2020). In our study, transcriptomic and qRT-PCR data (Figures 4C and G, S4C) show that deletion of mmpE in M. bovis BCG leads to upregulation of lysosomal biogenesis and acidification genes, including TFEB, LAMP1, and vATPase. To further validate MmpE’s role in phagosome maturation, we will perform fluorescence colocalization assays in THP-1 macrophages infected with BCG/wt, ∆mmpE, complemented, and NLS-mutant strains. Co-staining with LAMP1 and LysoTracker will assess whether the ∆mmpE mutant is more efficiently trafficked to lysosomes.

      Furthermore, CFU assays demonstrated that the ∆mmpE strain exhibits markedly reduced bacterial survival in both human THP-1 and murine RAW264.7 macrophages, as well as in mice, compared to the wild-type strain (Figures 4A and C, 6A). These findings suggest that the loss of MmpE compromises bacterial survival, likely due to enhanced lysosomal trafficking and acidification. This supports previous studies showing that increased lysosomal activity promotes mycobacterial clearance (Gutierrez et al., Cell, 2004; Pilli et al., Immunity, 2012).

      (4) ChIP-seq should be performed in THP1 macrophages, and not in HEK293T. Overexpression of a nuclear-localised protein in a non-relevant line is likely to lead to several transcriptional changes that do not inform us of the role of the gene as a transcriptional regulator during infection.

      We thank the reviewer for the comment. We performed ChIP-seq in HEK293T cells because this cell line is widely used in ChIP-based assays owing to its high transfection efficiency, robust nuclear protein expression, and well-annotated genome (Lampe et al., Nat Biotechnol, 2024; Marasco et al., Cell, 2022). These features make HEK293T an ideal system for the initial identification of genome-wide chromatin binding profiles of novel nuclear effectors such as MmpE.

      Furthermore, we validated the major observations in THP-1 macrophages: (i) RNA-seq of THP-1 cells infected with either WT BCG or ∆mmpE strains revealed significant transcriptional changes in immune and lysosomal pathways (Figure 4A); and (ii) integrated analysis of CUT&Tag and RNA-seq data identified 298 genes in infected THP-1 cells that exhibited both MmpE binding and corresponding expression changes. Among these, VDR was validated as a direct transcriptional target of MmpE using EMSA and ChIP-PCR (Figures 5E-J, S5D-F). Notably, the signaling pathways associated with MmpE-bound genes, including PI3K-Akt-mTOR signaling and lysosomal function, substantially overlap with those transcriptionally modulated in infected THP-1 macrophages (Figures 4B-G, S4B-C, S5C-D), further supporting the biological relevance of the ChIP-seq data obtained from HEK293T cells.

      (5) I would not expect to see such large inflammatory reactions persisting 56 days post-infection with M. bovis BCG. Is this something peculiar for an intratracheal infection with 1×10⁷ bacilli? For images of animal tissue, the authors should provide images of the entire lung lobe with the zoomed-in image indicated as an inset.

      We thank the reviewer for the comment. The lung inflammation peaked at days 21–28 and had clearly subsided by day 56 across all groups (Figure 6B), consistent with the expected resolution of immune responses to an attenuated strain like M. bovis BCG. This temporal pattern is in line with previous studies using intravenous or intratracheal BCG vaccination in mice and macaques, which also demonstrated robust early immune activation followed by resolution over time (Smith et al., Nat Microbiol, 2025; Darrah et al., Nature, 2020).

      In this study, the infectious dose (1×10⁷ CFU intratracheally) was selected based on previous studies in which intratracheal delivery of 1×10⁷ CFU produced consistent and measurable lung immune responses and pathology without causing overt illness or mortality (Xu et al., Sci Rep, 2017; Niroula et al., Sci Rep, 2025). We will provide whole-lung lobe images with zoomed-in insets in the revised manuscript.

      (6) For the qRT-PCR-based validation, infections should be performed with the MmpE-complemented strain in the same experiments as those for the WT and ∆mmpE strain so that they can be on the same graph, in the main manuscript file. Supplementary Figure 4 has three complementary strains. Again, the absence of the uninfected, WT, and ∆mmpE infected condition makes interpretation of these data very difficult.

      We thank the reviewer for the comment. As suggested, we will conduct the qRT-PCR experiment including the uninfected, WT, ∆mmpE, Comp-MmpE, and the three complementary strains infecting THP-1 cells. The updated data will be provided in the revised manuscript.

      (7) The abstract mentions that MmpE represses the PI3K-Akt-mTOR pathway, which arrests phagosome maturation. There is not enough data in this manuscript in support of this claim. Supplementary Figure 5 does provide qRT-PCR validation of genes of this pathway, but the data do not indicate that higher expression of these pathways, whether by VDR repression or otherwise, is driving the growth restriction of the ∆mmpE strain.

      We thank the reviewer for the comment. The role of MmpE in phagosome maturation was previously characterized. Disruption of mmpE impairs the ability of M. tuberculosis to arrest lysosomal trafficking (Forrellad et al., Front Microbiol, 2020). In this study, we further found that MmpE suppresses the expression of key lysosomal genes, including TFEB, LAMP1, LAMP2, and ATPase subunits (Figure 4G), suggesting MmpE is involved in arresting phagosome maturation. As noted, the genes in the PI3K–Akt–mTOR pathway are upregulated in ∆mmpE-infected macrophages (Figure S5C).

      To functionally validate this, we will conduct two complementary experimental approaches:

      (i) Immunofluorescence assays: We will assess phagosome maturation and lysosomal fusion in THP-1 cells infected with BCG/wt, ∆mmpE, Comp-MmpE, and NLS mutant strains. Colocalization of intracellular bacteria with LAMP1 and LysoTracker will be quantified to determine whether the ∆mmpE strain is more efficiently trafficked to lysosomes.

      (ii) CFU assays: We will perform CFU assays in THP-1 cells infected with BCG/wt or ∆mmpE in the presence or absence of PI3K-Akt-mTOR pathway inhibitors (e.g., Dactolisib), to assess whether activation of this pathway contributes to the intracellular growth restriction observed in the ∆mmpE strain.

      (8) The relevance of the NLS and the phosphatase activity is not completely clear in the CFU assays and in the gene expression data. Firstly, there needs to be immunoblot data provided for the expression and secretion of the NLS-deficient and phosphatase mutants. Secondly, CFU data in Figure 3A, C, and E must consistently include both the WT and ∆mmpE strain.

      We thank the reviewer for the comment. We will provide immunoblot data for the expression and secretion of the NLS-deficient and phosphatase mutants. Additionally, we will revise Figure 3A, 3C, and 3E to consistently include both the WT and ΔmmpE strains in the CFU assays.

      References

      Branzk N, Lubojemska A, Hardison SE, Wang Q, Gutierrez MG, Brown GD, Papayannopoulos V (2014) Neutrophils sense microbe size and selectively release neutrophil extracellular traps in response to large pathogens Nat Immunol 15:1017-25.

      Darrah PA, Zeppa JJ, Maiello P, Hackney JA, Wadsworth MH 2nd, Hughes TK, Pokkali S, Swanson PA 2nd, Grant NL, Rodgers MA, Kamath M, Causgrove CM, Laddy DJ, Bonavia A, Casimiro D, Lin PL, Klein E, White AG, Scanga CA, Shalek AK, Roederer M, Flynn JL, Seder RA (2020) Prevention of tuberculosis in macaques after intravenous BCG immunization Nature 577:95-102.

      Forrellad MA, Blanco FC, Marrero Diaz de Villegas R, Vázquez CL, Yaneff A, García EA, Gutierrez MG, Durán R, Villarino A, Bigi F (2020) Rv2577 of Mycobacterium tuberculosis Is a virulence factor with dual phosphatase and phosphodiesterase functions Front Microbiol 11:570794.

      Gutierrez MG, Master SS, Singh SB, Taylor GA, Colombo MI, Deretic V (2004) Autophagy is a defense mechanism inhibiting BCG and Mycobacterium tuberculosis survival in infected macrophages Cell 119:753-66.

      Innokentev A, Sanchez AM, Monetti M, Schwer B, Shuman S (2025) Efn1 and Efn2 are extracellular 5'-nucleotidases induced during the fission yeast response to phosphate starvation mBio 16:e0299224.

      Keppetipola N, Shuman S (2008) A phosphate-binding histidine of binuclear metallophosphodiesterase enzymes is a determinant of 2',3'-cyclic nucleotide phosphodiesterase activity J Biol Chem 283:30942-9.

      Lampe GD, King RT, Halpin-Healy TS, Klompe SE, Hogan MI, Vo PLH, Tang S, Chavez A, Sternberg SH (2024) Targeted DNA integration in human cells without double-strand breaks using CRISPR-associated transposases Nat Biotechnol 42:87-98.

      Marasco LE, Dujardin G, Sousa-Luís R, Liu YH, Stigliano JN, Nomakuchi T, Proudfoot NJ, Krainer AR, Kornblihtt AR (2022) Counteracting chromatin effects of a splicing-correcting antisense oligonucleotide improves its therapeutic efficacy in spinal muscular atrophy Cell 185:2057-2070.e15.

      Matange N, Podobnik M, Visweswariah SS (2015) Metallophosphoesterases: structural fidelity with functional promiscuity Biochem J 467:201-16.

      Niroula N, Ghodasara P, Marreros N, Fuller B, Sanderson H, Zriba S, Walker S, Shury TK, Chen JM (2025) Orally administered live BCG and heat-inactivated Mycobacterium bovis protect bison against experimental bovine tuberculosis Sci Rep 15:3764.

      Péan CB, Schiebler M, Tan SW, Sharrock JA, Kierdorf K, Brown KP, Maserumule MC, Menezes S, Pilátová M, Bronda K, Guermonprez P, Stramer BM, Andres Floto R, Dionne MS (2017) Regulation of phagocyte triglyceride by a STAT-ATG2 pathway controls mycobacterial infection Nat Commun 8:14642.

      Pilli M, Arko-Mensah J, Ponpuak M, Roberts E, Master S, Mandell MA, Dupont N, Ornatowski W, Jiang S, Bradfute SB, Bruun JA, Hansen TE, Johansen T, Deretic V (2012) TBK-1 promotes autophagy-mediated antimicrobial defense by controlling autophagosome maturation Immunity 37:223-34.

      Shenoy AR, Capuder M, Draskovic P, Lamba D, Visweswariah SS, Podobnik M (2007) Structural and biochemical analysis of the Rv0805 cyclic nucleotide phosphodiesterase from Mycobacterium tuberculosis J Mol Biol 365:211-25.

      Smith AA, Su H, Wallach J, Liu Y, Maiello P, Borish HJ, Winchell C, Simonson AW, Lin PL, Rodgers M, Fillmore D, Sakal J, Lin K, Vinette V, Schnappinger D, Ehrt S, Flynn JL (2025) A BCG kill switch strain protects against Mycobacterium tuberculosis in mice and non-human primates with improved safety and immunogenicity Nat Microbiol 10:468-481.

      Wang J, Ge P, Qiang L, Tian F, Zhao D, Chai Q, Zhu M, Zhou R, Meng G, Iwakura Y, Gao GF, Liu CH (2017) The mycobacterial phosphatase PtpA regulates the expression of host genes and promotes cell proliferation Nat Commun 8:244.

      Wang J, Li BX, Ge PP, Li J, Wang Q, Gao GF, Qiu XB, Liu CH (2015) Mycobacterium tuberculosis suppresses innate immunity by coopting the host ubiquitin system Nat Immunol 16:237-45.

      Weng Y, Shepherd D, Liu Y, Krishnan N, Robertson BD, Platt N, Larrouy-Maumus G, Platt FM (2022) Inhibition of the Niemann-Pick C1 protein is a conserved feature of multiple strains of pathogenic mycobacteria Nat Commun 13:5320.

      Xu X, Lu X, Dong X, Luo Y, Wang Q, Liu X, Fu J, Zhang Y, Zhu B, Ma X (2017) Effects of hMASP2 on the formation of BCG infection-induced granuloma in the lungs of BALB/c mice Sci Rep 7:2300.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      McDougal et al. aimed to characterize the antiviral activity of mammalian IFIT1 orthologs. They first performed three different evolutionary selection analyses within each major mammalian clade and identified some overlapping positive selection sites in IFIT1. They found that one site that is positively selected in primates is in the RNA-binding exit tunnel of IFIT1 and is tolerant of mutations to amino acids with similar biochemical properties. They then tested 9 diverse mammalian IFIT1 proteins against VEEV, VSV, PIV3, and SINV and found that each ortholog has distinct antiviral activities. Lastly, they compared human and chimpanzee IFIT1 and found that the determinant of their differential anti-VEEV activity may be partly attributed to their ability to bind Cap0 RNA. 

      Strengths: 

      The study is one of the first to test the antiviral activity of IFIT1 from diverse mammalian clades against VEEV, VSV, PIV3, and SINV. Cloning and expressing these 39 IFIT1 orthologs in addition to single and combinatorial mutants is not a trivial task. The positive connection between anti-VEEV activity and Cap0 RNA binding is interesting, suggesting that differences in RNA binding may explain differences in antiviral activity. 

      Weaknesses: 

      The evolutionary selection analyses yielded interesting results, but were not used to inform follow-up studies except for a positively selected site identified in primates. Since positive selection is one of the two major angles the authors proposed to investigate mammalian IFIT1 orthologs with, they should integrate the positive selection results with the rest of the paper more seamlessly, such as discussing the positive selection results and their implications, rather than just pointing out that positively selected sites were identified. The paper should elaborate on how the positive selection analyses PAML, FUBAR, and MEME complement one another to explain why the tests gave them different results. Interestingly, MEME which usually provides more sites did not identify site 193 in primates that was identified by both PAML and FUBAR. The authors should also provide the rationale for choosing to focus on the 3 sites identified in primates only. One of those sites, 193, was also found to be positively selected in bats, although the authors did not discuss or integrate that finding into the study. In Figure 1A, they also showed a dN/dS < 1 from PAML, which is confusing and would suggest negative selection instead of positive selection. Importantly, since the authors focused on the rapidly evolving site 193 in primates, they should test the IFIT1 orthologs against viruses that are known to infect primates to directly investigate the impact of the evolutionary arms race at this site on IFIT1 function. 

      We thank the reviewer for their assessment and for acknowledging the breadth of our dataset regarding diverse IFIT1s, number of viruses tested, and the functional data that may correlate biochemical properties of IFIT1 orthologous proteins with antiviral function. We have expanded the introduction and results sections to better explain and distinguish between PAML, FUBAR, and MEME analyses. Furthermore, we have expanded the discussion to incorporate the observation that site 193 is rapidly evolving in bats, as well as the observation that sites near the TPR4 loop were identified as rapidly evolving in all clades of mammals tested. We do observe an overall gene dN/dS of <1; however, this is simply the average across all codons of the entire gene and does not rule out positive selection at specific sites. This is observed for other restriction factors, as many domains are undergoing purifying selection to retain core functions (e.g., enzymatic function, structural integrity) while other domains (e.g., interfaces with viral antagonists or viral proteins) show strong positive selection. Specific examples include the restriction factors BST-2/Tetherin (PMID: 19461879) and MxA (PMID: 23084925). Furthermore, we agree that testing more IFIT1-sensitive viruses that naturally infect primates with our IFIT1 193 mutagenesis library would shed light on the influence of host-virus arms races at this site. However, VEEV does naturally infect humans as well as at least one other species of primate (PMID: 39983680).
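
      The point that a gene-wide dN/dS below 1 is compatible with positive selection at individual sites can be illustrated with a small numerical sketch. The per-codon values below are hypothetical and chosen only for illustration, and a plain arithmetic mean is used as a simplified stand-in for the gene-wide estimate that a program such as PAML would report.

```python
# Hypothetical per-codon dN/dS (omega) estimates for a 20-codon gene:
# 17 codons under strong purifying selection and 3 codons with omega > 1.
# These numbers are illustrative only and are not taken from the study.
site_omega = [0.1] * 17 + [3.0, 4.0, 2.5]

# Simplified gene-wide summary: the arithmetic mean over all codons.
gene_wide_omega = sum(site_omega) / len(site_omega)

# Codon positions (1-indexed) whose omega exceeds 1.
positively_selected_sites = [
    position for position, omega in enumerate(site_omega, start=1) if omega > 1.0
]

print(f"gene-wide dN/dS: {gene_wide_omega:.2f}")
print(f"sites with dN/dS > 1: {positively_selected_sites}")
```

      In this toy case the gene-wide value stays below 1 even though three codons individually exceed 1, which is why site-level tests are needed to detect localized positive selection.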

      Below we individually address the reviewers' claims of inaccurate data interpretation.

      Some of the data interpretation is not accurate. For example: 

      (1) Lines 232-234: "...western blot analysis revealed that the expression of IFIT1 orthologs was relatively uniform, except for the higher expression of orca IFIT1 and notably lower expression of pangolin IFIT1 (Figure 4B)." In fact, most of the orthologs are not expressed in a "relatively uniform" manner e.g. big brown bat vs. shrew are quite different. 

      We have now included quantification of the western blots to allow the reader to compare protein expression with the infection data (Updated Figure 4B and 4G). We have also removed the phrase “relatively uniform” from the text and have instead included text describing the quantified expression differences.

      (2) Line 245: "...mammalian IFIT1 species-specific differences in viral suppression are largely independent of expression differences." While it is true that there is no correlation between protein expression and antiviral activity in each species, the authors cannot definitively conclude that the species-specific differences are independent of expression differences. Since the orthologs are clearly not expressed in the same amounts, it is impossible to fully assess their true antiviral activity. At the very least, the authors should acknowledge that the protein expression can affect antiviral activity. They should also consider quantifying the IFIT1 protein bands and normalizing each to GAPDH for readers to better compare protein expression and antiviral activity. The same issue is in Line 267. 

      We have now included quantification and normalization of the western blots to allow the reader to compare protein expression with the infection data (Updated Figure 4B and 4G). Furthermore, we acknowledge in the text that expression differences may affect antiviral potency in infection experiments.
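
      Normalizing band intensities to a loading control, as the reviewer suggested, amounts to the following calculation. This is a minimal sketch under stated assumptions: the ortholog names and densitometry values are hypothetical, and expressing each ratio relative to one reference lane is a choice made for this illustration, not something specified in the response.

```python
# Hypothetical densitometry readings (arbitrary units) for each lane.
# Names and numbers are placeholders, not measurements from the study.
bands = {
    "ortholog_1": {"ifit1": 1200.0, "gapdh": 1000.0},
    "ortholog_2": {"ifit1": 300.0, "gapdh": 1000.0},
    "ortholog_3": {"ifit1": 900.0, "gapdh": 1500.0},
}


def normalize_to_loading_control(bands, reference):
    """Return IFIT1/GAPDH ratios scaled so that the reference lane equals 1."""
    ratios = {name: lane["ifit1"] / lane["gapdh"] for name, lane in bands.items()}
    reference_ratio = ratios[reference]
    return {name: ratio / reference_ratio for name, ratio in ratios.items()}


relative_expression = normalize_to_loading_control(bands, reference="ortholog_1")
for name, value in relative_expression.items():
    print(f"{name}: {value:.2f}")
```

      Placing these relative values next to the percent-infection data makes it easier to judge whether a weak antiviral phenotype coincides with low protein expression.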

      (3) Line 263: "SINV... was modestly suppressed by pangolin, sheep, and chinchilla IFIT1 (Figure 4E)..." The term "modestly suppressed" does not seem fitting if there is 60-70% infection in cells expressing pangolin and chinchilla IFIT1. 

      We have modified the text to say “significantly suppressed” rather than “modestly suppressed.”

      (4) The study can be significantly improved if the authors can find a thread to connect each piece of data together, so the readers can form a cohesive story about mammalian IFIT1. 

      We appreciate the reviewer’s suggestion and have tried to make the story more cohesive through commentary on positive selection and by using the computational analysis to inform potential evolutionary consequences of IFIT1 functionality, first through an intraspecies (human) approach and then through an interspecies approach with diverse mammals that have great sequence diversity. Furthermore, we point out that almost all IFIT1s tested in the ortholog screen were also included in our computational analysis, allowing for the potential to connect functional observations with those seen in the evolutionary analyses.

      Reviewer #2 (Public review): 

      McDougal et al. describe the surprising finding that IFIT1 proteins from different mammalian species inhibit the replication of different viruses, indicating that the evolution of IFIT1 across mammals has resulted in host species-specific antiviral specificity. Before this work, research into the antiviral activity and specificity of IFIT1 had mostly focused on the human ortholog, which was described to inhibit viruses including vesicular stomatitis virus (VSV) and Venezuelan equine encephalitis virus (VEEV) but not other viruses including Sindbis virus (SINV) and parainfluenza virus type 3 (PIV3). In the current work, the authors first perform evolutionary analyses on IFIT1 genes across a wide range of mammalian species and reveal that IFIT1 genes have evolved under positive selection in primates, bats, carnivores, and ungulates. Based on these data, they hypothesize that IFIT1 proteins from these diverse mammalian groups may show distinct antiviral specificities against a panel of viruses. By generating human cells that express IFIT1 proteins from different mammalian species, the authors show a wide range of antiviral activities of mammalian IFIT1s. Most strikingly, they find several IFIT1 proteins that have completely different antiviral specificities relative to human IFIT1, including IFIT1s that fail to inhibit VSV or VEEV, but strongly inhibit PIV3 or SINV. These results indicate that there is potential for IFIT1 to inhibit a much wider range of viruses than human IFIT1 inhibits. Electrophoretic mobility shift assays (EMSAs) suggest that some of these changes in antiviral specificity can be ascribed to changes in the direct binding of viral RNAs. Interestingly, they also find that chimpanzee IFIT1, which is >98% identical to human IFIT1, fails to inhibit any tested virus. Replacing three residues from chimpanzee IFIT1 with those from human IFIT1, one of which has evolved under positive selection in primates, restores activity to chimpanzee IFIT1. Together, these data reveal a vast diversity of IFIT1 antiviral specificity encoded by mammals, consistent with an IFIT1-virus evolutionary "arms race". 

      Overall, this is a very interesting and well-written manuscript that combines evolutionary and functional approaches to provide new insight into IFIT1 antiviral activity and species-specific antiviral immunity. The conclusion that IFIT1 genes in several mammalian lineages are evolving under positive selection is supported by the data, although there are some important analyses that need to be done to remove any confounding effects from gene recombination that has previously been described between IFIT1 and its paralog IFIT1B. The virology results, which convincingly show that IFIT1s from different species have distinct antiviral specificity, are the most surprising and exciting part of the paper. As such, this paper will be interesting for researchers studying mechanisms of innate antiviral immunity, as well as those interested in species-specific antiviral immunity. Moreover, it may prompt others to test a wide range of orthologs of antiviral factors beyond those from humans or mice, which could further the concept of host-specific innate antiviral specificity. Additional areas for improvement, which are mostly to clarify the presentation of data and conclusions, are described below. 

      Strengths: 

      (1) This paper is a very strong demonstration of the concept that orthologous innate immune proteins can evolve distinct antiviral specificities. Specifically, the authors show that IFIT1 proteins from different mammalian species are able to inhibit the replication of distinct groups of viruses, which is most clearly illustrated in Figure 4G. This is an unexpected finding, as the mechanism by which IFIT1 inhibits viral replication was assumed to be similar across orthologs. While the molecular basis for these differences remains unresolved, this is a clear indication that IFIT1 evolution functionally impacts host-specific antiviral immunity and that IFIT1 has the potential to inhibit a much wider range of viruses than previously described. 

      (2) By revealing these differences in antiviral specificity across IFIT1 orthologs, the authors highlight the importance of sampling antiviral proteins from different mammalian species to understand what functions are conserved and what functions are lineage- or species-specific. These results might therefore prompt similar investigations with other antiviral proteins, which could reveal a previously undiscovered diversity of specificities for other antiviral immunity proteins. 

      (3) The authors also surprisingly reveal that chimpanzee IFIT1 shows no antiviral activity against any tested virus despite only differing from human IFIT1 by eight amino acids. By mapping this loss of function to three residues on one helix of the protein, the authors shed new light on a region of the protein with no previously known function. 

      (4) Combined with evolutionary analyses that indicate that IFIT1 genes are evolving under positive selection in several mammalian groups, these functional data indicate that IFIT1 is engaged in an evolutionary "arms race" with viruses, which results in distinct antiviral specificities of IFIT1 proteins from different species. 

      Weaknesses: 

      (1) The evolutionary analyses the authors perform appear to indicate that IFIT1 genes in several mammalian groups have evolved under positive selection. However, IFIT1 has previously been shown to have undergone recurrent instances of recombination with the paralogous IFIT1B, which can confound positive selection analyses such as the ones the authors perform. The authors should analyze their alignments for evidence of recombination using a tool such as GARD (in the same HyPhy package along with MEME and FUBAR). Detection of recombination in these alignments would invalidate their positive selection inferences, in which case the authors need to either analyze individual non-recombining domains or limit the number of species to those that are not undergoing recombination. While it is likely that these analyses will still reveal a signature of positive selection, this step is necessary to ensure that the signatures of selection and sites of positive selection are accurate. 

      (2) The choice of IFIT1 homologs for study needs to be described in more detail. Many mammalian species encode IFIT1 and IFIT1B proteins, which have been shown to have different antiviral specificity, and the evolutionary relationship between IFIT1 and IFIT1B paralogs is complicated by recombination. As such, the assertion that the proteins studied in this manuscript are IFIT1 orthologs requires more support than the percent identity plot shown in Figure 3B. 

      (3) Some of the results and discussion text could be more focused on the model of evolution-driven changes in IFIT1 specificity. In particular, the chimpanzee data are interesting, but it would appear that this protein has lost all antiviral function, rather than changing its antiviral specificity like some other examples in this paper. As such, the connection between the functional mapping of individual residues with the positive selection analysis is somewhat confusing. It would be more clear to discuss this as a natural loss of function of this IFIT1, which has occurred elsewhere repeatedly across the mammalian tree. 

      (4) In other places in the manuscript, the strength of the differences in antiviral specificity could be highlighted to a greater degree. Specifically, the text describes a number of interesting examples of differences in inhibition of VSV versus VEEV from Figure 3C and 3D, but it is difficult for a reader to assess this as most of the dots are unlabeled and the primary data are not uploaded. A few potential suggestions would be to have a table of each ortholog with % infection by VSV and % infection by VEEV. Another possibility would be to plot these data as an XY scatter plot. This would highlight any species that deviate from the expected linear relationship between the inhibition of these two viruses, which would provide a larger panel of interesting IFIT1 antiviral specificities than the smaller number of species shown in Figure 4. 

      We thank the reviewer for their fair assessment of our manuscript. As the reviewer requested, we performed GARD analysis on our alignments used for PAML, FUBAR, and MEME (New Supp Fig 1). By GARD, we found 1 or 2 predicted breakpoints in each clade. However, much of the sequence was after or between the predicted breakpoints. Therefore, we were able to reanalyze for sites undergoing positive selection in the large regions of the sequence that do not span the breakpoints. We were able to confirm that almost all sites originally identified as undergoing positive selection still exhibit signatures of positive selection when these breakpoints are taken into account: primates (11/12), bats (14/16), ungulates (30/37), and carnivores (2/4). To further validate our positive selection analysis, we used Recombination Detection Program 4 (RDP4) to remove inferred recombinant sequences from the primate IFIT1 alignment and performed PAML, FUBAR, and MEME. Once again, the sites in our original analysis were largely validated by this method. Importantly, sites 170, 193, and 366 in primates, which are discussed in our manuscript, were found to be undergoing positive selection in 2 of the 3 analyses using alignments after the indicated breakpoint in GARD and after removal of recombinant sequences by RDP4. We have updated the text to acknowledge IFIT1/IFIT1B recombination more clearly and include the GARD analysis as well as PAML, FUBAR, and MEME reanalysis taking into account predicted breakpoints by GARD and RDP4. Furthermore, to increase evidence that the sequences used in this study for both computational and functional analysis are IFIT1 orthologs rather than IFIT1B, we have included a maximum likelihood tree after aligning coding sequences on the C-terminal end (corresponding to bases 907-1437 of IFIT1). In Daugherty et al. 2016 (PMID: 27240734) this strategy was used to distinguish between IFIT1 and IFIT1B. All sequences used in our study grouped with IFIT1 sequences (including many confirmed IFIT1 sequences used in Daugherty et al.) rather than IFIT1B sequences or IFIT3. These new data, including the GARD analysis, the RDP4 analysis, and the maximum likelihood tree, are included as a new Supplementary Figure 1.

      We also agree with the reviewer that it is possible that chimpanzee IFIT1 has lost antiviral function due to the residues 364 and 366 that differ from human IFIT1. We have updated the discussion sections to include the possibility that chimpanzee IFIT1 is an example of a natural loss of function that has occurred in other species over evolution as well as the potential consequences of this occurrence. Regarding highlighting the strength of differences in antiviral activity between IFIT1 orthologs, we have included several updates to strengthen the ability of the reader to assess these differences. First, we have included a supplementary table that includes the infection data for each ortholog from the VEEV and VSV screen to allow for readers to evaluate ranked antiviral activity of the species that suppress these viruses. In addition, the silhouettes next to the dot plots indicate the top ranked hits in order of viral inhibition (with the top being the most inhibitory) giving the reader a visual representation in the figure of top antiviral orthologs during our screen. We have also updated the figure legend to inform the reader of this information.

      Reviewer #3 (Public Review):  

      Summary: 

      This manuscript by McDougal et al. demonstrates species-specific activities of diverse IFIT1 orthologs and seeks to utilize evolutionary analysis to identify key amino acids under positive selection that contribute to the antiviral activity of this host factor. While the authors identify amino acid residues as important for the antiviral activity of some orthologs and propose a possible mechanism by which these residues may function, the significance or applicability of these findings to other orthologs is unclear. However, the subject matter is of interest to the field, and these findings could be significantly strengthened with additional data.

      Strengths:

      Assessment of multiple IFIT1 orthologs shows the wide variety of antiviral activity of IFIT1, and identification of residues outside of the known RNA binding pocket in the protein suggests additional novel mechanisms that may regulate IFIT1 activity.

      Weaknesses:

      Alternative hypotheses that might explain the variable and seemingly inconsistent antiviral activity of IFIT1 orthologs were not really considered. For example, studies show that IFIT1 activity may be regulated by interaction with other IFIT proteins, but this was not assessed in this study.

      Given that there appears to be very little overlap observed in orthologs that inhibited the viruses tested, it's possible that other amino acids may be key drivers of antiviral activity in these other orthologs. Thus, it's difficult to conclude whether the findings that residues 362/4/6 are important for IFIT1 activity can be broadly applied to other orthologs, or whether these are unique to human and chimpanzee IFIT1. Similarly, while the hypothesis that these residues impact IFIT1 activity in an allosteric manner is an attractive one, there is no data to support this.  

      We thank the reviewer for their fair assessment of our manuscript. To address the weaknesses that the reviewer has pointed out we have expanded the discussion to more directly address alternate hypotheses, such as the possibility of IFIT1 activity being regulated by interaction with other IFIT proteins. Furthermore, we expanded the discussion to include an alternate hypothesis for the role of residues 364 and 366 in primate IFIT1 besides allosteric regulation. In addition, we did not intend to claim or imply that residues 364/6 are the key drivers of antiviral activity for all IFITs tested. However, we speculate that within primates these residues may play a key role as these residues differ between chimpanzee IFIT1 (which lacks significant antiviral activity towards the viruses tested in this study) and human IFIT1 (which possesses significant antiviral activity). In addition, these residues seem to be generally conserved in primate species, apart from chimpanzee IFIT1. We have included changes to the text to more clearly indicate that we highlight the importance of these residues specifically for primate IFIT1, but not necessarily for all IFIT1 proteins in all clades.

      Reviewer #1 (Recommendations for the authors): 

      (1) The readers would benefit from a more detailed background on the concept and estimation of positive selection, including the M7/8 models in PAML. 

      We have included more information in the text to provide a better background for the concepts of positive selection and how PAML tests for this using M7 and M8 models.
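      For readers unfamiliar with the M7 versus M8 comparison, the following is a minimal sketch of the likelihood ratio test usually applied to the two models, not the authors' pipeline. It assumes the standard setup in which M8 adds two free parameters to M7, so twice the log-likelihood difference is compared with a chi-square distribution with two degrees of freedom. The function name and the log-likelihood values are hypothetical and are not taken from the manuscript.

```python
import math


def lrt_m7_vs_m8(lnl_m7, lnl_m8):
    """Likelihood ratio test of M8 against the nested null model M7.

    Assumption: the statistic 2 * (lnL_M8 - lnL_M7) is compared with a
    chi-square distribution with 2 degrees of freedom. For 2 degrees of
    freedom the survival function is exactly exp(-x / 2), so no
    statistics library is needed.
    """
    statistic = max(0.0, 2.0 * (lnl_m8 - lnl_m7))
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only.
statistic, p_value = lrt_m7_vs_m8(lnl_m7=-5230.4, lnl_m8=-5221.1)
print(f"2*delta lnL = {statistic:.2f}, p = {p_value:.2e}")
```

      A small p-value here indicates only that a model allowing a class of sites with dN/dS > 1 fits better than one that does not; identifying individual sites requires the site-level analyses described in the response above.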

      (2) Presentation of data 

      a) Figure 3C and 3D: is there a better way to present the infection data so the readers can tell the ranked antiviral activity of the species that suppress VEEV? 

      We have included a supplementary table that includes the infection data for each ortholog from the VEEV and VSV screen to allow for readers to evaluate ranked antiviral activity of the species that suppress these viruses. In addition, the silhouettes next to the dot plots indicate the top ranked hits in order of viral inhibition (with the top being the most inhibitory). We have updated the figure legend to inform the reader of this information as well.

      b) Figure 4C and 4D: consider putting the western blot in Supplementary Figure 1 underneath the infection data or with the heatmap so readers can compare it with the antiviral activity. 

      We have also included quantification of the western blots performed to evaluate IFIT1 expression during the experiments shown in Figure 4C and 4D in an updated Figure 4B. We have also included normalized expression values with the heatmap shown in an updated Figure 4G so the reader can evaluate potential impact of protein expression on antiviral activity for all infection experiments shown in figure 4.

      (3) Line 269-270: as a rationale for narrowing the species to human, black flying fox, and chimp IFIT1, human and black flying fox were chosen because they strongly inhibit VEEV, but pangolin wasn't included even though it had the strongest anti-VEEV activity? 

      The rationale for narrowing the species to human, black flying fox, and chimpanzee IFIT1 was related to the availability of biological tools, high quality genome/transcriptome sequencing databases, and other factors. Specifically, human and chimp IFIT1 are closely related but have variable antiviral activities, making their comparison highly relevant. Bats are well established as reservoirs for diverse viruses, whereas the reservoir status of many other mammals is less well defined. Furthermore, purifying large amounts of high quality IFIT1 protein after bacterial expression was another limitation to functional studies. We have added this information into the manuscript text.

      (4) Figure 5A: to strengthen the claim that "species-specific antiviral activities of IFIT1s can be partly explained by RNA binding potential", it would be good to include one more positive and one more negative control. In other words, test the cap0 RNA binding activity of an IFIT1 ortholog that strongly inhibits VEEV and an ortholog that does not. It would also be good to discuss why chimp IFIT1 still shows dose-dependent RNA binding yet it is one of the weakest at inhibiting VEEV. 

      We appreciate the reviewer's suggestion to include more controls and expand the dataset. While we understand the potential value of expanding the dataset, we believe that human IFIT1 serves as a robust positive control and human IFIT1 R187H (RNA-binding deficient) serves as an established negative control. Future experiments with other purified IFITs from other species will indeed strengthen evidence linking IFIT1 species-specific activity and RNA-binding.

      Regarding chimpanzee IFIT1, we acknowledge there appears to be some dose-dependent Cap0 RNA-binding. However, the binding affinity is much weaker than that of human or black flying fox IFIT1. We speculate that during viral infection reduced binding affinity could impair the ability of chimpanzee IFIT1 to efficiently sequester viral RNA and inhibit viral translation. This reduction in binding affinity may, therefore, allow the cell to be overwhelmed by the exponential increase in viral RNA during replication, resulting in an ineffective antiviral IFIT1. In the literature, a similar phenomenon is observed by Hyde et al. (PMID: 24482115). In this study, the authors test mouse Ifit1 Cap0 RNA binding by EMSA of the 5’ UTR sequence of VEEV RNA containing an A or G at nucleotide position 3. EMSA shows binding of both the A3 and G3 Cap0 VEEV RNA sequences; however, stronger Ifit1 binding is observed for the A3 Cap0 RNA sequence. The consequences of the reduced Ifit1 binding of the G3 Cap0 VEEV RNA are observed in vitro by a substantial increase in viral titers produced from cells as well as an increase in protein produced in a luciferase-based translation assay. The authors also show in vivo relevance of this reduction of Ifit1 binding, as WT B6 mice infected with VEEV containing the A3 UTR exhibited 100% survival, while WT B6 mice infected with VEEV containing the G3 UTR survived at a rate of only ~25%. Therefore, the literature supports that a decrease in Cap0 RNA binding by an IFIT protein (while still exhibiting Cap0 RNA binding) observed by EMSA can result in considerable alterations of viral infection both in vitro and in vivo.

      Minor: 

      (1) Line 82: "including 5' triphosphate (5'-ppp-RNA), or viral RNAs..." having a comma here will make the sentence clearer. 

      We have improved the clarity of this sentence. It now reads, “IFIT1 binds uncapped 5′triphosphate RNA (5′-ppp-RNA) and capped but unmethylated RNA (Cap0, an m<sup>7</sup>G cap lacking 2′-O methylation).”

      (2) Line 100: "...similar mechanisms have been at least partially evolutionarily conserved in IFIT proteins to restrict viral infection by IFIT proteins". 

      We have updated the text to improve clarity by revising the sentence to “VEEV TC-83 is sensitive to human IFIT1 and mouse Ifit1B, indicating at least partial conservation of antiviral function by IFIT proteins."

      (3) Line 109: "signatures of rapid evolution or positive selection" would put positive selection second because that is the more technical term that can benefit from the more layperson term (rapid evolution). 

      We have updated this sentence incorporating this suggestion. “Positive selection, or rapid evolution, is denoted by a high ratio of nonsynonymous to synonymous substitutions (dN/dS >1).”

      (4) Lines 116-117: "However, this was only assessed in a few species" would benefit from a citation. 

      We have inserted the citation.

      (5) Line 127 heading: "IFIT1 is rapidly evolving in mammals" would be more accurate to say "in major clades of mammals". 

      We have updated the text to include this suggestion.

      (6) Line 165: "IFIT1 L193 mutants". 

      We have updated the text to rephrase this for clarity.

      (7) Line 170: two strains of VEEV were mentioned in the Intro, so it would be good to specify which strain of VEEV was used?

      We have updated the text to clarify the VEEV strain. In this study, all experiments were performed using the VEEV TC-83 strain.

      (8) Line 174: "Indeed, all mutants at position 193, whether hydrophobic or positively charged, inhibited VEEV similarly to the WT..." It should read "all hydrophobic and positively charged mutants inhibited VEEV similarly to the WT...". 

      We corrected as suggested. 

      (9) Line 204: what are "control cells"? Cells that are mock-infected, or cells without IFIT1? 

      We have updated the text to improve clarity. What we refer to as control cells were cells expressing an empty vector control rather than an IFIT1.

      (10) Need to clarify n=2 and n=3 replicates throughout the manuscript. Does that refer to three independent experiments? Or an experiment with triplicate wells/samples? 

      We have updated the text to say “independent experiments” instead of “biological replicates” to prevent any confusion.  All n=2 or n=3 replicates denote independent experiments.

      (11) Line 254: "dominant antiviral effector against the related human parainfluenza virus type 5..." 

      We have updated the text to improve clarity.

      (12) Line 271: "The black flying fox (Pteropus alecto), is a model megabat species..." scientific name was italicized here but not elsewhere. Remove comma.

      We have updated the text accordingly.

      (13) Line 293: "...chimpanzee IFIT1 lacked these properties" but chimp IFIT1 can bind cap0 RNA, just at a lower level. 

      We have updated the text to acknowledge that chimpanzee IFIT1 can bind cap0 RNA, albeit at a lower level than human IFIT1.

      (14) Figure 6B: please fix the x-axis labels. They're very cramped. 

      We have updated the x-axis labels for figure 6B and figure 6D to improve clarity.

      (15) Line 609: "...trimmed and aligned"? 

      Our phrasing is to indicate that coding sequences were aligned, and gaps were removed to reduce the chance of false positive signal by underrepresented codons such as gaps or short insertions. We have removed “trimmed” from the text and changed the text to say “aligned sequences” to increase clarity.

      Reviewer #2 (Recommendations for the authors): 

      (1) Numbers less than 10 should be spelled out throughout the manuscript (e.g. line 138). 

      We have updated the text to reflect the request.

      (2) Line 165: "expression of IFIT1 193 mutants" should be rephrased. 

      We have updated the text to rephrase this sentence for clarity.

      (3) A supplemental table or file should be included that contains the accession number and species names of sequences used for evolutionary analyses and for functional testing. In addition, the alignments that were used for positive selection can be included.  

      We have included a supplemental file containing accession numbers, species names for evolutionary analysis and functional studies. In addition, this table includes the infection data for each IFIT1 homolog for the screen performed in figure 3.

      (4) The discussion of potential functions of the C-terminus of IFIT1 should include possible interactions with other proteins. In particular, the C-terminus of IFIT1 has been shown to interact with IFIT3 in a way that modulates its activity (PMID: 29525521). Although residues 362-366 were not shown in that paper to interact with a fragment of IFIT3, it is possible that these residues may be important for interaction with full-length IFIT3 or some other IFIT1 binding partner. 

      We thank the reviewer for their suggestion. We have expanded the discussion to explore the possibility that residues 364 and 366 of IFIT1 may be involved in IFIT1-IFIT3 interactions and consequently Cap0 RNA-binding and antiviral activity.

      (5) The quantification of the EMSAs should be described in more detail. In particular, from looking at the images shown in Figure 5A, it would appear that human and chimpanzee IFIT1 show similar degrees of probe shift, while the human R187H panel shows no shifting at all. However, the quantification shows chimpanzee IFIT1 as being statistically indistinguishable from human R187H. Additional information on how bands were quantified and whether they were normalized to unshifted RNA would be helpful in attempting to resolve this visual discordance. 

      EMSAs were quantified by determining Adj. Vol. Intensity in ImageLab (BioRad), which subtracts background signal, after imaging at the same exposure and SYBR Gold staining time. To determine Adj. Vol. Intensity, we drew a box (same size for each gel and lane for each replicate) for each lane above the free probe. These values were not normalized to unshifted RNA; however, equal amounts of RNA were loaded. While the ANOVA shows no significant difference between human R187H and chimpanzee IFIT1 band shift intensity, this is potentially due to the between-group variance in the ANOVA. The AUC value for chimpanzee IFIT1 is 36.4% higher than that for R187H.

      The AUC of Adj. Vol. Intensity of the human IFIT1 band shift is roughly 2-fold higher than that of chimpanzee IFIT1. We believe this matches the visual representation as well, as human IFIT1 has a darker “upper” band in the shift, as well as a clear dark “lower” band that is not well defined in the chimpanzee shift. Furthermore, the upper band of the chimpanzee IFIT1 shift in the 400 nM lane appears to be as intense as the upper band in the 240 nM human IFIT1 lane, without taking into account the lower band seen for human IFIT1. We included this quantification because Kd could not be calculated, as there was no clear probe disappearance. We do not intend for this quantification to act as a substitute for binding affinity calculations, but rather to aid the reader in data interpretation.
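      The AUC comparison described above can be illustrated with a short sketch. This is not the authors' analysis script: the trapezoidal rule is an assumption about how the area was computed, the function names are invented for illustration, and every concentration and intensity value below is hypothetical rather than taken from Figure 5.

```python
def trapezoid_auc(x, y):
    """Area under the curve y(x) using the trapezoidal rule."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length and at least two points")
    return sum(
        (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0
        for i in range(len(x) - 1)
    )


def percent_higher(a, b):
    """How much larger a is than b, as a percentage of b."""
    return 100.0 * (a - b) / b


# Hypothetical protein concentrations (nM) and background-subtracted
# shifted-band intensities (arbitrary units); illustration only.
concentration_nm = [0, 80, 160, 240, 400]
shift_intensity = {
    "human": [0.0, 2.0, 4.5, 6.0, 8.0],
    "chimpanzee": [0.0, 1.0, 2.0, 3.0, 4.5],
    "human R187H": [0.0, 0.8, 1.5, 2.2, 3.0],
}

auc = {
    name: trapezoid_auc(concentration_nm, values)
    for name, values in shift_intensity.items()
}
for name, area in auc.items():
    print(f"{name}: AUC = {area:.1f}")
print(f"chimpanzee vs R187H: {percent_higher(auc['chimpanzee'], auc['human R187H']):.1f}% higher")
```

      Because the area integrates intensity over the whole titration, two proteins can differ noticeably in AUC while still failing to separate in an ANOVA when replicate-to-replicate variance is large.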

      Reviewer #3 (Recommendations for the authors): 

      (1) IFIT1 has been demonstrated to function in conjunction with other IFIT proteins. Do you think the absence of antiviral activity is due to isolated expression of IFIT1 without these cofactors, and might this explain why there was little overlap observed in orthologs that inhibited the viruses tested (Figure 3, lines 209-210)? 

      We do not believe that isolated expression of IFIT1 without cofactors (such as orthologous IFIT proteins) would fully explain the disparities in antiviral activity, as many of the IFIT1s that were expressed inhibited either VSV or VEEV in our screen. However, we acknowledge that the expression of IFIT1 alone does create a limitation in our study, as IFIT1 antiviral activity and RNA-binding can be modulated by interactions with other IFIT proteins. Therefore, we do believe that it is possible that co-expression of IFIT1 with other IFITs from a given species might enhance antiviral activity. Future studies may shed light on this.

      (2) Figure 5 - Calculating the Kd for each protein would be more informative. How does the binding affinity of these IFIT1 proteins compare to that which has previously been reported? 

      We are unable to accurately determine Kd because the free-probe signal is not substantially diminished. Therefore, we are only able to compare IFIT1 protein binding between species without accurate mathematical calculation of binding affinity. Our result does appear similar to that of mouse Ifit1 binding to VEEV RNA (PMID: 24482115), in which the authors also do not calculate a Kd for their RNA EMSA.

      (3) Mutants 364 and 366 may not have direct contact with RNA, but RNA EMSA data presented suggest that the binding affinity may be different (though this is hard to conclude without Kd data). Additional biochemical data with these mutants might provide more insight here. 

      We agree that further studies using 364 and 366 double mutant human and chimpanzee protein in EMSAs would provide additional biochemical data and provide insight into the role of these residues in direct RNA binding. We acknowledge this is a limitation of our study as we provide only genetic data demonstrating the importance of these residues.

      (4) Given that there appears to be very little overlap observed in orthologs that inhibited the viruses tested, it's possible that other amino acids may be key drivers of antiviral activity in these other orthologs. Thus, it's difficult to conclude whether the findings that residues 362/4/6 are important for IFIT1 activity can be broadly applied to other orthologs. A more systematic assessment of the role of these mutations across multiple diverse orthologs would provide more insight here. Do other antiviral proteins show this trend (i.e., exhibit little overlap in orthologs that inhibit these viruses)? What do you think might be driving this? 

      We agree that other residues outside of 364 and 366 may be key drivers of antiviral activity across the IFIT1 orthologs tested. We do not hypothesize that this will broadly apply across IFIT1 from diverse clades of mammals, as overall amino acid identity can differ by over 30%. However, based on the chimpanzee and human IFIT1 data, as well as sequence alignment within primates specifically, we believe these residues may be key for primate (but not necessarily other clades of mammals) IFIT1 antiviral activity.

      Regarding whether other antiviral proteins show little overlap in orthologs that inhibit a given virus, to our knowledge such a functional study with this large and divergent dataset of orthologs has not been performed. However, there are many examples of restriction factors exhibiting species-specific antiviral activity when ortholog screens have been performed. For example, HIV was reported to be suppressed by MX2 orthologs from human, rhesus macaque, and African green monkey, but not sheep or dog MX2 (PMID: 24760893). In addition, foamy virus was inhibited by the human and rhesus macaque orthologs of PHF11, but not the mouse and feline orthologs (PMID: 32678836). Furthermore, studies from our lab have shown variability in RTP4 ortholog antiviral activity towards viruses such as hepatitis C virus (HCV), West Nile virus (WNV), and Zika virus (ZIKV) (PMID: 33113352).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      Weiss and co-authors presented a versatile probabilistic tool. aTrack helps in classifying tracking behaviors and understanding important parameters for different types of single particle motion types: Brownian, Confined, or Directed motion. The tool can be used further to analyze populations of tracks and the number of motion states. This is a stand-alone software package, making it user-friendly for a broad group of researchers. 

      Strengths: 

      This manuscript presents a novel method for trajectory analysis. 

      Weaknesses: 

      (1) In the results section, is there any reason to choose the specific range of track length for determining the type of motion? The starting value is fine, and would be short enough, but do the authors have anything to report about how much is too long for the model? 

      We chose to test the range of track lengths (five to hundreds of steps) to cover the broad range of scenarios arising from single proteins or fluorophores to brighter objects with more labels. While there is no upper limit per se, the computation time of our method scales linearly with track length: 100 time points take ~2 minutes to run on a standard consumer-level desktop CPU. We have added the following sentence to note the time cost with trajectory length:

      “The recurrent formula enables our model computation time to scale linearly with the number of time points.”

      (2) Robustness to model mismatches is a very important section that the authors have addressed diligently. Understanding where and how the model is limited is important. For example, the authors mentioned the limitation of trajectory length. Do the authors have any information on the trajectory length range at which this method works accurately? This would be of interest to readers who would like to apply this method to their own data. 

      We agree that limitations are important to estimate, and trajectory length is an important consideration when choosing how to analyze a dataset. We report the categorization certainty, i.e. the likelihood differences, for a range of track lengths (Fig. 2 a,c, Fig. 3c-d, and Fig. 4 c,g.).

      For example, here are the key plots from Fig. 2 quantifying the relative likelihoods, where being within the light region is necessary. The light areas represent a useful likelihood ratio.

      We only performed analysis up to track lengths of 600 time steps, but parameter estimations and significance can only improve when increasing the track length, as long as the model assumptions are satisfied. The broader limitations and future opportunities for new methods are now expanded upon in the discussion, for example, switching between states, and state and model ambiguities (bound vs. very slow diffusion vs. very slow motion).

      (3) aTrack extracts certain parameters from the trajectories to determine the motion types. However, it is not very clear how certain parameters are calculated. For example, is the diffusion coefficient D calculated from fitting, and how is the confinement factor defined and estimated, with equations? This information will help the readers to understand the principles of this algorithm.

      We apologize for the confusion. All the model parameters are fit using the maximum likelihood approach. To make this point clearer in the manuscript, we have made three changes:

      (1) We modified the following sentence to replace “determined” with "fit”:

      “Finally, Maximum Likelihood Estimation (MLE) is used to fit the underlying parameter value”

      (2) We added the following sentence in the main text :

      “In our model, the velocity is the characteristic parameter of directed motion and the confinement factor represents the force within a potential well. More precisely, the confinement factor $l$ is defined such that at each time step the particle position is updated by $l$ times the distance particle/potential well center (see the Methods section for more details).”.

      (3) We have added a new section in the methods, called Fitting Method, where we have added the explanation below:

      “For the pure Brownian model, the parameters are the diffusion coefficient and the localization error. For the confinement model, the parameters are the diffusion coefficient, the localization error, confinement factor, and the diffusion coefficient of the potential well. For the directed model, the parameters are the diffusion coefficient, the localization error, the initial velocity and the acceleration variance.

      These parameters are estimated using the maximum likelihood approach which consists in finding the parameters that maximize the likelihood. We realize this fitting step using gradient descent via a TensorFlow model. All the estimates presented in this article are obtained from a single set of initial parameters to demonstrate that the convergence capacity of aTrack is robust to the initial parameter values.”
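      As an illustration of the confinement factor defined above, the following is a minimal simulation sketch of a confined track. It is not aTrack code: the function names, default values, the ordering of the update steps, and the conversion from diffusion coefficient to step size (standard deviation sqrt(2·D·dt) per axis) are assumptions made for this example. The argument `confinement` plays the role of $l$.

```python
import math
import random


def simulate_confined_track(n_steps, d=0.1, loc_err=0.02, confinement=0.2,
                            d_well=0.0, dt=1.0, seed=0):
    """Simulate one 2D track of a particle confined in a potential well.

    Assumed update rule: at each step the hidden position moves toward
    the well center by `confinement` times the particle-to-center
    distance, then takes a diffusive step. The well center itself may
    diffuse with coefficient `d_well`. Observed positions are the hidden
    positions plus Gaussian localization error.
    """
    rng = random.Random(seed)
    step_sd = math.sqrt(2.0 * d * dt)
    well_sd = math.sqrt(2.0 * d_well * dt)
    position = [0.0, 0.0]
    well_center = [0.0, 0.0]
    observed = []
    for _ in range(n_steps):
        observed.append(tuple(p + rng.gauss(0.0, loc_err) for p in position))
        position = [
            p + confinement * (w - p) + rng.gauss(0.0, step_sd)
            for p, w in zip(position, well_center)
        ]
        well_center = [w + rng.gauss(0.0, well_sd) for w in well_center]
    return observed


def x_variance(track):
    """Variance of the observed x coordinates along one track."""
    xs = [x for x, _ in track]
    mean_x = sum(xs) / len(xs)
    return sum((x - mean_x) ** 2 for x in xs) / len(xs)


confined = simulate_confined_track(500, confinement=0.5)
free = simulate_confined_track(500, confinement=0.0)
print(f"x variance, confined: {x_variance(confined):.3f}")
print(f"x variance, free:     {x_variance(free):.3f}")
```

      Setting `confinement` to zero recovers pure Brownian motion, which is why the confined track explores a much smaller area than the free one.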

      (4) The authors mentioned the scenario where a particle may experience several types of motion simultaneously. How are these motions simulated, and what do they mean in terms of motion types? Are they mixed motion (a particle switches motion types in the same trajectory) or do they simply present features of several motion types? It is not intuitive to the readers that a particle can be diffusive (Brownian) and directed at the same time. 

      In the text, we present an example where one can observe this type of motion to help the reader understand when this type of motion can be met: “Sometimes, particles undergo diffusion and directed motion simultaneously, for example, particles diffusing in a flowing medium (Qian 1991).”

      This is simulated by the addition of two terms affecting the hidden position variable before adding a localization term to create the observed variable. In the analysis, this manifests as non-zero values for the diffusion coefficient and the linear velocity. For example, see Figure 4g and the associated text, where a single particle moves with a directed component and a Brownian diffusion component at each step.
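      The two-term update described above can be sketched as follows. This is a simplified illustration rather than the simulation code used for the manuscript: it assumes a constant velocity (the directed model described in the response also allows the velocity to change over time), and the function name, default values, and units are hypothetical.

```python
import math
import random


def simulate_directed_diffusive_track(n_steps, d=0.05, velocity=(0.1, 0.0),
                                      loc_err=0.02, dt=1.0, seed=0):
    """Simulate a 2D track with simultaneous diffusion and directed motion.

    At each step the hidden position receives two additive terms: a
    directed displacement (velocity * dt) and a diffusive displacement
    (Gaussian, standard deviation sqrt(2 * d * dt) per axis). A Gaussian
    localization error is then added to obtain the observed position.
    """
    rng = random.Random(seed)
    step_sd = math.sqrt(2.0 * d * dt)
    hidden = [0.0, 0.0]
    observed = []
    for _ in range(n_steps):
        observed.append(tuple(h + rng.gauss(0.0, loc_err) for h in hidden))
        hidden = [
            h + v * dt + rng.gauss(0.0, step_sd)
            for h, v in zip(hidden, velocity)
        ]
    return observed


track = simulate_directed_diffusive_track(100)
net_x = track[-1][0] - track[0][0]
print(f"net x displacement over {len(track) - 1} steps: {net_x:.2f}")
```

      With the diffusion coefficient and localization error set to zero the track is a straight line, and with the velocity set to zero it is pure Brownian motion; non-zero values for both give the mixed behavior discussed in the response.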

      We did not simulate transitions between types of motion. Switching is not treated by this current model; however, this limitation is described in the discussion and our team and others are currently working on addressing this challenge.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors present a software package "aTrack" for identification of motion types and parameter estimation in single-particle tracking data. The software is based on maximum likelihood estimation of the time-series data given an assumed motion model and likelihood ratio tests for model selection. They characterized the performance of the software mostly on simulated data and showed that it is applicable to experimental data. 

      Strengths: 

      A potential advantage of the presented method is its wide applicability to different motion types. 

      Weaknesses: 

      (1) There has been a lot of similar work in this field. Even though the authors included many relevant citations in the introduction, it is still not clear what this work uniquely offers. Is it the first time that direct MLE of the time-series data was developed? Suggestions to improve would include (a) better wording in the introduction section, (b) comparing to other popular methods (based on MSD, step-size statistics (Spot-On, eLife 2018;7:e33125), for example) using the simulated dataset generated by the authors, (c) comparing to other methods using data set in challenges/competitions (Nat. Comm (2021) 12:6253).  

      We thank the reviewer for this suggestion and agree that the explanation of the innovative aspects of our method in the introduction was not clear enough. We have now modified the introduction to better explain what is improved here compared to previous approaches.

      “The main innovations of this model are: 1) it uses analytical recurrence formulas to perform the integration step for complex motion, improving speed and accuracy; 2) it handles both confined and directed motion; 3) anomalous parameters, such as the center of the potential well and the velocity vector are allowed to change through time to better represent tracks with changing directed motion or confinement area; and lastly 4) for a given track or set of tracks, aTrack can determine whether tracks can be statistically categorized as confined or directed, and the parameters that best describe their behavior, for example, diffusion coefficient, radius of confinement, and speed of directed motion.”

      Regarding alternatives, in Figure 6 we compare our method to the best-performing algorithm of the 2021 Anomalous Diffusion (AnDi) Challenge mentioned by the reviewer (RANDI, Argun et al, arXiv, 2021, Muñoz-Gil et al, Nat Com. 2021). Notably, both methods performed similarly on fBm, but ours was more robust in cases where there were small differences between the process underlying the data and the model assumptions, a likely scenario in real datasets. Regarding Spot-On, this was not mentioned as it only deals with multiple populations of Brownian diffusers, preventing a quantitative comparison.

      (2) The Hypothesis testing method presented here has a number of issues: first, there is no definition of testing statistics. Usually, the testing statistics are defined given a specific (Type I and/or Type II) error rate. There is also no discussion of the specificity and sensitivity of the testing results (i.e. what's the probability of misidentification of a Brownian trajectory as directed? etc).

      We now explain our statistical approach and how to perform hypothesis testing with our metric in a new supplementary section, Statistical test. 

      We use the likelihood ratio as a more conservative alternative to the p-value. In Fig S2, we show that our metric is an upper bound of the p-value and can be used to perform hypothesis testing with a chosen type I error rate. 

      Related, it is not clear what Figure 2e (and other similar plots) means, as the likelihood ratio is small throughout the parameter space. Also, for likelihood ratio tests, the authors need to discuss how model complexity affects the testing outcome (as more complex models tend to be more "likely" for the data) and also how the likelihood function is normalized (normalization is not an issue for MLE but critical for ratio tests). 

      We present the likelihood ratio as an upper bound of the p-value. Therefore, we can reject the null hypothesis if it is smaller than a given threshold, e.g. 0.05, but this number should be decreased if multiple tests are performed. The colorscale we show in the figure is meant to highlight the working range (light), and ambiguous range (dark) of the method.

      As the reviewer mentions, we expect the alternative hypothesis to result in higher likelihoods than the simpler null hypothesis for null hypothesis tracks, but, as seen in Fig. S2, the likelihood ratio of a dataset corresponding to the null hypothesis is strongly skewed toward its upper limit of 1. This means that for most of the tracks, the likelihood is not (or only slightly) affected by the model complexity. The likelihoods of all the models are normalized so that their integrals over the data equal 1/A, where A is the area of the field of view, which is independent of the model complexity.
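
      The decision rule described in this response can be sketched as follows. The function names are hypothetical. Two choices are assumptions of this sketch and are not stated in the response: the threshold is lowered for multiple tests with a Bonferroni correction, and the ratio is capped at 1 to guard against numerical noise.

```python
import math


def likelihood_ratio(log_l_null, log_l_alt):
    """Ratio L_null / L_alt computed from log-likelihoods, capped at 1."""
    return math.exp(min(0.0, log_l_null - log_l_alt))


def reject_null(log_l_null, log_l_alt, alpha=0.05, n_tests=1):
    """Reject the Brownian null if the ratio falls below alpha / n_tests.

    Dividing alpha by n_tests is a Bonferroni correction, used here as one
    simple way to lower the threshold when several tests are performed.
    """
    return likelihood_ratio(log_l_null, log_l_alt) < alpha / n_tests
```

      For example, a log-likelihood gain of 5 for the alternative model gives a ratio of exp(-5), about 0.0067, which is below 0.05 for a single test but not below 0.005 when ten tests are performed.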

      (3) Relating to the mathematical foundation (Figure 1b). The measured positions are drawn as direct arrows from the real position states: this infers instantaneous localization. In reality, there is motion blur which introduces a correlation of the measured locations. Motion blur is known to introduce bias in SPT analysis, how does it affect the method here? 

      The reviewer raises an important point as our model does not explicitly consider motion blur. We have now added a paragraph that presents how our model performs in case of motion blur in the section called Robustness to model mismatches. This section and the corresponding new Supplemental Fig. S7 demonstrate that the estimated diffusion length is accurate so long as the static localization error is higher than the dynamic localization error. If the dynamic localization error is higher, our model systematically underestimates the diffusion length by a factor of 0.81 = (2/3)^0.5, which can be corrected for with an added post-processing step.
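
      The post-processing step mentioned above amounts to dividing the estimate by the square root of 2/3 (about 0.816). The sketch below is illustrative only. The function name is hypothetical, and the correction should be applied only in the regime described in the response, where the dynamic localization error dominates.

```python
import math

# Factor by which the diffusion length is underestimated when motion blur
# dominates, as stated in the response: (2/3) ** 0.5.
BLUR_FACTOR = math.sqrt(2.0 / 3.0)


def correct_diffusion_length(d_est):
    """Undo the systematic motion-blur underestimate of the diffusion length."""
    return d_est / BLUR_FACTOR
```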

      (4) The authors did not go through the interpretation of the figure. This may be a matter of style, but I find the figures ambiguous to interpret at times.  

      We thank the reviewer for their feedback on improving the readability. To avoid overly repetitive and lengthy sections of text, we have opted for a concise approach. This allows us to present closely related panels at the same point in the text, while not ignoring important variations and tests. Considering this feedback from the reviewers, we have added more information and interpretation throughout our manuscript to improve interpretability.

      (5) It is not clear to me how the classification of the 5 motion types was accomplished. 

      We have modified the specific text related to this figure to describe an illustrative example to show how one could use aTrack on a dataset where not that much is known: First, we present the method to determine the number of states; second, we verify the parameter estimates correspond to the different states.  

      Classifying individual tracks is possible. While not done in the section corresponding to Fig. 5, this is done in Fig. 7 and a new supplementary plot, Fig. S9b. In brief, this is accomplished with our method by computing the likelihood of each track given each state. The probability that a given track is in state k equals the likelihood of the track given the state divided by the sum of the likelihoods given the different states. 
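
      The per-track state probability described above can be sketched as follows, working from log-likelihoods because track likelihoods are typically too small to represent directly. The function name is hypothetical, and the log-sum-exp shift is a standard numerical device, not something stated in the response.

```python
import math


def state_probabilities(log_likelihoods):
    """Probability of each state for one track, from per-state log-likelihoods.

    Implements p_k = L_k / sum_j L_j. Subtracting the maximum log-likelihood
    first keeps very negative values from underflowing to zero.
    """
    m = max(log_likelihoods)
    weights = [math.exp(ll - m) for ll in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]
```

      A track would then be assigned to the state with the highest probability, for instance with `max(range(len(p)), key=p.__getitem__)` applied to the returned list `p`.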

      (6) Figure 3. Caption: what is ((d_{est}-0.1)/0.1)? Also panel labeled as "d" should be "e". 

      Thank you for bringing these errors to our attention, the panel and caption have been corrected.

      Reviewer #3 (Public Review): 

      Summary: 

      In this work, Simon et al present a new computational tool to assess non-Brownian single-particle dynamics (aTrack). The authors provide a solid groundwork to determine the motion type of single trajectories via an analytical integration of multiple hidden variables, specifically accounting for localization uncertainty, directed/confined motion parameters, and, very novel, allowing for the evolution of the directed/confined motion parameters over time. This last step is, to the best of my knowledge, conceptually new and could prove very useful for the field in the future. The authors then use this groundwork to determine the motion type and its corresponding parameter values via a series of likelihood tests. This accounts for obtaining the motion type which is statistically most likely to be occurring (with Brownian motion as null hypothesis). Throughout the manuscript, aTrack is rigorously tested, and the limits of the methods are fully explored and clearly visualised. The authors conclude with allowing the characterization of multiple states in a single experiment with good accuracy and explore this in various experimental settings. Overall, the method is fundamentally strong, well-characterised, and tested, and will be of general interest to the single-particle-tracking field. 

      Strengths: 

      (1) The use of likelihood ratios gives a strong statistical relevance to the methodology. There is a sharp decrease in likelihood ratio between e.g. confinement of 0.00 and 0.05 and velocity of 0.0 and 0.002 (figure 2c), which clearly shows the strength of the method - being able to determine 2nm/timepoint directed movement with 20 nm loc. error and 100 nm/timepoint diffusion is very impressive. 

      We apologize for the confusion: the directed tracks in Fig. 2 have no Brownian-motion component, i.e. D = 0. We have made this clearer in the main text. Specifically, this section of the text refers to a track in linear motion with 2 nm displacements per step. With 70 time points (69 steps), a single particle that moved 138 nm, localized with an error of 20 nm (95% uncertainty range of 80 nm), can be statistically distinguished from slow diffusive motion.

      In Fig. 4g, we explore the capabilities of our method to detect if a diffusive particle also has a directed motion component. 
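
      The numbers in the 2 nm per step example can be reproduced with a few lines of arithmetic. The sketch assumes that the quoted 95% uncertainty range corresponds to plus or minus two standard deviations of the localization error; the variable names are illustrative.

```python
step_nm = 2.0     # directed displacement per step
n_points = 70     # time points in the track
sigma_nm = 20.0   # localization error (standard deviation)

# 70 time points give 69 steps, hence the net displacement.
net_displacement_nm = step_nm * (n_points - 1)

# Assumed reading of the 95% range: +/- 2 sigma, i.e. a width of 4 sigma.
range_95_nm = 2 * 2 * sigma_nm

print(net_displacement_nm, range_95_nm)
```

      The net displacement over the whole track exceeds the uncertainty range of a single localization, even though each individual 2 nm step is far smaller than the 20 nm localization error.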

      (2) Allowing the hidden variables of confinement and directed motion to change during a trajectory (i.e. the q factor) is very interesting and allows for new interpretations of data. The quantifications of these variables are, to me, surprisingly accurate, but well-determined. 

      (3) The software is well-documented, easy to install, and easy to use. 

      Weaknesses: 

      (1) The aTrack principle is limited to the motions incorporated by the authors, with, as far as I can see, no way to add new analytical non-Brownian motion. For instance, being able to add a dynamical state-switching model (i.e. quick on/off switching between mobile and non-mobile, for instance, repeatable DNA binding of a protein), could be of interest. I don't believe this necessarily has to be incorporated by the authors, but it might be of interest to provide instructions on how to expand aTrack.  

      We agree that handling dynamic state switching is very useful and highlight this potential future direction in the discussion. The revised text reads:

      “An important limitation of our approach is that it presumes that a given track follows a unique underlying model with fixed parameters. In biological systems, particles often transition from one motion type to another; for example, a diffusive particle can bind to a static substrate or molecular motor (46). In such cases, or in cases of significant mislinkings, our model is not suitable. However, this limitation can be alleviated by implicitly allowing state transitions with a hidden Markov Model (15) or alternatives such as change-point approaches (30, 47, 48), and spatial approaches (49).”

      (2) The experimental data does not very convincingly show the usefulness of aTrack. The authors mention that SPBs are directed in mitosis and not in interphase. This can be quantified and studied by microscopy analysis of individual cells and confirming the aTrack direction model based on this, but this is not performed. Similarly, the size of a confinement spot in optical tweezers can be changed by changing the power of the optical tweezer, and this would far more strongly show the quantitative power of aTrack. 

      We agree with the reviewer and have revised the biological experiment section significantly to better illustrate the potential of aTrack in various use cases.

      Now, we show an experiment to quantify the effect of LatA, an actin inhibitor, on the fraction of directed tracks obtained with aTrack. We find that LatA significantly decreases directed motion while a LatA-resistant mutant is not affected (Fig. 7a-c).

      As suggested by the reviewer, we have expanded the optical tweezer experiment by varying the laser power. As expected, increasing the laser power decreases the confinement radius.

      (3) The software has a very strict limit on the number of data points per trajectory, which is a user input. Shorter trajectories are discarded, while longer trajectories are cut off to the set length. It is not explained why this is necessary, and I feel it deletes a lot of useful data without clear benefit (in experimental conditions).

      We thank the reviewer for this recommendation; we have now modified the architecture of our model to enable users to consider tracks of multiple lengths. Note that the computation time is proportional to the longest track length times the number of tracks.  
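
      One common way to batch tracks of unequal length, which would produce the scaling noted above, is to pad every track to the longest length and carry a mask marking the real points. Whether aTrack does exactly this is an assumption of this sketch; the function name and the padding value are hypothetical.

```python
def pad_tracks(tracks, pad_value=0.0):
    """Pad 1D tracks to the longest length and return (padded, mask).

    mask[i][t] is 1 where padded[i][t] is a real point and 0 where it is
    padding, so padded positions can be excluded from the likelihood.
    """
    max_len = max(len(t) for t in tracks)
    padded = [list(t) + [pad_value] * (max_len - len(t)) for t in tracks]
    mask = [[1] * len(t) + [0] * (max_len - len(t)) for t in tracks]
    return padded, mask
```

      With this layout the processed array has (number of tracks) x (longest track length) entries regardless of how short the other tracks are, so a few very long tracks in a dataset of short ones can dominate the cost.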

      Reviewer #2 (Recommendations For The Authors): 

      Develop a better mathematical foundation for the likelihood ratio tests. 

      We added more explanation of the likelihood ratio tests and their interpretation in a new section entitled Statistical test in the supplementary information to address this recommendation.

      Place this work in clearer contexts. 

      We have now revised the introduction to better contextualize this work.

      Improve manuscript clarity. 

      Based on reviewer feedback and input from others, we have addressed this point throughout the article to improve readability.

      Make the code available. 

      The code is available on https://github.com/FrancoisSimon/aTrack, now including code for track generation.

      Reviewer #3 (Recommendations For The Authors): 

      (1) I believe the underlying model presented in Figure 1 is of substantial impact, especially when considering it as a simulation tool. I would suggest the authors make their method also available as a simulator (as far as I can tell, this is not explicitly done in their code repository, although logically the code required for the simulator should already be in the codebase somewhere). 

      Thank you for this suggestion; the simulation scripts are now in the GitHub repository together with the rest of the analysis method: https://github.com/FrancoisSimon/aTrack

      (2) The authors should explore and/or discuss the effects of wrong trajectory linking to their method. Throughout the text, fully correct trajectory linking is assumed and assessed, while in real experiments, it is often the case that trajectory linking is wrong, e.g. due to blinking emitters, imaging artefacts, high-density localizations, etc etc. This would have a major impact on the accuracy of trajectories, and it is extremely relevant to explore how this is translated to the output of aTrack. 

      As the reviewer notes, our current model does not account for track mislinking. This limits the method to data with lower fluorophore-densities, which is the typical use-case for SPT. We have added a brief description of the issue into the discussion of limitations.  

      (3) aTrack only supports 2D-tracking, but I don't believe there is a conceptual reason not to have this expanded to three dimensions. 

      The stand-alone software is currently limited to 2D tracks; however, the aTrack Python package works for any number of dimensions (i.e. 1-3). Note that since the current implementation assumes a single localization error for all axes, more modifications may be required for some types of 3D tracking. See https://github.com/FrancoisSimon/aTrack for more details about aTrack implementations.

      (4) Crucial information is missing in the experimental demonstrations. Especially in the NP-bacteria dataset, I miss scalebars, and information on the number of tracks. It is not explained why 5 different states are obtained - especially because I would naively expect three states: immobile NPs (e.g. stuck to glass), diffusing NPs, and NPs attached to bacteria, and thus directed. Figure 7e shows three diffusive states (why more than one?), no immobile states (why?), and two directed states (why?). 

      We thank the reviewer for pointing out these issues. We have now added scalebars and more experimental details to the figure and text, and we have modified the plot to more clearly distinguish the directed nanoparticles that are attached to cells from the diffusive nanoparticles.  

      Likely, our focal plane was too high to see the particles stuck on glass. The multiple diffusive states may be caused by different sizes of nanoparticle complexes, while the multiple directed states may arise because the directed motion of the cell-attached nanoparticles occasionally shows drastic changes of orientation. We have also clarified in the text how multiple states can help handle a heterogeneous population as was shown by Prindle et al. 2022, Microbiol Spectr. The characterization and phenotyping of microbial populations by nanoparticle tracking was published in Zapata et al. 2022, Nanoscale. 

      (5) I don't think I agree that 'robustness to model mismatches' is a good thing. Very crudely, the fact that aTrack finds fractional Brownian motion to be normal Brownian motion is technically a downside - and this should be especially carefully positioned if (in the future) a fractional Brownian motion model would be added to aTrack. I think that the author's point can be better tested by e.g. widely varying simulated vs fitted loc precision/diffusion coefficient (which are somewhat interchangeable).

      In this context, our intention in describing the robustness to “model mismatches” refers to classifying subdiffusion as subdiffusive irrespective of the exact subdiffusion motion physics (as well as superdiffusion), that is, to use aTrack how MSD analysis is often deployed. This is important in the context of real-world applications where simple mathematical models cannot perfectly represent real tracks with greater complexity. 

      Inevitably, some fraction of tracks with a pure Brownian motion may appear to match with a fractional Brownian motion, and thus statistical tests are needed to determine if this is significant. In general, aTrack finds fBm to be normal Brownian motion only when the anomalous coefficient is near 1, i.e. when the two models are indeed the same. When analysing fBm tracks with anomalous coefficients of 0.5 or 1.5, aTrack finds that these tracks are better explained by our confined diffusion model or directed motion model, respectively (see Fig. 6a). 

      To better clarify our objective, the section now has a brief introduction that reads:

      “One of the most important features of a method is its robustness to deviations from its assumptions. Indeed, experimental tracking data will inevitably not match the model assumptions to some degree, and models need to be resilient to these small deviations.”  

      Smaller points: 

      (1) It is not clear what a biological example is of rotational diffusion. 

      We modified the text to better explain the use of rotational diffusion.

      (2) The text in the section on experimental data should be expanded and clarified, there currently are multiple 'floating sentences' that stop halfway, and it does not clearly describe the biological relevance and observed findings.  

      We thank the reviewer for pointing out this issue. We have reworked the experimental section to better and more clearly explain the biological relevance of the findings.

      (3) Caption of figure 3: 'd' should be 'e'. 

      (4) Caption of Figure 7: log-likelihood should be Lconfined - Lbrownian, I believe. 

      (5) Equation number missing in SI first sentence. 

      (6) Supplementary Figure 1 top part axis should be Lc-Lb instead of Ld-Lb. 

      We have made these corrections, thank you for bringing them to our attention.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      In the present work, Chen et al. investigate the role of short heat shock factors (S-HSF), generated through alternative splicing, in the regulation of the heat shock response (HSR). The authors focus on S-HsfA2, an HSFA2 splice variant containing a truncated DNA-binding domain (tDBD) and a known transcriptional-repressor leucine-rich domain (LRD). The authors found a two-fold effect of S-HsfA2 on gene expression. On the one hand, the specific binding of S-HsfA2 to the heat-regulated element (HRE), a novel type of heat shock element (HSE), represses gene expression. This mechanism was also shown for other S-HSFs, including HsfA4c and HsfB1. On the other hand, S-HsfA2 is shown to interact with the canonical HsfA2, as well as with a handful of other HSFs, and this interaction prevents HsfA2 from activating gene expression. The authors also identified potential S-HsfA2 targets and selected one, HSP17.6B, to investigate the role of the truncated HSF in the HSR. They conclude that S-HsfA2-mediated transcriptional repression of HSP17.6B helps avoid hyperactivation of the HSR by counteracting the action of the canonical HsfA2.

      The manuscript is well written and the reported findings are, overall, solid. The described results are likely to open new avenues in the plant stress research field, as several new molecular players are identified. Chen et al. use a combination of appropriate approaches to address the scientific questions posed. However, in some cases, the data are inadequately presented or insufficient to fully support the claims made. As such, the manuscript would highly benefit from tackling the following issues:

      (1) While the authors report the survival phenotypes of several independent lines, thereby strengthening the conclusions drawn, they do not specify whether the presented percentages are averages of multiple replicates or if they correspond to a single repetition. The number of times the experiment was repeated should be reported. In addition, Figure 7c lacks the quantification of the hsp17.6b-1 mutant phenotype, which is the background of the knock-in lines. This is an essential control for this experiment.

      For the seedling survival rates and gene expression levels, we added statistical analysis based on at least two independent experiments. Figure 6E of the revised manuscript shows the phenotypes of the WT, hsp17.6b-1, HSP17.6B-KI, and HSP17.6B-OE plants and the statistical analysis of their seedling survival rates after heat exposure.

      (2) In Figure 1c, the transcript levels of HsfA2 splice variants are not evident, as the authors only show the quantification of the truncated variant. Moreover, similar to the phenotypes discussed above, it is unclear whether the reported values are averages and, if so, what is the error associated with the measurements. This information could explain the differences observed in the rosette phenotypes of the S-HsfA2-KD lines. Similarly, the gene expression quantification presented in Figures 4 and 5, as well as the GUS protein quantification of Figure 3F, also lacks this crucial information.

      RT‒qPCR analysis of the expression of these genes from at least two independent experiments was performed. We also added this missing information to the figure legends.

      (3) The quality of the main figures is low, which in some cases prevents proper visualization of the data presented. This is particularly critical for the quantification of the phenotypes shown in Figure 1b and for the fluorescence images in Figures 4f and 5b. Also, Figure 9b lacks essential information describing the components of the performed experiments.

      We apologize; owing to the limitations of equipment and technology, we will attempt to obtain high-quality images in the future. A detailed description of Figure 9b is provided in the methods section.

      (4) Mutants with low levels of S-HsfA2 yield smaller plants than the corresponding wild type. This appears contradictory, given that the proposed role of this truncated HSF is to counteract the growth repression induced by the canonical HSF. What would be a plausible explanation for this observation? Was this phenomenon observed with any of the other tested S-HSFs?

      We found that the constitutive expression of S-HsfA2 inhibits Arabidopsis growth. Considering this, Arabidopsis plants do not produce S-HsfA2 under normal conditions to avoid growth inhibition. However, under heat stress, Arabidopsis plants generate S-HsfA2, which contributes to heat tolerance and growth balance. In the revised manuscript, we provided supporting data indicating that S-HsfA4c-GFP or S-HsfB1-RFP constitutive expression confers Arabidopsis extreme heat stress sensitivity but inhibits root growth (Supplemental Figure S8). Therefore, this phenomenon is also observed in S-HsfA4c-GFP or S-HsfB1-RFP.

      (5) In some cases, the authors make statements that are not supported by the results:

      (i) the claim that only the truncated variant expression is changed in the knock-down lines is not supported by Figure 1c;

      In three S-HsfA2-KD lines, RT‒PCR splicing analysis revealed that HsfA2-II but not HsfA2-III is easily detected. In the revised manuscript, we added RT‒qPCR analysis, and the results revealed that the abundance of HsfA2-III and HsfA2-II but not that of the full-length HsfA2 mRNA significantly decreased under extreme heat (Figure 1C). Considering that HsfA2-III but not HsfA2-II is a predominant splice variant under extreme heat (Liu et al., 2013), S-HsfA2-KD may lead to the knockdown of alternative HsfA2 splicing transcripts, especially HsfA2-III.

      (ii) the increase in GUS signal in Figure 3a could also result from local protein production;

      We included this possibility in the results analysis.

      (iii) in Figure 6b, the deletion of the HRE abolishes heat responsiveness, rather than merely altering the level of response; and

      In the revised manuscript, we added new data concerning the roles of HREs and HSEs in the response of the HSP17.6B promoter to heat stress (Figure 6A). These results suggest that the HRE and HSE elements are responsible for the response of the HSP17.6B promoter to heat stress and that the HRE negatively regulates the HSP17.6B promoter at 37°C, whereas the HSE is positive at 42°C.

      (iv) the phenotypes in Figure 8b are not clear enough to conclude that HSP17.6B overexpressors exhibit a dwarf but heat-tolerant phenotype.

      When grown in soil, the HSP17.6B-OE seedlings presented a dwarf phenotype compared with the WT control. Heat stress resulted in browning of the WT leaves, but the leaves of the HSP17.6B-OE plants remained green, suggesting that the HSP17.6B-OE seedlings also presented a heat-tolerant phenotype in the soil. These results are qualitative but not quantitative experimental data; therefore, the conclusions are adjusted in the abstract and results sections.

      Reviewer #2 (Public review):

      Summary:

      The authors report that Arabidopsis short HSFs S-HsfA2, S-HsfA4c, and S-HsfB1 confer extreme heat. They have truncated DNA binding domains that bind to a new heat-regulated element. Considering Short HSFA2, the authors have highlighted the molecular mechanism by which S-HSFs prevent HSR hyperactivation via negative regulation of HSP17.6B. The S-HsfA2 protein binds to the DNA binding domain of HsfA2, thus preventing its binding to HSEs, eventually attenuating HsfA2-activated HSP17.6B promoter activity. This report adds insights to our understanding of heat tolerance and plant growth.

      Strengths:

      (1) The manuscript represents ample experiments to support the claim.

      (2) The manuscript covers a robust number of experiments and provides specific figures and graphs in support of their claim.

      (3) The authors have chosen a topic to focus on stress tolerance in a changing environment.

      Weaknesses:

      (1) One s-HsfA2 represents all the other s-Hsfs; S-HsfA4c, and S-HsfB1. s-Hsfs can be functionally different. Regulation may be positive or negative. Maybe the other s-hsfs may positively regulate for height and be suppressed by the activity of other s-hsfs.

      In this study, we used S-HsfA2, S-HsfA4c, and S-HsfB1 data to support the view that “splice variants of HSFs generate new plant HSFs”. We also noted that S-HsfA2 cannot represent a traditional S-HSF. S-HsfA4c and S-HsfB1 may have functions other than S-HsfA2 because of their different C-terminal motifs or domains. Different S-HSFs might participate in the same biological process, such as heat tolerance, through the coregulation of downstream genes. We added this information to the discussion section.

      (2) Previous reports on gene regulations by hsfs can highlight the mechanism.

      In the introduction section, we included these references concerning HSFs and S-HSFs.

      (3) The Materials and Methods section could be rearranged so that it is based on the correct flow of the procedure performed by the authors.

      The materials and methods and results sections are arranged in the logical order.

      (4) Graphical representation could explain the days after sowing data, to provide information regarding plant growth.

      The days after sowing (DAS) for the age of the Arabidopsis seedlings are stated in the Materials and Methods section and figure legends.

      (5) Clear images concerning GFP and RFP data could be used.

      We provided high-quality images of S-HsfA2-GFP and the GFP control (Figure 3 in the revised manuscript).

      Reviewing Editor comments:

      The EMSA shown in Figures 2, 3, 4, and 5, which are critical to support the manuscript's claims, are of poor quality, without any repeats to support. In addition, there is not much information about how these EMSA were done. I suggest including better EMSA in a new version of this manuscript.

      Thank you for your suggestion. We added the missing information, including the detailed EMSA method and experiment repeat times in the methods section and figure legends. We provide high-quality images of HRE probes binding to nuclear proteins (Figure 4E).

      Reviewer #1 (Recommendations for the authors):

      (1) The paper is overall well-written, but it could greatly benefit from reorganizing the results subsections. Currently, there are entire subsections dedicated to supplementary figures (e.g., lines 177-191) and main figures split into different subsections (e.g., lines 237-246). It is recommended to organize all the information related to a main figure into a single subsection and to incorporate the description of the corresponding supplementary figures. This would imply a general reorganization of the figures, moving some information to the supplementary data (for instance, the data in Figure 4 could be supplementary to Figure 5) and vice versa (Supplementary Figure 4 should be incorporated into main Figure 2, as it presents very important results). Also, Figures 7 and 8 would be better presented if merged into a single figure/subsection.

      Thank you for your suggestion. We have merged some figures into single figures according to their main information. The current version has 8 main figures, including a new figure.

      (2) Survival phenotypes vary widely, making reliable statistical analysis challenging. The chlorophyll and fresh weight quantifications presented in figures such as Figure 5 appear to effectively describe the phenomenon and allow for statistical comparisons. Figures 1 and 7 would benefit from including these measurements if the variability in survival percentages is too high to calculate statistical differences reliably. Also, in Figure 8, all chlorophyll measurements should be normalized to fresh weight rather than seedling number due to the dwarfism observed in the overexpressor lines.

      Thank you for pointing out your concerns. We added statistical analysis based on at least two independent experiments, including Figures 1 and 7, to the original manuscript. In Figure 8 in the original manuscript, chlorophyll measurements were normalized to fresh weight.

      (3) Typos: in Figure 3a it should be "min" not "mim"; in Supplementary Figure 3, the GFP and merge images are swapped.

      We apologize for these errors, and we have corrected them. Supplementary Figure 3 was replaced with new images and was included in Figure 3 in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) The abstract states "How this process is prevented to ensure proper plant growth has not been determined." The authors can be the first to do this, by adding graphical data on the height difference in HsfA2-Arabidopsis and wild-type Arabidopsis.

      Thank you; we agree with you. We have added this information to the new working model (Figure 8).

      (2) The authors claim that Arabidopsis S-HsfA2, S-HsfA4c, and S-HsfB1; but have used S-HsfA2 to understand the action. The mechanisms being unknown for S-HsfA4c, and S-HsfB1 cannot be represented by S-HsfA2 to represent the mechanism.

      Thank you for your valuable comments. In this study, we used S-HsfA2, S-HsfA4c, and S-HsfB1 data to support the view that “splice variants of HSFs generate new plant HSFs”. We also noted that S-HsfA2 cannot represent a traditional S-HSF because S-HsfA4c and S-HsfB1 may have functions that differ from those of S-HsfA2. Therefore, we deleted “representative S-HSF” from the revised manuscript. In the future, we will conduct in-depth research on the relevant mechanisms of S-HsfA4c and S-HsfB1 under your guidance.

      (3) The authors can include which of the HSFs interacted with other genes of Arabidopsis reported by other researchers are positively or negatively regulated in heat response/ growth or the balance.

      In the introduction section, we included these genes. AtHsfA2, AtHsfA3, and BhHsf1 confer heat tolerance in Arabidopsis but also result in a dwarf phenotype in plants (Ogawa et al., 2007; Yoshida et al., 2008; Zhu et al., 2009).

      (4) The authors have started from the subsection plant materials and growth conditions. It is unclear from where the authors have found these HSF mutant Arabidopsis? Is it a continuation of some other work? As a reader, I am utterly confused because of the arrangement of the materials and methods section.

      We apologize for the lack of detailed information in the Materials and methods section. These mutants were purchased from AraShare (Fuzhou, China) and verified via PCR and RT‒qPCR. We added the missing information.

      (5) Is the DAS - Days After Sowing - represented as a graph or table? This will add data to the plant growth section to clearly state the difference between the mutants and the wild-type.

      In this study, the age of the Arabidopsis seedlings was calculated as days after sowing (DAS), as stated in the Materials and Methods section and figure legends.

      (6) Heat stress treatment after GUS staining looks absurd. Should it not follow after plant materials and growth conditions, which should ideally be after the plant transformation and cloning section? The initial step is definitely about plasmid construction. Kindly rearrange.

      Thank you for your valuable suggestions. We have rearranged the logical order of the materials and methods.

      (7) The expression of GFP and RFP was not clearly seen in the images. This could be because of the poor resolution of the images added.

      We obtained high-quality images of S-HsfA2-GFP (Figure 3 in the revised manuscript).

      (8) We live in an age where it is widely known that genes are not functioning independently but are coregulated and coregulate other proteins. The authors can address the role of these spliced variants on gene regulation and compare them with the HSFs.

      We agree with your suggestion. In this study, HSP17.6B was identified as a direct target gene of S-HsfA2 and HsfA2, which can partly explain the role of S-HsfA2 in heat resistance and growth balance. However, the mechanism by which S-HsfA2 regulates heat tolerance and growth balance may not be limited to HSP17.6B. On the basis of the current data, we propose that the putative S-HsfA2-DREB2A-HsfA3 module might be associated with the roles of S-HsfA2 in heat tolerance and growth balance. Please refer to the discussion section for a detailed explanation.

      (9) Regulatory elements can be validated in relation to their interaction with proven HSFs.

      Supplemental Figure S3 shows that His6-HsfA2 failed to bind to the HRE in vitro.

      (10) The authors seem to be biased toward heat stress and have not worked enough on plant growth. Biochemical data and images on plant growth could be added to bring out the novelty of this manuscript.

      Thank you for your suggestion. We added new data indicating that, compared with the wild-type control, S-HsfA2-GFP, S-HsfA4c-GFP, or S-HsfB1-GFP overexpression inhibited root length (Supplemental Figure 8).

      (11) Line 251 on page 11 of the submitted manuscript says that the s-Hsfs were previously identified by Liu et al. (2013) yet in the abstract the authors claim that these s-HsFs are NEW kinds of HSF with a unique truncated DNA-binding domain (tDBD) that binds a NEW heat-regulated element (HRE).

      In our previous report, several S-HSFs, including S-HsfA2, S-HsfA4, S-HsfB1, and S-HsfB2a, were identified primarily in Arabidopsis (Liu et al., 2013). In this study, we further characterized S-HsfA2, S-HsfA4, and S-HsfB1 and revealed several features of S-HSFs. Therefore, we claim that these S-HSFs are new kinds of HSFs.

      (12) What are these NEW kinds of HRE? Which genes have these HRE? Was an in silico study conducted to study it or can any reports can be cited?

      HREs, i.e., heat-regulated elements, are newly identified heat-responsive elements in this study. The sequences of HREs are partially related to traditional heat shock elements (HSEs). Because we did not identify the essential nucleic acids required for t-DBD binding to the HRE, we did not perform an in silico study.

      (13) S-HSFs may interact with existing HSFs. Have the authors thought in this direction? It can have a role in positively regulating other sHSFs or regulating multiple expressing genes related to plant growth and other functions. This needs to be explored.

      Thank you for this point. Given that the overexpression of Arabidopsis HsfA2 or HsfA3 inhibits growth under nonstress conditions, we discussed this direction from the perspective of the interaction of S-HsfA2 with HsfA2 or HsfA3 in the revised manuscript.

      (14) The authors need to concentrate on the presentation and arrangement of both their materials and methods and result section and write them in a systematic manner (or following a workflow).

      The materials, methods and results sections are arranged in logical order.

      (15) The authors have used references in the results section which can be added to the discussion section to make it more accurate.

      Thank you for your suggestions. We have moved some references to the discussion section, but the necessary references remain in the results section.

    1. Author response:

      We sincerely thank the reviewers for the time and care they have invested in evaluating our manuscript. We greatly appreciate their thoughtful feedback, which highlights both the strengths and the areas where the work can be improved. We recognize the importance of the concerns raised, particularly regarding the TMS analyses and interpretation, as well as aspects of the manuscript structure and clarity. The authors are committed to transparency and a rigorous scientific process, and we will therefore carefully consider all reviewer comments. In the coming months, we will revise the manuscript to incorporate additional analyses, provide clearer methodological detail, and refine the interpretation of the stimulation results.

    1. Author response:

      eLife Assessment:

      This paper performs a valuable critical reassessment of anatomical and functional data, proposing a reclassification of the mouse visual cortex in which almost all the higher visual areas are consolidated into a single area V2. However, the evidence supporting this unification is incomplete, as the key experimental observations that the model attempts to reproduce do not accurately reflect the literature. This study will likely be of interest to neuroscientists focused on the mouse visual cortex and the evolution of cortical organization.

      We do not agree with this assessment, nor do we understand which 'key experimental observations' that the model attempts to reproduce fail to accurately reflect the literature. The model reproduces a complete map of the visual field, with overlap in certain regions. When reversals are used to delineate areas, as is the current custom, multiple higher order areas are generated, and each area has a biased and overlapping visual field coverage. These are the simple outputs of the model, and they are consistent with the published literature, including recent publications such as Garrett et al. 2014 and Zhuang et al. 2017, a paper published in this journal. The area boundaries produced by the model are not identical to area boundaries in the literature, because the model is a simplification.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors argue that defining higher visual areas (HVAs) based on reversals of retinotopic tuning has led to an over-parcellation of secondary visual cortices. Using retinotopic models, they propose that the HVAs are more parsimoniously mapped as a single area V2, which encircles V1 and exhibits complex retinotopy. They reanalyze functional data to argue that functional differences between HVAs can be explained by retinotopic coverage. Finally, they compare the classification of mouse visual cortex to that of other species to argue that our current classification is inconsistent with those used in other model species.

      Strengths:

      This manuscript is bold and thought-provoking, and is a must-read for mouse visual neuroscientists. The authors take a strong stance on combining all HVAs, with the possible exception of area POR, into a single V2 region. Although I suspect many in the field will find that their proposal goes too far, many will agree that we need to closely examine the assumptions of previous classifications to derive a more accurate areal map. The authors' supporting analyses are clear and bolster their argument. Finally, they make a compelling argument for why the classification is not just semantic, but has ramifications for the design of experiments and analysis of data.

      Weaknesses:

      Although I enjoyed the polemic nature of the manuscript, there are a few issues that weaken their argument.

      (1) Although the authors make a compelling argument that retinotopic reversals are insufficient to define distinct regions, they are less clear about what would constitute convincing evidence for distinct visual regions. They mention that a distinct area V3 has been (correctly) defined in ferrets based on "cytoarchitecture, anatomy, and functional properties", but elsewhere argue that none of these factors are sufficient to parcellate any of the HVAs in mouse cortex, despite some striking differences between HVAs in each of these factors. It would be helpful to clearly define a set of criteria that could be used for classifying distinct regions.

      We agree the revised manuscript would benefit from a clear discussion of updated rules of area delineation in the mouse. In brief, we argue that retinotopy alone should not be used to delineate area boundaries in mice, or any other species. Although there is some evidence for functional property, architecture, and connectivity changes across mouse HVAs, area boundaries continue to be defined primarily, and sometimes solely (Garrett et al., 2014; Juavinett et al., 2018; Zhuang et al., 2017), based on retinotopy. We acknowledge that earlier work (Wang and Burkhalter, 2007; Wang et al., 2011) did consider cytoarchitecture and connectivity alongside retinotopy, but more recent work has shifted to a focus on retinotopy as indicated by the currently accepted criterion for area delineation.  

      As reviewer #2 points out, the present criteria for mouse visual area delineation can be found in the Methods section of: [Garrett, M.E., Nauhaus, I., Marshel, J.H., and Callaway, E.M. (2014)].

      Criterion 1: Each area must contain the same visual field sign at all locations within the area.

      Criterion 2: Each visual area cannot have a redundant representation of visual space.

      Criterion 3: Adjacent areas of the same visual field sign must have a redundant representation.

      Criterion 4: An area's location must be consistently identifiable across experiments.
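
      Criterion 1 depends on the visual field sign, which is conventionally computed as the sine of the angle between the local gradients of the azimuth and elevation maps. The following is a minimal sketch of that calculation, not the implementation used in any of the cited studies. The function names, the toy maps, and the sign convention (elevation angle minus azimuth angle) are assumptions made for illustration.

```python
import math

def gradient(grid, r, c):
    """Central-difference gradient (d/dx, d/dy) of a 2D list at an interior pixel."""
    dx = (grid[r][c + 1] - grid[r][c - 1]) / 2.0
    dy = (grid[r + 1][c] - grid[r - 1][c]) / 2.0
    return dx, dy

def field_sign(azimuth, elevation):
    """Sine of the angle between azimuth and elevation gradients, per interior pixel.

    Assumed convention: angle = (elevation gradient direction) minus
    (azimuth gradient direction). Values near +1 and -1 mark non-mirror
    and mirror representations; which is which depends on this convention.
    """
    rows, cols = len(azimuth), len(azimuth[0])
    out = []
    for r in range(1, rows - 1):
        row = []
        for c in range(1, cols - 1):
            ax, ay = gradient(azimuth, r, c)
            ex, ey = gradient(elevation, r, c)
            angle = math.atan2(ey, ex) - math.atan2(ay, ax)
            row.append(math.sin(angle))
        out.append(row)
    return out

# Toy maps (hypothetical): azimuth increases along x, elevation along y.
azimuth = [[10.0 * c for c in range(5)] for r in range(5)]
elevation = [[10.0 * r for c in range(5)] for r in range(5)]
mirrored = [[-v for v in row] for row in azimuth]  # azimuth axis flipped

print(field_sign(azimuth, elevation)[0][0])   # one sign for the original map
print(field_sign(mirrored, elevation)[0][0])  # opposite sign for the mirrored map
```

      Under this convention, flipping the direction of azimuth progression while leaving elevation unchanged inverts the field sign, which is why a reversal in the retinotopic gradient is read as an area border under Criterion 1.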

      As discussed in the manuscript, recent evidence in higher order visual cortex of tree shrews and rats led us to question the universality of these criteria across species. Specifically, tree shrew V2, macaque V2, and marmoset DM exhibit reversals in visual field sign in what are defined as single visual areas. This suggests that Criterion 1 should be updated. It also suggests that Criteria 2 and 3 should be updated, because visual field sign reversals often co-occur with retinotopic redundancies: reversing course in the direction of progression along the visual field can easily lead to coverage of visual field regions already traveled.

      More broadly, we argue that topography is just one of several criteria that should be considered in area delineation. We understand that few visual areas in any species meet all criteria, but we emphasize that topography cannot consistently be the sole satisfied criterion, as it currently appears to be for many mouse HVAs. Inspired by a recent perspective on cortical area delineation (Petersen et al., 2024), we suggest the following rules, which will be worked into the revised version of the manuscript. Topography is a criterion, but it comes after considerations of function, architectonics and connectivity.

      (1) Function—Cortical areas differ from neighboring areas in their functional properties  

      (2) Architectonics—Cortical areas often exhibit distinctions from neighboring areas in multiple cyto- and myeloarchitectonic markers

      (3) Connectivity—Cortical areas are characterized by a specific set of connectional inputs and outputs from and to other areas

      (4) Topography—Cortical areas often exhibit a distinct topography that balances maximal coverage of the sensory field with minimal redundancy of coverage within an area.

      As we discuss in the manuscript, although there are functional, architectonic, and connectivity differences across mouse HVAs, they typically vary smoothly across multiple areas, such that neighboring areas share the same properties and there are no sharp borders. For instance, sharp borders in cytoarchitecture are generally lacking in the mouse HVAs. A notable exception to this is the clear and sharp change in m2AChR expression that occurs between LM and AL (Wang et al., 2011).

      (2) On a related note, although the authors carry out impressive analyses to show that differences in functional properties between HVAs could be explained by retinotopy, they glossed over some contrary evidence that there are functional differences independent of retinotopy. For example, axon projections to different HVAs originating from a single V1 injection - presumably including neurons with similar retinotopy - exhibit distinct functional properties (Glickfeld LL et al, Nat Neuro, 2013). As another example, interdigitated M2+/M2- patches in V1 show very different HVA connectivity and response properties, again independent of V1 location/retinotopy (Meier AM et al., bioRxiv). One consideration is that the secondary regions might be considered a single V2 with distinct functional modules based on retinotopy and connectivity (e.g., V2LM, V2PM, etc).

      Thank you for the correction. We will revise the text to discuss (Glickfeld et al., 2013), as it remains some of the strongest evidence in favor of retinotopy-independent functional specialization of mouse HVAs. However, one caveat of this study is the size of the V1 injection that is the source of axons studied in the HVAs. As apparent in Figure 1B, the large injection covers nearly a quarter of V1. It is worth noting that (Han et al., 2018) found, using single-cell reconstructions and MAPseq, that the majority of V1 neurons project to multiple nearby HVA targets. In this experiment the tracing does not suffer from the problem of spreading over V1’s retinotopic map, and it suggests that presumably retinotopically matched locations in each area receive shared inputs from the V1 population rather than from a distinct but spatially interspersed subset. In fact, the authors conclude “Interestingly, the location of the cell body within V1 was predictive of projection target for some recipient areas (Extended Data Fig. 8). Given the retinotopic organization of V1, this suggests that visual information from different parts of visual field may be preferentially distributed to specific target areas, which is consistent with recent findings (Zhuang et al., 2017)”. Given an injection covering a large portion of the retinotopic map, and the fact that feed-forward projections from V1 to HVAs carry coarse retinotopy, it is difficult to prove that the functional specializations noted in the HVA axons are retinotopy-independent. This would require measurement of receptive field location in the axonal boutons, which the authors did not perform (possibly because the SNR of calcium indicators prevented such measurements at the time).

      Another option would be to show that adjacent neurons in V1 that project to far-apart HVAs exhibit distinct functional properties on par with the differences exhibited by neurons in very different parts of V1 due to retinotopy. In other words, the functional specificity of V1 inputs to HVAs at retinotopically identical locations is of the same order as that which might be gained by retinotopic biases. To our knowledge, such a study has not been conducted, so we have decided to collect the data in collaboration with the Allen Institute. As part of the Allen Institute’s pioneering OpenScope project, we will make careful two-photon and electrophysiology measurements of functional properties, including receptive field location, SF, and TF, in different parts of the V1 retinotopic map. Pairing these data with existing Allen Institute datasets on functional properties of neurons in the HVAs will allow us to rule in, or rule out, our hypotheses regarding retinotopy as the source of functional specialization in mouse HVAs. We will update the discussion in the revised manuscript to better reflect the need for additional evidence to support or refute our proposal.

      Meier AM et al., bioRxiv 2025 (Meier et al., 2025) was published after our submission, but we are thankful to the reviewers for guiding our attention to this timely paper. Given the recent findings on the influence of locomotion on rodent and primate visual cortex, it is very exciting to see clearly specialized circuits for processing self-generated visual motion in V1. However, it is difficult to rule out the role of retinotopy, as the HVAs (LM, AL, RL) participating in the M2+ network, which is less responsive to self-generated visual motion, exhibit a bias for the medial portion of the visual field, and the HVA (PM) involved in the M2- network, which is responsive to self-generated visual motion, exhibits a bias for the lateral (or peripheral) parts of the visual field. For instance, a peripheral bias in area PM has been shown using retrograde tracing as in Figure 6 of (Morimoto et al., 2021), single-cell anterograde tracing as in Extended Data Figure 8 of (Han et al., 2018), and functional imaging studies (Zhuang et al., 2017). Recent findings in the marmoset also point to visual circuits in the peripheral, but not central, visual field being significantly modulated by self-generated movements (Rowley et al., 2024).

      However, a visual field bias in area PM, which selectively receives M2- inputs, is at odds with the clear presence of modular M2+/M2- patches across the entire map of V1 (Ji et al., 2015). One possibility supported by existing data is that neurons in M2- patches, as well as those in M2+ patches, in the central representation of V1 make fewer or significantly weaker connections with area PM compared to areas LM, AL and RL. Evidence to the contrary would support retinotopy-independent and functionally specialized inputs from V1 to HVAs.

      (3) Some of the HVAs, such as AL, AM, and LI, appear to have redundant retinotopic coverage with other HVAs, such as LM and PM. Moreover, these regions have typically been found to have higher "hierarchy scores" based on connectivity (Harris JA et al., Nature, 2019; D'Souza RD et al., Nat Comm, 2022), though unfortunately, the hierarchy levels are not completely consistent between studies. Based on existing evidence, there is a reasonable argument to be made for a hybrid classification, in which some regions (e.g., LM, P, PM, and RL) are combined into a single V2 (though see point #2 above) while other HVAs are maintained as independent visual regions, distinct from V2. I don't expect the authors to revise their viewpoint in any way, but a more nuanced discussion of alternative classifications is warranted.

      We understand that such a proposal, which would combine a subset of areas with matched field sign (LM, P, PM, and RL), would be less extreme and better received by the community. It would create a V2 with a smooth map without reversals or significant redundant retinotopic coverage. However, the intuition we have built from our modeling studies suggests that both these areas and the other smaller areas with negative field sign (AL, AM, LI) are a byproduct of a complex single map of the visual field that exhibits reversals as it contorts around the triangular and tear-shaped boundaries of V1. In other words, we believe the redundant coverage and field-sign changes/reversals are a byproduct of a single secondary visual field in V2 constrained by the cortical dimensions of V1. That being said, we understand that area delineations are in part based on a consensus by the community. Therefore, we will continue to discuss our proposal with community members, and we will incorporate new evidence supporting or refuting our hypothesis before we submit our revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The study by Rowley and Sedigh-Sarvestani presents modeling data suggesting that map reversals in mouse lateral extrastriate visual cortex do not coincide with areal borders, but instead represent borders between subregions within a single area V2. The authors propose that such an organization explains the partial coverage in higher-order areas reported by Zhuang et al., (2017). The scheme revisits an organization proposed by Kaas et al., (1989), who interpreted the multiple projection patches traced from V1 in the squirrel lateral extrastriate cortex as subregions within a single area V2. Kaas et al's interpretation was challenged by Wang and Burkhalter (2007), who used a combination of topographic mapping of V1 connections and receptive field recordings in mice. Their findings supported a different partitioning scheme in which each projection patch mapped a specific topographic location within single areas, each containing a complete representation of the visual field. The area map of mouse visual cortex by Wang and Burkhalter (2007) has been reproduced by hundreds of studies and has been widely accepted as ground truth (CCF) (Wang et al., 2020) of the layout of rodent cortex. In the meantime, topographic mappings in marmoset and tree shrew visual cortex made a strong case for map reversals in lateral extrastriate cortex, which represent borders between functionally diverse subregions within a single area V2. These findings from non-rodent species raised doubts about whether during evolution, different mammalian branches have developed diverse partitioning schemes of the cerebral cortex. Rowley and Sedigh-Sarvestani favor a single master plan in which, across evolution, all mammalian species have used a similar blueprint for subdividing the cortex.

      Strengths:

      The story illustrates the enduring strength of science in search of definitive answers.

      Weaknesses:

      To me, it remains an open question whether Rowley and Sedigh-Sarvestani have written the final chapter of the saga. A key reason for my reservation is that the area maps used in their model are cherry-picked. The article disregards published complementary maps, which show that the entire visual field is represented in multiple areas (i.e. LM, AL) of lateral extrastriate cortex and that the map reversal between LM and AL coincides precisely with the transition in m2AChR expression and cytoarchitecture (Wang and Burkhalter, 2007; Wang et al., 2011). Evidence from experiments in rats supports the gist of the findings in the mouse visual cortex (Coogan and Burkhalter, 1993).

      We would not claim to have written the final chapter of the saga. Our goal was to add an important piece of new evidence to the discussion of area delineations across species. We believe this new evidence supports our unification hypothesis.  We also believe that there are several missing pieces of data that could support or refute our hypothesis. We have begun a collaboration to collect some of this data.  

      (1) The selective use of published evidence, such as the complete visual field representation in higher visual areas of lateral extrastriate cortex (Wang and Burkhalter, 2007; Wang et al., 2011) makes the report more of an opinion piece than an original research article that systematically analyzes the area map of mouse visual cortex we have proposed. No direct evidence is presented for a single area V2 with functionally distinct subregions.

      This brings up a nuanced issue regarding visual field coverage. Wang & Burkhalter, 2007 Figure 6 shows the receptive fields of sample neurons in area LM that cover the full range between 0 and 90 degrees of azimuth and -40 to 80 degrees of elevation, which essentially matches the visual field coverage in V1. However, we do not know whether these neurons are representative of most neurons in area LM. In other words, while these single-cell recordings along selected contours in cortex show the span of the visual field coverage, they may not be able to capture crucial information about its shape, missing regions of the visual field, or potential bias. To mitigate this, visual field maps measured with electrophysiology are commonly produced by even sampling across the two dimensions of the visual area, either by moving a single electrode along a grid pattern (e.g. (Manger et al., 2002)) or by using a grid-like multi-electrode probe (e.g. (Yu et al., 2020)). This was not carried out in either Wang & Burkhalter 2007 or Wang et al. 2011. Even sampling of cortical space is time consuming and difficult with electrophysiology, but efficient with functional imaging. Therefore, despite the likely underestimation of visual field coverage, imaging techniques are valuable in that they can efficiently exhibit not only the span of the visual field of a cortical region, but also its shape and bias.

      Multiple functional imaging studies that simultaneously measure visual field coverage in V1 and HVAs report a bias in the coverage of HVAs, relative to that in V1 (Garrett et al., 2014; Juavinett et al., 2018; Zhuang et al., 2017). While functional imaging will likely underestimate receptive fields compared to electrophysiology, the consistent observation of an orderly bias for distinct parts of the visual field across the HVAs suggests that at least some of the HVAs do not have full and uniform coverage of the visual field comparable to that in V1. For instance, (Garrett et al., 2014) show that the total coverage in HVAs, when compared to V1, is typically less than half (Figure 6D) and often irregularly shaped.

      Careful measurements of single-cell receptive fields, using mesoscopic two-photon imaging across the HVAs would settle this question. As reviewer #1 points out, this is technically feasible, though no dataset of this kind exists to our knowledge.

      (2) The article misrepresents evidence by commenting that m2AChR expression is mainly associated with the lower field. This is counter to published findings showing that m2AChR spans across the entire visual field (Gamanut et al., 2018; Meier et al., 2021). The utility of markers for delineating areal boundaries is discounted, without any evidence, in disregard of evidence for distinct areal patterns in early development (Wang et al., 2011). Pointing out that markers can be distributed non-uniformly within an area is well-familiar. m2AChR is non-uniformly expressed in mouse V1, LM and LI (Ji et al., 2015; D'Souza et al., 2019; Meier et al., 2021). Recently, it has been found that the patchy organization within V1 plays a role in the organization of thalamocortical and intracortical networks (Meier et al., 2025). m2AChR-positive patches and m2AChR-negative interpatches organize the functionally distinct ventral and dorsal networks, notably without obvious bias for upper and lower parts of the visual field.

      We wrote that “Future work showed boundaries in labeling of histological markers such as SMI-32 and m2ChR labeling, but such changes mostly delineated area LM/AL (Wang et al., 2011) and seemed to be correlated with the representation of the lower visual field.” The latter statement regarding the representation of the lower visual field directly references the data in Figure 1 of (Wang et al., 2011), which is titled “Figure 1: LM/AL border identified by the transition of m2AChR expression coincides with receptive field recordings from lower visual field.” Like Wang et al., we were simply referring to the fact that the border of area LM/AL co-exhibits a change in m2AChR expression as well as lower visual field representation.

      (3) The study has adopted an area partitioning scheme, which is said to be based on anatomically defined boundaries of V2 (Zhuang et al., 2017). The only anatomical borders used by Zhuang et al. (2017) are those of V1 and barrel cortex, identified by cytochrome oxidase staining. In reality, the partitioning of the visual cortex was based on field sign maps, which are reproduced from Zhuang et al., (2017) in Figure 1A. It is unclear why the maps shown in Figures 2E and 2F differ from those in Figure 1A. It is possible that this is an oversight. But maintaining consistent areal boundaries across experimental conditions that are referenced to the underlying brain structure is critical for assigning modeled projections to areas or sub-regions. This problem is evident in Figure 2F, which is presented as evidence that the modeling approach recapitulates the tracings shown in Figure 3 of Wang and Burkhalter (2007). The dissimilarities between the modeling and tracing results are striking, unlike what is stated in the legend of Figure 2F.

      Thanks for this correction. By “anatomical boundaries of higher visual cortex”, we meant the cortical boundary between V1 and higher order visual areas on one end, and the outer edge of the envelope that defines the functional boundaries of the HVAs in cortical space (Zhuang et al., 2017). The reviewer is correct that we should have referred to these as functional boundaries. The word ‘anatomical’ was meant to refer to cortical space, rather than visual field space.

      More generally though, there is no disagreement between the partitioning of visual cortex in Figures 1 and 2. Rather, the partitioning in Figure 1 is taken directly from Zhuang et al. (2017), whereas the partitioning in Figure 2 is produced by mathematical model simulation. As such, one would not expect identical areal boundaries between Figure 2 and Figure 1. What we aimed to communicate with our modeling results is that a single area can exhibit multiple visual field reversals and retinotopic redundancies if it is constrained to fit around V1 and cover a visual field approximately matched to the visual field coverage in V1. We defined this area explicitly as a single area with a single visual field (boundaries shown in Figure 2A). So the point of our simulation is to show that even an explicitly defined single area can appear as multiple areas if it is constrained by the shape of mouse V1, and if visual field reversals are used to indicate areal boundaries. As in most models, different initial conditions and parameters produce a complex visual field which will appear as multiple HVAs when delineated by areal boundaries. What is consistent, however, is the existence of a complex single visual field that appears as multiple HVAs with partially overlapping coverage.

      Similarly, we would not expect a simple model to exactly reproduce the multi-color tracer injections in Wang and Burkhalter (2007). However, we find it quite compelling that the model can produce multiple groups of multi-colored axonal projections beyond V1 that can appear as multiple areas each with their own map of the visual field using current criteria, when the model is explicitly designed to map a single visual field. We will explain the results of the model, and their implications, better in the revised manuscript.

      (4) Rowley and Sedigh-Sarvestani find that the partial coverage of the visual field in higher order areas shown by Zhuang et al (2017) is recreated by the model. It is important to caution that Zhuang et al's (2017) maps were derived from incomplete mappings of the visual field, which was confined to -25 to 35 deg of elevation. This underestimates the coverage we have found in LM and AL. Receptive field mappings show that LM covers 0-90 deg of azimuth and -30 to 80 deg of elevation (Wang and Burkhalter, 2007). AL covers at least 0-90 deg of azimuth and -30 to 50 deg of elevation (Wang and Burkhalter, 2007; Wang et al., 2011). These are important differences. Partial coverage in LM and AL underestimates the size of these areas and may map two projection patches as inputs to subregions of a single area rather than inputs to two separate areas. Complete, or nearly complete, visual representations in LM and AL support that each is a single area. Importantly, both areas are included in a callosal-free zone (Wang and Burkhalter, 2007). The surrounding callosal connections align with the vertical meridian representation. The single map reversal is marked by a transition in m2AChR expression and cytoarchitecture (Wang et al., 2011).

      This is a good point. We do not expect that expanding the coverage of V1 will change the results of the model significantly. However, for the revised manuscript, we will update V1 coverage to be accurate, repeat our simulations, and report the results.  

      (5) The statement that the "lack of visual field overlap across areas is suggestive of a lack of hierarchical processing" is predicated on the full acceptance of the mappings by Zhuang et al (2017). Based on the evidence reviewed above, the reclassification of visual areas proposed in Figure 1C seems premature.

      The reviewer is correct. In the revised manuscript, we will be careful to distinguish bias in visual field coverage across areas from presence or lack of visual field overlap.  

      (6) The existence of lateral connections is not unique to rodent cortex and has been described in primates (Felleman and Van Essen, 1991).

      (7) Why the mouse and rat extrastriate visual cortex differ from those of many other mammals is unclear. One reason may be that mammals with V2 subregions are strongly binocular.

      This is an interesting suggestion, and careful visual topography data from rabbits and other lateral-eyed animals would help to evaluate it. For what it’s worth, tree shrews are lateral-eyed animals with only 50 degrees of binocular visual field and also show V2 subregions.

      Reviewer #3 (Public review):

      Summary:

      The authors review published literature and propose that a visual cortical region in the mouse that is widely considered to contain multiple visual areas should be considered a single visual area.

      Strengths:

      The authors point out that relatively new data showing reversals of visual-field sign within known, single visual areas of some species require that a visual field sign change by itself should not be considered evidence for a border between visual areas.

      Weaknesses:

      The existing data are not consistent with the authors' proposal to consolidate multiple mouse areas into a single "V2". This is because the existing definition of a single area is that it cannot have redundant representations of the visual field. The authors ignore this requirement, as well as the data and definitions found in published manuscripts, and make an inaccurate claim that "higher order visual areas in the mouse do not have overlapping representations of the visual field". For quantification of the extent of overlap of representations between 11 mouse visual areas, see Figure 6G of Garrett et al. 2014. [Garrett, M.E., Nauhaus, I., Marshel, J.H., and Callaway, E.M. (2014). Topography and areal organization of mouse visual cortex. The Journal of Neuroscience 34, 12587-12600. 10.1523/JNEUROSCI.1124-14.2014.]

      Thank you for this correction; we admit we should have chosen our words more carefully. In the revised manuscript, we will emphasize that higher order visual areas in the mouse do have some overlap in their representations but also exhibit bias in their coverage. This is consistent with our proposal, and in fact our model simulations in Figure 2E also show overlapping representations along with differential bias in coverage. However, we also note that Figure 6 of Garrett et al. (2014) provides several pieces of evidence in support of our proposal that higher order areas are sub-regions of a single area V2. First, the visual field coverage of each area is significantly less than that in V1 (Garrett et al., 2014, Figure 6D). While the imaging methods used in Garrett et al. likely under-estimate receptive fields, one would assume they would similarly impact measurements of coverage in V1 and HVAs. Second, each area exhibits a bias towards a different part of the visual field (Figure 6C and E); this bias is distinct for different areas but proceeds in a retinotopic manner around V1, with adjacent areas exhibiting biases for nearby regions of the visual field (Figure 6E). Thus, the biases in the visual field coverage across HVAs appear to be related and not independent of each other. As we show in our modeling and in Figure 2, such orderly and inter-related biases can be created from a single visual field constrained to share a border with mouse V1.

      With regards to the existing definition of a single area: we did not ignore the requirement that single areas cannot have redundant representations of the visual field. Rather, we believe that this requirement should be relaxed considering new evidence collected from other species, where multiple visual field reversals exist within the same visual area. We understand this issue is nuanced and was not made clear in the original submission.  

      In the revised manuscript, we will clarify that visual field reversals often exhibit redundant retinotopic representation on either side of the reversal. We will also clarify that our argument that multiple reversals can exist within a single visual area in the mouse is an argument that some retinotopic redundancy can exist within single visual areas. Such a re-classification would align how we define visual areas in mice with existing classification in tree shrews, ferrets, cats, and primates, all of which have secondary visual areas with complex retinotopic maps exhibiting multiple reversals and redundant retinotopic coverage.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Parise presents another instantiation of the Multisensory Correlation Detector model that can now accept stimulus-level inputs. This is a valuable development as it removes researcher involvement in the characterization/labeling of features and allows analysis of complex stimuli with a high degree of nuance that was previously unconsidered (i.e., spatial/spectral distributions across time). The author demonstrates the power of the model by fitting data from dozens of previous experiments, including multiple species, tasks, behavioral modalities, and pharmacological interventions.

      Thanks for the kind words!

      Strengths:

      One of the model's biggest strengths, in my opinion, is its ability to extract complex spatiotemporal co-relationships from multisensory stimuli. These relationships have typically been manually computed or assigned based on stimulus condition and often distilled to a single dimension or even a single number (e.g., "-50 ms asynchrony"). Thus, many models of multisensory integration depend heavily on human preprocessing of stimuli, and these models miss out on complex dynamics of stimuli; the lead modality distribution apparent in Figures 3b and c is provocative. I can imagine the model revealing interesting characteristics of the facial distribution of correlation during continuous audiovisual speech that have up to this point been largely described as "present" and almost solely focused on the lip area.

      Another aspect that makes the MCD stand out among other models is the biological inspiration and generalizability across domains. The model was developed to describe a separate process - motion perception - and in a much simpler organism - Drosophila. It could then describe a very basic neural computation that has been conserved across phylogeny (which is further demonstrated in the ability to predict rat, primate, and human data) and brain area. This aspect makes the model likely able to account for much more than what has already been demonstrated with only a few tweaks akin to the modifications described in this and previous articles from Parise.

      What allows this potential is that, as Parise and colleagues have demonstrated in those papers since our (re)introduction of the model in 2016, the MCD model is modular - both in its ability to interface with different inputs/outputs and its ability to chain MCD units in a way that can analyze spatial, spectral, or any other arbitrary dimension of a stimulus. This fact leaves wide open the possibilities for types of data, stimuli, and tasks a simplistic, neurally inspired model can account for.

      And so it's unsurprising (but impressive!) that Parise has demonstrated the model's ability here to account for such a wide range of empirical data from numerous tasks (synchrony/temporal order judgement, localization, detection, etc.) and behavior types (manual/saccade responses, gaze, etc.) using only the stimulus and a few free parameters. This ability is another of the model's main strengths that I think deserves some emphasis: it represents a kind of validation of those experiments, especially in the context of cross-experiment predictions (but see some criticism of that below).

      Finally, what is perhaps most impressive to me is that the MCD (and the accompanying decision model) does all this with very few (sometimes zero) free parameters. This highlights the utility of the model and the plausibility of its underlying architecture, but also helps to prevent extreme overfitting if fit correctly (but see a related concern below).

      We sincerely thank the reviewer for their thoughtful and generous comments. We are especially pleased that the core strengths of the model—its stimulus-computable architecture, biological grounding, modularity, and cross-domain applicability—were clearly recognized. As the reviewer rightly notes, removing researcher-defined abstractions and working directly from naturalistic stimuli opens the door to uncovering previously overlooked dynamics in complex multisensory signals, such as the spatial and temporal richness of audiovisual speech.

      We also appreciate the recognition of the model’s origins in a simple organism and its generalization across species and behaviors. This phylogenetic continuity reinforces our view that the MCD captures a fundamental computation with wide-ranging implications. Finally, we are grateful for the reviewer’s emphasis on the model’s predictive power across tasks and datasets with few or no free parameters—a property we see as key to both its parsimony and explanatory utility.

      We have highlighted these points more explicitly in the revised manuscript, and we thank the reviewer for their generous and insightful endorsement of the work.

      Weaknesses:

      There is an insufficient level of detail in the methods about model fitting. As a result, it's unclear what data the models were fitted and validated on. Were models fit individually or on average group data? Each condition separately? Is the model predictive of unseen data? Was the model cross-validated? Relatedly, the manuscript mentions a randomization test, but the shuffled data produces model responses that are still highly correlated to behavior despite shuffling. Could it be that any stimulus that varies in AV onset asynchrony can produce a psychometric curve that matches any other task with asynchrony judgements baked into the task? Does this mean all SJ or TOJ tasks produce correlated psychometric curves? Or more generally, is Pearson's correlation insensitive to subtle changes here, considering psychometric curves are typically sigmoidal? Curves can be non-overlapping and still highly correlated if one is, for example, scaled differently. Would an error term such as mean-squared or root mean-squared error be more sensitive to subtle changes in psychometric curves? Alternatively, perhaps if the models aren't cross-validated, the high correlation values are due to overfitting?

      The reviewer is right: the original version of the manuscript provided only limited information about parameter fitting. In the revised version of the manuscript, we have included a parameter estimation and generalizability section that contains all the information requested by the reviewer.

      To test whether using the MSE instead of Pearson correlation led to a similar set of estimated parameter values, we repeated the fitting using the MSE. The parameters estimated with this method (TauV, TauA, TauBim) closely followed those estimated using Pearson correlation (TauV, TauA, TauBim). Given the similarity of these results, we have chosen not to include further figures; however, this analysis is now included in the new section (pages 23-24).
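
      The difference between the two metrics can be illustrated with a small sketch. The curves below are toy logistic functions, not data from the manuscript, and the helper names (`logistic`, `pearson`, `mse`) are introduced here purely for illustration. An affine compression of a psychometric curve, of the kind a lapse rate would produce, leaves the Pearson correlation at 1 while the mean squared error is clearly non-zero.

```python
import math


def logistic(x, mu=0.0, s=60.0):
    """Toy psychometric curve: probability of one response as a function of lag (ms)."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical lags (ms) and two curves that differ only by an affine rescaling.
lags = list(range(-300, 301, 50))
observed = [logistic(lag) for lag in lags]
compressed = [0.25 + 0.5 * p for p in observed]  # tails squeezed towards 0.5

r = pearson(observed, compressed)   # unaffected by the rescaling
err = mse(observed, compressed)     # sensitive to the rescaling
print(f"Pearson r = {r:.6f}, MSE = {err:.4f}")
```

      Because Pearson correlation is invariant to offset and scale, repeating a fit with an error term such as MSE is a reasonable cross-check on whether the two criteria agree.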

      Regarding the permutation test, it is expected that different stimuli produce analogous psychometric functions: after all, all studies relied on stimuli containing identical manipulations of lag. As a result, MCD population responses tend to be similar across experiments. Therefore, it is not a surprise that the permuted distribution of MCD-data correlation in Supplementary Figure 1K has a mean as high as 0.97. However, what is important is to demonstrate that the non-permuted dataset has an even higher goodness of fit. Supplementary Figure 1K demonstrates that none of the permuted stimuli could outperform the non-permuted dataset; the mean of the non-permuted distribution is 4.7 standard deviations above the mean of the already high permuted distribution.
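
      One way such a comparison could be computed is sketched below. This is an illustration under stated assumptions and not necessarily the procedure used in the manuscript: the curves are toy Gaussian synchrony-judgment functions, and `sj_curve`, `mean_fit` and `permutation_null` are hypothetical helper names. The sketch pairs each model curve with the data of a randomly chosen experiment, builds the null distribution of the mean correlation, and expresses the matched pairing as a number of standard deviations above the null mean.

```python
import math
import random
import statistics


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def sj_curve(lags, width):
    """Toy synchrony-judgment curve: Gaussian function of audiovisual lag."""
    return [math.exp(-0.5 * (lag / width) ** 2) for lag in lags]


def mean_fit(model_curves, data_curves, order):
    """Mean correlation when model curve i is paired with data curve order[i]."""
    return statistics.mean(
        pearson(model_curves[i], data_curves[j]) for i, j in enumerate(order)
    )


def permutation_null(model_curves, data_curves, n_perm=500, seed=1):
    """Null distribution obtained by shuffling which data set each model curve is compared with."""
    rng = random.Random(seed)
    order = list(range(len(data_curves)))
    null = []
    for _ in range(n_perm):
        rng.shuffle(order)
        null.append(mean_fit(model_curves, data_curves, order))
    return null


# Hypothetical experiments that share the same lags but differ in curve width.
lags = list(range(-400, 401, 50))
widths = [60, 90, 120, 160, 220]
model_curves = [sj_curve(lags, w) for w in widths]
# Toy "data": each model curve with compressed tails (made up for illustration).
data_curves = [[0.05 + 0.9 * p for p in curve] for curve in model_curves]

observed = mean_fit(model_curves, data_curves, range(len(widths)))
null = permutation_null(model_curves, data_curves)
z = (observed - statistics.mean(null)) / statistics.stdev(null)
print(f"matched fit = {observed:.3f}, null mean = {statistics.mean(null):.3f}, z = {z:.2f}")
```

      Even in this toy setting the null mean is high, because all curves share the same lag manipulation; the informative quantity is the distance between the matched fit and the null distribution.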

      We believe the new section, along with the present response, fully addresses the legitimate concerns of the reviewer.

      While the model boasts incredible versatility across tasks and stimulus configurations, fitting behavioral data well doesn't mean we've captured the underlying neural processes, and thus, we need to be careful when interpreting results. For example, the model produces temporal parameters fitting rat behavior that are 4x faster than when fitting human data. This difference in slope and a difference at the tails were interpreted as differences in perceptual sensitivity related to general processing speeds of the rat, presumably related to brain/body size differences. While rats no doubt have these differences in neural processing speed/integration windows, it seems reasonable that a lot of the differences in human and rat psychometric functions could be explained by the (over)training and motivation of rats to perform on every trial for a reward - increasing attention/sensitivity (slope) - and a tendency to make mistakes (compression evident at the tails). Was there an attempt to fit these data with a lapse parameter built into the decisional model as was done in Equation 21? Likewise, the fitted parameters for the pharmacological manipulations during the SJ task indicated differences in the decisional (but not the perceptual) process and the article makes the claim that "all pharmacologically-induced changes in audiovisual time perception" can be attributed to decisional processes "with no need to postulate changes in low-level temporal processing." However, those papers discuss actual sensory effects of pharmacological manipulation, with one specifically reporting changes to response timing. Moreover, and again contrary to the conclusions drawn from model fits to those data, both papers also report a change in psychometric slope/JND in the TOJ task after pharmacological manipulation, which would presumably be reflected in changes to the perceptual (but not the decisional) parameters.

      Fitting or predicting behaviour does not in itself demonstrate that a model captures the underlying neural computations—though it may offer valuable constraints and insights. In line with this, we were careful not to extrapolate the implications of our simulations to specific neural mechanisms.

      Temporal sensitivity is, by definition, a behavioural metric, and—as the reviewer correctly notes—its estimation may reflect a range of contributing factors beyond low-level sensory processing, including attention, motivation, and lapse rates (i.e., stimulus-independent errors). In Equation 21, we introduced a lapse parameter specifically to account for such effects in the context of monkey eye-tracking data. For the rat datasets, however, the inclusion of a lapse term was not required to achieve a close fit to the psychometric data (ρ = 0.981). While it is likely that adding a lapse component would yield a marginally better fit, the absence of single-trial data prevents us from applying model comparison criteria such as AIC or BIC to justify the additional parameter. In light of this, and to avoid unnecessary model complexity, we opted not to include a lapse term in the rat simulations.
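
      The role of trial-level information in such a comparison can be sketched as follows. The psychometric form below is a generic cumulative Gaussian with a symmetric lapse term, not Equation 21 of the manuscript, and the trial counts are invented for illustration; all function names are hypothetical. The sketch shows that AIC and BIC are built on a binomial log-likelihood, which requires the number of trials per condition and cannot be recovered from proportions alone.

```python
import math


def psychometric(x, mu, sigma, lapse=0.0):
    """Cumulative Gaussian with a symmetric lapse rate (generic textbook form)."""
    core = 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return 0.5 * lapse + (1.0 - lapse) * core


def binomial_log_likelihood(successes, totals, probs):
    """Log-likelihood up to the binomial coefficient, which cancels in model comparison."""
    ll = 0.0
    for k, n, p in zip(successes, totals, probs):
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # guard against log(0)
        ll += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return ll


def aic(ll, n_params):
    return 2 * n_params - 2 * ll


def bic(ll, n_params, n_trials):
    return n_params * math.log(n_trials) - 2 * ll


def best_fit(lags, successes, totals, lapses):
    """Coarse grid search for the maximum-likelihood parameters (illustrative only)."""
    best = None
    for mu in range(-40, 41, 10):
        for sigma in range(10, 121, 10):
            for lapse in lapses:
                probs = [psychometric(x, mu, sigma, lapse) for x in lags]
                ll = binomial_log_likelihood(successes, totals, probs)
                if best is None or ll > best[0]:
                    best = (ll, mu, sigma, lapse)
    return best


# Invented counts: 40 trials per lag, generated from a curve with a 10% lapse rate.
lags = list(range(-200, 201, 50))
totals = [40] * len(lags)
successes = [round(n * psychometric(x, 0.0, 40.0, 0.1)) for x, n in zip(lags, totals)]

ll_no_lapse = best_fit(lags, successes, totals, [0.0])[0]
ll_lapse = best_fit(lags, successes, totals, [0.0, 0.05, 0.1, 0.15, 0.2])[0]
n_trials = sum(totals)

print("AIC without lapse:", aic(ll_no_lapse, 2), " with lapse:", aic(ll_lapse, 3))
print("BIC without lapse:", bic(ll_no_lapse, 2, n_trials), " with lapse:", bic(ll_lapse, 3, n_trials))
```

      The lapse model can never fit worse than the nested model without a lapse term; whether the improvement justifies the extra parameter is what AIC and BIC quantify, and both depend on the trial counts entering the likelihood.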

      With respect to the pharmacological manipulation data, we acknowledge the reviewer’s point that observed changes in slope and bias could plausibly arise from alterations at either the sensory or decisional level—or both. In our model, low-level sensory processing is instantiated by the MCD architecture, which outputs the MCDcorr and MCDlag signals that are then scaled and integrated during decision-making. Importantly, this scaling operation influences the slope of the resulting psychometric functions, such that changes in slope can arise even in the absence of any change to the MCD’s temporal filters. In our simulations, the temporal constants of the MCD units were fixed to the values estimated from the non-pharmacological condition (see parameter estimation section above), and only the decision-related parameters were allowed to vary. From this modelling perspective, the behavioural effects observed in the pharmacological datasets can be explained entirely by changes at the decisional level. However, we do not claim that such an explanation excludes the possibility of genuine sensory-level changes. Rather, we assert that our model can account for the observed data without requiring modifications to early temporal tuning.
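
      The point that a decision-level scaling can change the psychometric slope while the sensory stage stays fixed can be sketched with a toy example. Here `mcd_lag_toy` is a hypothetical stand-in for the sensory output (a fixed saturating function of lag), not the MCD itself, and the probit decision stage with `gain` and `bias` is an assumed form chosen for illustration.

```python
import math


def mcd_lag_toy(lag_ms, tau=100.0):
    """Hypothetical stand-in for a fixed sensory stage: saturating function of lag."""
    return math.tanh(lag_ms / tau)


def p_response(lag_ms, gain, bias=0.0):
    """Assumed decision stage: scale the sensory output, then apply a probit link."""
    z = gain * mcd_lag_toy(lag_ms) + bias
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def slope_at_zero(gain, delta=1.0):
    """Numerical slope of the psychometric function around zero lag."""
    return (p_response(delta, gain) - p_response(-delta, gain)) / (2.0 * delta)


# The sensory stage (tau) is identical in both cases; only the decision gain differs.
slope_low = slope_at_zero(gain=1.0)
slope_high = slope_at_zero(gain=3.0)
print(f"slope with gain 1: {slope_low:.5f}, slope with gain 3: {slope_high:.5f}")
```

      In this toy setting the slope grows with the decision gain although the temporal constant of the sensory stage never changes, which is why a slope change alone cannot locate an effect at the sensory level.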

      To rigorously distinguish sensory from decisional effects, future experiments will need to employ stimuli with richer temporal structure—e.g., temporally modulated sequences of clicks and flashes that vary in frequency, phase, rhythm, or regularity (see Fujisaki & Nishida, 2007; Denison et al., 2012; Parise & Ernst, 2016, 2025; Locke & Landy, 2017; Nidiffer et al., 2018). Such stimuli engage the MCD in a more stimulus-dependent manner, enabling a clearer separation between early sensory encoding and later decision-making processes. Unfortunately, the current rat datasets—based exclusively on single click-flash pairings—lack the complexity needed for such disambiguation. As a result, while our simulations suggest that the observed pharmacologically induced effects can be attributed to changes in decision-level parameters, they do not rule out concurrent sensory-level changes.

      In summary, our results indicate that changes in the temporal tuning of MCD units are not necessary to reproduce the observed pharmacological effects on audiovisual timing behaviour. However, we do not assert that such changes are absent or unnecessary in principle. Disentangling sensory and decisional contributions will ultimately require richer datasets and experimental paradigms designed specifically for this purpose. We have now modified the results section (page 6) and the discussion (page 11) to clarify these points.

      The case for the utility of a stimulus-computable model is convincing (as I mentioned above), but its framing as mission-critical for understanding multisensory perception is overstated, I think. The line for what is "stimulus computable" is arbitrary and doesn't seem to be followed in the paper. A strict definition might realistically require inputs to be, e.g., the patterns of light and sound waves available to our eyes and ears, while an even more strict definition might (unrealistically) require those stimuli to be physically present and transduced by the model. A reasonable looser definition might allow an abstract and low-dimensional representation of the stimulus, such as the stimulus envelope (which was used in the paper), to be an input. Ultimately, some preprocessing of a stimulus does not necessarily confound interpretations about (multi)sensory perception. And on the flip side, the stimulus-computable aspect doesn't necessarily give the model supreme insight into perception. For example, the MCD model was "confused" by the stimuli used in our 2018 paper (Nidiffer et al., 2018; Parise & Ernst, 2025). In each of our stimuli (including catch trials), the onset and offset drove strong AV temporal correlations across all stimulus conditions, but were irrelevant to participants performing an amplitude modulation detection task. The to-be-detected amplitude modulations, set at individual thresholds, were not a salient aspect of the physical stimulus, and thus only marginally affected stimulus correlations. The model was, of course, able to fit our data by "ignoring" the on/offsets (i.e., requiring human intervention), again highlighting that the model is tapping into a very basic and ubiquitous computational principle of (multi)sensory perception. But it does reveal a limitation of such a stimulus-computable model: that it is (so far) strictly bottom-up.

      We appreciate the reviewer’s thoughtful engagement with the concept of stimulus computability. We agree that the term requires careful definition and should not be taken as a guarantee of perceptual insight or neural plausibility. In our work, we define a model as “stimulus-computable” if all its inputs are derived directly from the stimulus, rather than from experimenter-defined summary descriptors such as temporal lag, spatial disparity, or cue reliability. In the context of multisensory integration, this implies that a model must account not only for how cues are combined, but also for how those cues are extracted from raw inputs—such as audio waveforms and visual contrast sequences.

      This distinction is central to our modelling philosophy. While ideal observer models often specify how information should be combined once identified, they typically do not address the upstream question of how this information is extracted from sensory input. In that sense, models that are not stimulus-computable leave out a key part of the perceptual pipeline. We do not present stimulus computability as a marker of theoretical superiority, but rather as a modelling constraint that is necessary if one’s aim is to explain how structured sensory input gives rise to perception. This is a view that is also explicitly acknowledged and supported by Reviewer 2.

      Framed in Marr’s (1982) terms, non–stimulus-computable models tend to operate at the computational level, defining what the system is doing (e.g., computing a maximum likelihood estimate), whereas stimulus-computable models aim to function at the algorithmic level, specifying how the relevant representations and operations might be implemented. When appropriately constrained by biological plausibility, such models may also inform hypotheses at the implementational level, pointing to potential neural substrates that could instantiate the computation.

      Regarding the reviewer’s example illustrating a limitation of the MCD model, we respectfully note that the account appears to be based on a misreading of our prior work. In Parise & Ernst (2025), where we simulated the stimuli from Nidiffer et al. (2018), the MCD model reproduced participants’ behavioural data without any human intervention or adjustment. The model was applied in a fully bottom-up, stimulus-driven manner, and its output aligned with observer responses as-is. We suspect the confusion may stem from analyses shown in Figure 6 - Supplement Figure 5 of Parise & Ernst (2025), where we investigated the lack of a frequency-doubling effect in the Nidiffer et al. data. However, those analyses were based solely on the Pearson correlation between auditory and visual stimulus envelopes and did not involve the MCD model. No manual exclusion of onset/offset events was applied, nor was the MCD used in those particular figures. We also note that Parise & Ernst (2025) is a separate, already published study and is not the manuscript currently under review. 

      In summary, while we fully agree that stimulus computability does not resolve all the complexities of multisensory perception (see comments below about speech), we maintain that it provides a valuable modelling constraint—one that enables robust, generalisable predictions when appropriately scoped. 

      The manuscript rightly chooses to focus a lot of the work on speech, fitting the MCD model to predict behavioral responses to speech. The range of findings from AV speech experiments that the MCD can account for is very convincing. Given the provided context that speech is "often claimed to be processed via dedicated mechanisms in the brain," a statement claiming a "first end-to-end account of multisensory perception," and findings that the MCD model can account for speech behaviors, it seems the reader is meant to infer that energetic correlation detection is a complete account of speech perception. I think this conclusion misses some facets of AV speech perception, such as integration of higher-order, non-redundant/correlated speech features (Campbell, 2008) and also the existence of top-down and predictive processing that aren't (yet!) explained by MCD. For example, one important benefit of AV speech is interactions on linguistic processes - how complementary sensitivity to articulatory features in the auditory and visual systems (Summerfield, 1987) allow constraint of linguistic processes (Peelle & Sommers, 2015; Tye-Murray et al., 2007).

      We thank the reviewer for their thoughtful comments, and especially for the kind words describing the range of findings from our AV speech simulations as “very convincing.”

      We would like to clarify that it is not our view that speech perception can be reduced to energetic correlation detection. While the MCD model captures low- to mid-level temporal dependencies between auditory and visual signals, we fully agree that a complete account of audiovisual speech perception must also include higher-order processes—including linguistic mechanisms and top-down predictions. These are critical components of AV speech comprehension, and lie beyond the scope of the current model.

      Our use of the term “end-to-end” is intended in a narrow operational sense: the model transforms raw audiovisual input (i.e., audio waveforms and video frames) directly into behavioural output (i.e., button press responses), without reliance on abstracted stimulus parameters such as lag, disparity or reliability. It is in this specific technical sense that the MCD offers an end-to-end model. We have revised the manuscript to clarify this usage to avoid any misunderstanding.

      In light of the reviewer’s valuable point, we have now edited the Discussion to acknowledge the importance of linguistic processes (page 13) and to clarify what we mean by end-to-end account (page 11). We agree that future work will need to explore how stimulus-computable models such as the MCD can be integrated with broader frameworks of linguistic and predictive processing (e.g., Summerfield, 1987; Campbell, 2008; Peelle & Sommers, 2015; Tye-Murray et al., 2007).

      References

      Campbell, R. (2008). The processing of audio-visual speech: empirical and neural bases. Philosophical Transactions of the Royal Society B: Biological Sciences, 363(1493), 1001-1010. https://doi.org/10.1098/rstb.2007.2155

      Nidiffer, A. R., Diederich, A., Ramachandran, R., & Wallace, M. T. (2018). Multisensory perception reflects individual differences in processing temporal correlations. Scientific Reports, 8(1), 1-15. https://doi.org/10.1038/s41598-018-32673-y

      Parise, C. V., & Ernst, M. O. (2025). Multisensory integration operates on correlated input from unimodal transient channels. eLife, 12. https://doi.org/10.7554/ELIFE.90841

      Peelle, J. E., & Sommers, M. S. (2015). Prediction and constraint in audiovisual speech perception. Cortex, 68, 169-181. https://doi.org/10.1016/j.cortex.2015.03.006

      Summerfield, Q. (1987). Some preliminaries to a comprehensive account of audio-visual speech perception. In B. Dodd & R. Campbell (Eds.), Hearing by Eye: The Psychology of Lip-Reading (pp. 3-51). Lawrence Erlbaum Associates.

      Tye-Murray, N., Sommers, M., & Spehar, B. (2007). Auditory and Visual Lexical Neighborhoods in Audiovisual Speech Perception. Trends in Amplification, 11(4), 233-241. https://doi.org/10.1177/1084713807307409

      Reviewer #2 (Public review):

      Summary:

      Building on previous models of multisensory integration (including their earlier correlation-detection framework used for non-spatial signals), the author introduces a population-level Multisensory Correlation Detector (MCD) that processes raw auditory and visual data. Crucially, it does not rely on abstracted parameters, as is common in normative Bayesian models, but rather works directly on the stimulus itself (i.e., individual pixels and audio samples). By systematically testing the model against a range of experiments spanning human, monkey, and rat data, the author shows that the MCD population approach robustly predicts perception and behavior across species with a relatively small (0-4) number of free parameters.

      Strengths:

      (1) Unlike prior Bayesian models that used simplified or parameterized inputs, the model here is explicitly computable from full natural stimuli. This resolves a key gap in understanding how the brain might extract "time offsets" or "disparities" from continuously changing audio-visual streams.

      (2) The same population MCD architecture captures a remarkable range of multisensory phenomena, from classical illusions (McGurk, ventriloquism) and synchrony judgments, to attentional/gaze behavior driven by audio-visual salience. This generality strongly supports the idea that a single low-level computation (correlation detection) can underlie many distinct multisensory effects.

      (3) By tuning model parameters to different temporal rhythms (e.g., faster in rodents, slower in humans), the MCD explains cross-species perceptual data without reconfiguring the underlying architecture.

      We thank the reviewer for their positive evaluation of the manuscript, and particularly for highlighting the significance of the model's stimulus-computable architecture and its broad applicability across species and paradigms. Please find our responses to the individual points below.

      Weaknesses:

      (1) The authors show how a correlation-based model can account for the various multisensory integration effects observed in previous studies. However, a comparison of how the two accounts differ would shed light on the correlation model being an implementation of the Bayesian computations (different levels in Marr's hierarchy) or making testable predictions that can distinguish between the two frameworks. For example, how uncertainty in the cue combined estimate is also the harmonic mean of the unimodal uncertainties is a prediction from the Bayesian model. So, how the MCD framework predicts this reduced uncertainty could be one potential difference (or similarity) to the Bayesian model.

      We fully agree with the reviewer that a comparison between the correlation-based MCD model and Bayesian accounts is valuable—particularly for clarifying how the two frameworks differ conceptually and where they may converge.

      As noted in the revised manuscript, the key distinction lies in the level of analysis described by Marr (1982). Bayesian models operate at the computational level, describing what the system is aiming to compute (e.g., optimal cue integration). In contrast, the MCD functions at the algorithmic level, offering a biologically plausible mechanism for how such integration might emerge from stimulus-driven representations.

      In this context, the MCD provides a concrete, stimulus-grounded account of how perceptual estimates might be constructed—potentially implementing computations with Bayesian-like characteristics (e.g., reduced uncertainty, cue weighting). Thus, the two models are not mutually exclusive but can be seen as complementary: the MCD may offer an algorithmic instantiation of computations that, at the abstract level, resemble Bayesian inference.

      We have now updated the manuscript to explicitly highlight this relationship (pages 2 and 11). In the revised manuscript, we also included a new figure (Figure 5) and movie (Supplementary Movie 3), to show how the present approach extends previous Bayesian models for the case of cue integration (i.e., the ventriloquist effect).
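
      For reference, the reduced-uncertainty prediction raised in the reviewer's comment is usually written as below in the standard maximum-likelihood account of cue combination. This is a textbook formulation assuming independent Gaussian noise on the auditory and visual estimates; the symbols are illustrative and are not taken from the manuscript's equations.

```latex
\begin{align}
\hat{s}_{AV} &= w_A \hat{s}_A + w_V \hat{s}_V,
\qquad
w_A = \frac{1/\sigma_A^2}{1/\sigma_A^2 + 1/\sigma_V^2},
\qquad
w_V = 1 - w_A, \\
\sigma_{AV}^2 &= \frac{\sigma_A^2 \, \sigma_V^2}{\sigma_A^2 + \sigma_V^2}
\;\le\; \min\!\left(\sigma_A^2, \sigma_V^2\right).
\end{align}
```

      The combined variance equals half the harmonic mean of the two unimodal variances, so it never exceeds the smaller of the two. An algorithmic model would be expected to reproduce this benchmark if it implements Bayesian-like integration.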

      (2) The authors show a good match for cue combination involving 2 cues. While Bayesian accounts provide a direction for extension to more cues (also seen empirically, e.g., in Hecht et al. 2008), discussion on how the MCD model extends to more cues would benefit the readers.

      We thank the reviewer for this insightful comment: extending the MCD model to include more than two sensory modalities is a natural and valuable next step. Indeed, one of the strengths of the MCD framework lies in its modularity. Let us consider the MCDcorr output (Equation 6), which is computed as the pointwise product of transient inputs across modalities. Extending this to include a third modality, such as touch, is straightforward: MCD units would simply multiply the transient channels from all three modalities, effectively acting as trimodal coincidence detectors that respond when all inputs are aligned in time and space.

      By contrast, extending MCDlag is less intuitive, due to its reliance on opponency between two subunits (via subtraction). A plausible solution is to compute MCDlag in a pairwise fashion (e.g., AV, VT, AT), capturing relative timing across modality pairs.

      Importantly, the bulk of the spatial integration in our framework is carried by MCDcorr, which generalises naturally to more than two modalities. We have now formalised this extension and included a graphical representation in a supplementary section of the revised manuscript.
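
      The trimodal extension described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the transient channels are already computed and sampled on a common time base, and it reduces MCDcorr to the bare pointwise product mentioned in the response. The function names `trimodal_corr` and `modality_pairs` are hypothetical.

```python
from itertools import combinations


def trimodal_corr(audio, video, touch):
    """Pointwise product of three transient channels of equal length.

    Assumption: each list holds the (already filtered) transient response
    of one modality, sampled at the same time points.
    """
    if not (len(audio) == len(video) == len(touch)):
        raise ValueError("all channels must have the same length")
    return [a * v * t for a, v, t in zip(audio, video, touch)]


def modality_pairs(names):
    """Enumerate modality pairs for a pairwise, lag-style comparison."""
    return list(combinations(names, 2))


# Toy transients: only the second sample is active in all three channels.
audio = [0.0, 1.0, 0.0, 1.0]
video = [0.0, 1.0, 1.0, 0.0]
touch = [0.0, 1.0, 0.0, 0.0]

response = trimodal_corr(audio, video, touch)
pairs = modality_pairs(["A", "V", "T"])
```

      The product is non-zero only where all three channels are active at once, which is the coincidence-detector behaviour described in the response. The pairs list corresponds to the AV, AT, and VT comparisons proposed for MCDlag.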

      Likely Impact and Usefulness:

      The work offers a compelling unification of multiple multisensory tasks (temporal order judgments, illusions, Bayesian causal inference, and overt visual attention) under a single, fully stimulus-driven framework. Its success with natural stimuli should interest computational neuroscientists, systems neuroscientists, and machine learning scientists. This paper thus makes an important contribution to the field by moving beyond minimalistic lab stimuli, illustrating how raw audio and video can be integrated using elementary correlation analyses.

      Reviewer #1 (Recommendations for the authors):

      Recommendations:

      My biggest concern is a lack of specificity about model fitting, which is assuaged by the inclusion of sufficient detail to replicate the analysis completely or the inclusion of the analysis code. The code availability indicates a script for the population model will be included, but it is unclear if this code will provide the fitting details for the whole of the analysis.

      We thank the reviewer for raising this important point. A new methodological section has been added to the manuscript, detailing the model fitting procedures used throughout the study. In addition, the accompanying code repository now includes MATLAB scripts that allow full replication of the spatiotemporal MCD simulations.

      Perhaps it could be enlightening to re-evaluate the model with a measure of error rather than correlation? And I think many researchers would be interested in the model's performance on unseen data.

      The model has now been re-evaluated using mean squared error (MSE), and the results remain consistent with those obtained using Pearson correlation. Additionally, we have clarified which parts of the study involve testing the model on unseen data (i.e., data not used to fit the temporal constants of the units). These analyses are now included and discussed in the revised fitting section of the manuscript (pages 23-24).
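
      The reviewer's point that an error measure adds information beyond correlation can be illustrated with a small sketch. The data below are invented toy values rather than results from the manuscript, and the helper names are hypothetical.

```python
from math import sqrt


def mean_squared_error(predicted, observed):
    """Average squared difference between predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_o)


# Toy example: predictions track the data perfectly but with a constant offset.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]

r = pearson_r(predicted, observed)            # insensitive to the offset
err = mean_squared_error(predicted, observed)  # penalises the offset
```

      Here the correlation is perfect even though every prediction is off by one unit, whereas the MSE registers the offset. Reporting both measures guards against this kind of mismatch.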

      Otherwise, my concerns involve the interpretation of findings, and thus could be satisfied with minor rewording or tempering conclusions.

      The manuscript has been revised to address these interpretative concerns, with several conclusions reworded or tempered accordingly. All changes are marked in blue in the revised version.

      Miscellanea:

      Should b0 in equation 10 be bcrit to match the below text?

      Thank you for catching this inconsistency. We have corrected Equation 10 (and also Equation 21) to use the more transparent notation bcrit instead of b0, in line with the accompanying text.

      Equation 23, should time be averaged separately? For example, if multiple people are speaking, the average correlation for those frames will be higher than the average correlation across all times.

      We thank the reviewer for raising this thoughtful and important point. In response, we have clarified the notation of Equation 23 in the revised manuscript (page 20). Specifically, we now denote the averaging operations explicitly as spatial means and standard deviations across all pixel locations within each frame.

      This equation computes the z-score of the MCD correlation value at the current gaze location, normalized relative to the spatial distribution of correlation values in the same frame. That is, all operations are performed at the frame level, not across time. This ensures that temporally distinct events are treated independently and that the final measure reflects relative salience within each moment, not a global average over the stimulus. In other words, the spatial distribution of MCD activity is re-centered and rescaled at each frame, exactly to avoid the type of inflation or confounding the reviewer rightly cautioned against.
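
      The frame-level normalisation described above can be sketched as follows. This is an illustration under stated assumptions rather than the manuscript's Equation 23: each frame is taken to be a 2-D grid of MCD correlation values, the population standard deviation is used, and a spatially uniform frame is assigned a z-score of zero. The function name `gaze_zscore` is hypothetical.

```python
from statistics import fmean, pstdev


def gaze_zscore(frame, gaze_row, gaze_col):
    """Z-score of the value at the gaze location within a single frame.

    The mean and standard deviation are taken across all pixel locations
    of this frame only, never across time.
    """
    values = [v for row in frame for v in row]
    mu = fmean(values)
    sigma = pstdev(values, mu)
    if sigma == 0:
        return 0.0  # assumption: a uniform frame carries no relative salience
    return (frame[gaze_row][gaze_col] - mu) / sigma


# Two toy frames with very different overall correlation levels.
frames = [
    [[0.0, 0.0], [0.0, 4.0]],
    [[10.0, 10.0], [10.0, 14.0]],
]
gaze = [(1, 1), (1, 1)]

scores = [gaze_zscore(f, r, c) for f, (r, c) in zip(frames, gaze)]
```

      Both frames give the same score although the second has a much higher mean correlation. This is the property the response relies on: frames with globally elevated correlation, such as when several people speak at once, do not inflate the measure.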

      Reviewer #2 (Recommendations for the authors):

      The authors have done a great job of providing a stimulus computable model of cue combination. I had just a few suggestions to strengthen the theoretical part of the paper:

      (1) While the authors have shown a good match between MCD and cue combination, some theoretical justification or equivalence analysis would benefit readers on how the two relate to each other. Something like Zhang et al. 2019 (which is for motion cue combination) would add to the paper.

      We agree that it is important to clarify the theoretical relationship between the Multisensory Correlation Detector (MCD) and normative models of cue integration, such as Bayesian combination. In the revised manuscript, we have now modified the introduction and added a paragraph in the Discussion addressing this link more explicitly. In brief, we see the MCD as an algorithmic-level implementation (in Marr’s terms) that may approximate or instantiate aspects of Bayesian inference.

      (2) Simulating cue combination for tasks that require integration of more than two cues (visual, auditory, haptic cues) would more strongly relate the correlation model to Bayesian cue combination. If that is a lot of work, at least discussing this would benefit the paper

      This point has now been addressed, and a new paragraph discussing the extension of the MCD model to tasks involving more than two sensory modalities has been added to the Discussion section.

    1. Author response:

      Reviewer #1 (Public review):

      This study established a C921Y OGT-ID mouse model, systematically demonstrating in mammals the pathological link between O-GlcNAc metabolic imbalance and neurodevelopmental disorders (cortical malformation, microcephaly) as well as behavioral abnormalities (hyperactivity, impulsivity, learning/memory deficits). However, critical flaws in the current findings require resolution to ensure scientific rigor.

      The most concerning finding appears in Figure S12. While Supplementary Figure S12 demonstrates decreased OGA expression without significant OGT level changes in C921Y mutants via Western blot/qPCR, previous reports (Florence Authier, et al., Dis Model Mech. 2023) described OGT downregulation in Western blot and an increase in qPCR in the same models. The opposite OGT expression outcomes in supposedly identical mouse models directly challenge the model's reliability. This discrepancy raises serious concerns about either the experimental execution or the interpretation of results. The authors must revalidate the data with rigorous controls or provide a molecular biology-based explanation.

      The referee’s assessment is based on a misunderstanding: these are certainly not the same experiment repeated twice with different answers. In the previous report of the OGT-C921Y mutant mice (Florence Authier, et al., Dis Model Mech. 2023), OGT and OGA mRNA/protein expression were assessed in total brain protein extract from 3-month-old male mice. In that study we observed a significant reduction in OGT protein levels, while OGT mRNA levels were significantly increased in the mutant compared to WT controls. In the current study (Figure S12), however, OGA and OGT mRNA/protein expression were a) assessed only in the prefrontal cortex and b) measured in 4-month-old male mice, which does not allow a direct comparison of the two studies. In the prefrontal cortex, OGT protein levels are not changed while OGT mRNA levels are increased (similarly to the total brain data), albeit not significantly. The different OGT protein outcomes in total brain and prefrontal cortex could suggest regional differences in OGT protein levels/stability, as OGT mRNA levels are increased in both cases. Three other brain regions (hippocampus, striatum and cerebellum) have now also been assessed for OGT mRNA/protein expression, supporting such regional differences in OGT protein levels, and these data will be included in the new version of the manuscript.

      A few additional comments to the author may be helpful to improve the study.

      Major

      (1) While this study systematically validated multi-dimensional phenotypes (including neuroanatomical abnormalities and behavioral deficits) in OGT C921Y mutant mice, there is a lack of relevant mechanisms and intervention experiments. For example, the absence of targeted intervention studies on key signaling pathways prevents verification of whether proteomics-identified molecular changes directly drive phenotypic manifestations.

      We agree with the referee that these experiments would further strengthen the work. They would, however, result in a 1-5 year delay in sharing this work with the scientific and patient communities. We will continue to work along these lines and report separately in the future.

      (2) Although MRI detected nodular dysplasia and heterotopia in the cingulate cortex, the cellular basis remains undefined. Spatiotemporal immunofluorescence analysis using neuronal (NeuN), astrocytic (GFAP), and synaptic (Synaptophysin) markers is recommended to identify affected cell populations (e.g., radial glial migration defects or intermediate progenitor differentiation abnormalities).

      We are currently performing these experiments so that they can be included in the version of record of this manuscript.

      (3) While proteomics revealed dysregulation in pathways including Wnt/β-catenin and mTOR signaling, two critical issues remain unresolved: a) O-GlcNAc glycoproteomic alterations remain unexamined; b) The causal relationship between pathway changes and O-GlcNAc imbalance lacks validation. It is recommended to use co-immunoprecipitation or glycosylation sequencing to confirm whether the relevant proteins undergo O-GlcNAc modification changes, identify specific modification sites, and verify their interactions with OGT.

      We agree with the referee that these experiments would further strengthen the work and will perform further experiments to explore whether these pathways are functionally affected. However, it is important to note that the inference that these proteins must themselves be O-GlcNAc modified is incorrect. Indeed, O-GlcNAcylation of an unknown protein kinase X, E3 ligase/DUB Y, or transcription factor Z could indirectly affect these pathways/proteins.

      (4) Given that OGT-ID neuropathology likely originates embryonically, we recommend serial analyses from E14.5 to P7 to examine cellular dynamics during critical corticogenesis phases.

      We agree with the referee that these experiments would further strengthen the work. They would, however, result in a significant delay in sharing this work with the scientific and patient communities. We will continue to work along these lines and report separately in the future.

      (5) The interpretation of Figure 8A constitutes overinterpretation. Current data fail to conclusively demonstrate impairment of OGT's protein interaction network and lack direct evidence supporting the proposed mechanisms of HCF1 misprocessing or OGA loss.

      For clarity, we will remove panel A from Figure 8 in the version of record – this panel was only ever meant to represent a priori hypotheses for OGT-CDG mechanisms, none of which have been either excluded or confirmed.

      Reviewer #2 (Public review):

      Summary:

      The authors are trying to understand why certain mutants of O-GlcNAc transferase (OGT) appear to cause developmental disorders in humans. As an important step towards that goal, the authors generated a mouse model with one of these mutations that disrupts OGT activity. They then go on to test these mice for behavioral differences, finding that the mutant mice exhibit some signs of hyperactivity and differences in learning and memory. They then examine alterations to the structure of the brain and skull, and again find changes in the mutant mice that have been associated with developmental disorders. Finally, they identify proteins that are up- or down-regulated between the two mice as potential mechanisms to explain the observations.

      Strengths:

      The major strength of this manuscript is the creation of this mouse model, as a key step in beginning to understand how OGT mutants cause developmental disorders. This line will prove important for not only the authors but other investigators as well, enabling the testing of various hypotheses and potentially treatments. The experiments are also rigorously performed, and the conclusions are well supported by the data.

      Weaknesses:

      The only weakness identified is a lack of mechanistic insight. However, this certainly may come in the future through more targeted experimentation using this mouse model.

      We agree with the referee that these experiments would further strengthen the work. They would, however, result in a 1-5 year delay in sharing this work with the scientific and patient communities. We will continue to work along these lines and report separately in the future.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      Summary:

      Previous studies have shown that treatment with 17α-estradiol (a stereoisomer of 17β-estradiol) extends lifespan in male mice but not in females. The current study by Li et al. aimed to identify cell-specific clusters and populations in the hypothalamus of aged male rats treated with 17α-estradiol (treated for 6 months). This study identifies genes and pathways affected by 17α-estradiol in the aged hypothalamus.

      Strengths:

      Using single-nucleus transcriptomic sequencing (snRNA-seq) on the hypothalamus from aged male rats treated with 17α-estradiol they show that 17α-estradiol significantly attenuated age-related increases in cellular metabolism, stress, and decreased synaptic activity in neurons.

      Thanks.

      Moreover, sc-analysis identified GnRH as one of the key mediators of 17α-estradiol's effects on energy homeostasis. Furthermore, they show that CRH neurons exhibited a senescent phenotype, suggesting a potential side effect of the 17α-estradiol. These conclusions are supported by supervised clustering by neuropeptides, hormones, and their receptors.

      Thanks.

      Weaknesses:

      However, the study has several limitations that reduce the strength of the key claims in the manuscript. In particular:

      (1) The study focused only on males and did not include comparisons with females. However, previous studies have shown that 17α-estradiol extends lifespan in a sex-specific manner in mice, affecting males but not females. Without the comparison with the female data, it's difficult to assess its relevance to the lifespan.

      This study was originally designed based on previous findings indicating that lifespan extension is only effective in males, leading to the exclusion of females from the analysis. The primary focus of our research was on the transcriptional changes and serum endocrine alterations induced by 17α-estradiol in aged males compared to untreated aged males. We believe that even in the absence of female subjects, the significant effects of 17α-estradiol on metabolism in the hypothalamus, synapses, and endocrine system remain evident, particularly regarding the expression levels of GnRH and testosterone. Notably, lower overall metabolism, increased synaptic activity, and elevated levels of GnRH and testosterone are strong indicators of health and well-being in males, supporting the validity of our primary conclusions. However, including female controls would enhance the depth of our findings. If female controls were incorporated, we propose redesigning the sample groups to include aged male control, aged female control, aged female treated, aged male treated, as well as young male control, young male treated, young female control, and young female treated. We regret that we cannot provide this data in the short term. Nevertheless, we believe this reviewer’s creative idea presents a valuable avenue for future research on this topic. In this study, we emphasize the role of 17α-estradiol in overall metabolism, synaptic function, GnRH, and testosterone in aged males and underscore the importance of supervised clustering of neuropeptide-secreting neurons in the hypothalamus.

      (2) It is not known whether 17α-estradiol leads to lifespan extension in male rats similar to male mice. Therefore, it is not possible to conclude that the observed effects in the hypothalamus, are linked to the lifespan extension.

      Thank you for the reminder. 17α-estradiol has been reported to extend lifespan in male rats, similar to male mice (PMID: 33289482). We have added this reference to the Introduction in the new version.

      (3) The effect of 17α-estradiol on non-neuronal cells such as microglia and astrocytes is not well-described (Figure 1). Previous studies demonstrated that 17α-estradiol reduces microgliosis and astrogliosis in the hypothalamus of aged male mice. Current data suggest that the proportion of oligo, and microglia were increased by the drug treatment, while the proportions of astrocytes were decreased. These data might suggest possible species differences, differences in the treatment regimen, or differences in drug efficiency. This has to be discussed.

      We have reviewed reports describing changes in cell numbers following 17α-estradiol treatment in the brain, using the keywords "17α-estradiol," "17alpha-estradiol," and "microglia" or "astrocyte." Only a limited amount of data was obtained. We found one article indicating that 17α-estradiol treatment in Tg (AβPP(swe)/PS1(ΔE9)) model mice resulted in a decreased microglial cell number compared to the placebo (AβPP(swe)/PS1(ΔE9) mice), but this change was not significant when compared to the non-transgenic control (PMID: 21157032). The transgenic AβPP(swe)/PS1(ΔE9) mouse model may differ from our wild-type aging rat model in this context.

      Moreover, the calculation of cell numbers was based on visual observation under a microscope across several brain tissue slices. This traditional method often yields controversial results. For example, oligodendrocytes in the corpus callosum, fornix, and spinal cord have been reported to be 20-40% more numerous in males than in females based on microscopic observations (PMID: 16452667). In contrast, another study found no significant difference in the number of oligodendrocytes between sexes when using immunohistochemistry staining (PMID: 18709647). Such discrepancies arising from traditional observational methods are inevitable.

      We believe the data presented in this article are reliable because the cell number and cell ratio data were derived from high-throughput cell counting of the entire hypothalamus using single-cell suspension and droplet wrapping (10x Genomics).

      (4) A more detailed analysis of glial cell types within the hypothalamus in response to drugs should be provided.

      We have provided additional enrichment analysis of differentially expressed genes between Y, O, and O.T in microglia and astrocytes in Figure 2—figure supplement 3. In these supplemental data, we found that, unlike neurons, microglia (Micro) displayed lower levels of synapse-related cellular processes in O.T compared to O.

      (5) The conclusion that CRH neurons are going into senescence is not clearly supported by the data. A more detailed analysis of the hypothalamus such as histological examination to assess cellular senescence markers in CRH neurons, is needed to support this claim.

      We also noted the inappropriate claim and have changed "senescent phenotype" to "stressed phenotype" and "abnormal phenotype" in both the abstract and results sections. The stressed phenotype could be induced by heightened functional activity in the cells, potentially indicating higher cellular activity. The GnRH and CRH neurons discussed in this paper may represent such a case, as illustrated by the observed high serum GnRH, testosterone, and cortisol levels. This revision suggestion is highly valuable and constructive for our understanding of the unique physiological characteristics revealed by these data.

      Reviewer #2 (Public Review):

      Summary:

      Li et al. investigated the potential anti-ageing role of 17α-Estradiol on the hypothalamus of aged rats. To achieve this, they employed a very sophisticated method for single-cell genomic analysis that allowed them to analyze effects on various groups of neurons and non-neuronal cells. They were able to sub-categorize neurons according to their capacity to produce specific neurotransmitters, receptors, or hormones. They found that 17α-Estradiol treatment led to an improvement in several factors related to metabolism and synaptic transmission by bringing the expression levels of many of the genes of these pathways closer or to the same levels as those of young rats, reversing the ageing effect. Interestingly, among all neuronal groups, the proportion of Oxytocin-expressing neurons seems to be the one most significantly changing after treatment with 17α-Estradiol, suggesting an important role of these neurons in mediating its anti-ageing effects. This was also supported by an increase in circulating levels of oxytocin. It was also found that gene expression of corticotropin-releasing hormone neurons was significantly impacted by 17α-Estradiol even though it was not different between aged and young rats, suggesting that these neurons could be responsible for side effects related to this treatment. This article revealed some potential targets that should be further investigated in future studies regarding the role of 17α-Estradiol treatment in aged males.

      Strengths:

      (1) Single-nucleus mRNA sequencing is a very powerful method for gene expression analysis and clustering. The supervised clustering of neurons was very helpful in revealing otherwise invisible differences between neuronal groups and helped identify specific neuronal populations as targets.

      Thanks.

      (2) There is a variety of functions used that allow the differential analysis of a very complex type of data. This led to a better comparison between the different groups on many levels.

      Thanks.

      (3) There were some physiological parameters measured such as circulating hormone levels that helped the interpretation of the effects of the changes in hypothalamic gene expression

      Thanks.

      Weaknesses

      (1) One main control group is missing from the study, the young males treated with 17α-Estradiol.

      Given that the treatment period lasts six months, which extends beyond the young male rats' age range, we aimed to investigate the perturbation of the normal aging process by 17α-Estradiol. Including data from young males could potentially obscure the treatment's effects in aged males due to age effects, though similar effects between young and aged animals may exist. Long-term hormone treatment may exert more developmental effects on the young than on the old. Consequently, we decided to exclude this group from our initial sample design. We apologize for this omission.

      (2) Even though the technical approach is a sophisticated one, analyzing the whole rat hypothalamus instead of specific nuclei or subregions makes the study weaker.

      The precise targets of 17α-Estradiol within the hypothalamus remain unresolved. Selecting a specific nucleus for study is challenging. The supervised clustering method described in this manuscript allows us to identify the more sensitive neuron subtypes influenced by 17α-Estradiol and aging across the entire hypothalamus, without the need to isolate specific nuclei in a disturbed hypothalamic environment.

      (3) Although the authors claim to have several findings, the data fail to support these claims. You may mean the claim as the senescent phenotype in Crh neuron induced by 17a-estradiol.

      Thanks. We have changed "senescent phenotype" to "stressed phenotype" in the abstract and results to avoid this claim. The stressed phenotype may be induced by heightened functional activity in the cells, potentially indicating higher cellular activity.

      (4) The study is about improving ageing but no physiological data from the study demonstrated such a claim with the exception of the testes histology which was not properly analyzed and was not even significantly different between the groups.

      The primary objective of this study is to elucidate the effects of 17α-Estradiol on the endocrine system in the aging hypothalamus; exploring anti-aging effects is not the main focus. Previous publications on the aging hypothalamus indicate that down-regulated GnRH and testosterone levels, along with elevated mTOR signaling, are indicators of aging (PMID: 37886966, PMID: 37048056, PMID: 22884327). The contrasting signaling networks related to metabolism and synaptic processes significantly differentiate young and aging hypothalami, and 17α-Estradiol helps rebalance these networks, suggesting its potential anti-aging effects.

      (5) Overall, the study remains descriptive with no physiological data to demonstrate that any of the effects on hypothalamic gene expression are related to metabolic, synaptic, or other functions.

      The study focuses on investigating cellular responses and endocrine changes in the aging hypothalamus induced by 17α-estradiol, utilizing single-nucleus RNA sequencing (snRNA-seq) and a novel data mining methodology to analyze various neuron subtypes. It is important to note that this study does not mainly aim to explore the anti-aging effects. Consequently, we have revised the claim in the abstract from “the effects of 17α-estradiol in anti-aging in neurons” to “the effects of 17α-estradiol on aging neurons.” We observed that the lower overall metabolism and increased expression levels of cellular processes in the synapses align with findings previously reported regarding 17α-estradiol. To address the lack of physiological data and the challenges in measuring multiple endocrine factors due to their volatile nature, we employed several bidirectional Mendelian analyses of various genome-wide association study (GWAS) data related to these serum endocrine factors to identify their mutual causal effects.

      Reviewing Editor Comment:

      Based on the Public Reviews and Recommendations for Authors, the Reviewers strongly recommend that revisions include an experimental demonstration of the physiological effects of the treatment on ageing in rats as well as the CRH-senescence link. Additional analysis of the glia would greatly strengthen the study, as would inclusion of females and young male controls. The important point was also raised that the work linking 17a-estradiol was performed in mice, and the link with lifespan in rats is not known. Discussion of this point is recommended.

      We thank the reviewers for their constructive feedback. Regarding the recommendations in the Public Reviews and Recommendations for Authors:

      a)  Physiological effects & CRH-senescence link:

      We acknowledge that 17α-estradiol has been reported to extend lifespan in male rats, consistent with findings in male mice (PMID: 33289482). This point has now been noted in the Introduction. We regret that further experimental validation of the treatment's physiological effects on aging in rats was beyond the scope of this study.

      b) Phenotype terminology:

      In response to concerns about the "senescent" characterization of CRH neurons, we have revised this terminology to "stressed phenotype" throughout the abstract and results. While we were unable to conduct additional experiments to confirm senescence markers, this revised description better reflects the heightened cellular activity observed (as evidenced by elevated serum GnRH and testosterone levels), without implying confirmed senescence.

      c) Glial cell analysis:

      To address questions about glial cell function during treatment, we have added new enrichment analysis data of differentially expressed genes in microglia and astrocytes from young (Y), old (O), and old treated (O.T) groups in Figure 2—figure supplement 3. This analysis reveals that microglia exhibit contrasting synaptic-related cellular processes compared to total neurons.

      d) Female and young controls:

      We sincerely apologize for the absence of female subjects and young male controls in the current study. The reviewers' suggestion to examine the male-specific effects of 17α-estradiol using female controls represents an excellent direction for future research, which we plan to pursue in upcoming studies.

      Reviewer #2 (Recommendations For The Authors):

      General comments:

      (1) The manuscript is very hard to read. Proofreading and editing by software or a professional seems necessary. The words "enhanced", "extensive" etc. are not always used in the right way.

      Thanks for the suggestion. We have proofread and edited the manuscript. The usage of "enhanced" and "extensive" was also revised in most sentences.

      (2) The numbers of animals and samples are not well explained. Is it 9 rats overall or per group? If there are 8 testes samples per group, should we assume that there were 4 rats per group? How was the pooling of the hypothalami done? Were all the hypothalami from each group pooled together? A small table with the animals per group and the samples would help.

      We appreciate your reminder regarding the initial mistake in our manuscript preparation. In the preliminary submission, we reported 9 rats based solely on sequencing data and data mining. The revised version (v1) now includes additional experimental data, with an effective total of 12 animals (4 per group). Unfortunately, we overlooked updating this information in the v1 submission. We have since added detailed information in the Materials and Methods sections: Animals, Treatment and Tissues, and snRNA-seq Data Processing, Batch Effect Correction, and Cell Subset Annotation.

      (3) The Clustering is wrong. There are genes in there that do not fall into any of the 3 categories: Neurotransmitters, Receptors, Hormones.

      We acknowledge the error in gene clustering and have implemented the following corrections:

      (a) The description has been updated to state: 'Vast majority of these subtypes were clustered by neuropeptides, hormones, and their receptors among all neurons.'

      (b) Genes not belonging to these three categories have been substantially removed.

      (c) The neuropeptide category (now including several growth hormones) has been expanded to 104 genes, while their corresponding receptors (including several sex hormone receptors) now comprise 105 genes.

      (4) The coloring of groups in the graphs is inconsistent. It must be more homogeneous to make it easier to identify.

      We have changed the colors of groups in Fig. 1D to make the color of cell clusters consistent in Fig. 1A-D.

      (5) The groups c1-c4 are not well explained. How did the authors come up with these?

      We have added more descriptions of c1-c4 in materials and methods in the new version.

      (6) In most cases it's not clear if the authors are talking about cell numbers that express a certain mRNA, the level of expression of a certain mRNA, or both. They need to do a better job using more precise descriptions instead of using general terms such as "signatures", "expression profiles", "affected neurons" etc. It is very hard to understand if the number of neurons is compared between the groups or the gene expression.

      We have changed "signatures" to "gene signatures" to make the meaning more accurate. "Affected neurons" was also changed to "sensitive neurons". We apologize that we were unable to find a better alternative to "expression profiles".

      (7) Sometimes there are claims made without justification or a reference. For example, the claim about the senescence of CRH neurons due to the upregulation of mitochondrial genes and downregulation of adherence junction genes (lines 326-328) should be supported by a reference or own findings.

      The term "senescence" is not appropriate here. We have changed it to "stressed phenotype" or "aberrant changes" in the abstract and results.

      (8) Young males treated with Estradiol as a control group is necessary and it is missing.

      We appreciate your suggestion; however, the treatment duration for aged mice (O.T) was set at 6 months, while the young mice were only 4 months old. This disparity makes it challenging to align treatment timelines for the young animals. The primary aim of this study is to investigate how 17α-estradiol perturbs the aging process, and any distinct age-related effects observed in young males might complicate our understanding of its role in aged males, though similar endocrine effects may exist in young animals. Long-term hormone treatment may exert more developmental effects on young animals than on old ones. Therefore, we decided to exclude the young samples from our initial study design. We apologize for any confusion this may have caused.

      Specific Comments:

      Line 28: "elevated stresses and decreased synaptic activity": Please make this clearer. Can't claim changes in synaptic activity by gene expression.

      We have changed it to "the expression level of pathways involved in synapse"

      Line 32: "increased Oxytocin": serum Oxytocin.

      We have added the “serum”.

      Line 52 - 54: Any studies from rats?

      Thank you. 17α-estradiol has also been reported to have metabolic roles in rats similar to those in mice (PMID: 33289482), and we have added this study to the references. It is very useful for this manuscript.

      Line 62 - 65: It wasn't investigated thoroughly in this paper so why was it suggested in the introduction?

      We have deleted this sentence as suggested.

      Line 70: "synaptic activity" Same as line 28.

      We have changed it to "pathways involved in synaptic activity".

      Line 79: Why were aged rats caged alone and young by two? Could that introduce hypothalamic gene expression effects?

      The young males could be housed together peacefully, whereas the aged males fight and must be housed alone.

      Lines 78, 99, 109-110: It is not clear how many animals per group were used and how many samples per group were used separately and/or grouped. Please be more specific.

      We have added this information to Materials and methods/Animals, treatment and tissues and Materials and methods/snRNA-seq data processing, batch effect correction, and cell subset annotation.

      Line 205: "in O" please add "versus young.".

      We have changed accordingly.

      Line 207: replace "were" with "was"

      Alternatively, we have changed "proportion" to "proportions".

      Line 208: replace "that" with "compared to" and after "in O.T." add "compared to?"

      We have changed accordingly.

      Line 223: "O.T." compared to what? Figure?

      We have changed it accordingly.

      Line 227: Figure?

      We have added (Figure 1E) accordingly.

      Line 229: "synaptic activity" Same as line 28.

      We have revised it.

      Line 235: "synaptic activity" and "neuropeptide secretion" Same as line 28.

      We have revised it.

      Line 256:" interfered" please revise.

      We changed to "exerted".

      Line 263: "on the contrary" please revise.

      We have changed "on the contrary" to "opposite".

      Line 270: "conversed" did you mean "conserved"?

      We have changed "conversed" to "inversed".

      Line 296-298: Please explain. Why would these be side effects?

      This is hard to explain; we have therefore deleted the words "side effects".

      Line 308: "synaptic activity" Same as line 28.

      We have changed it to "expression levels of synapse-related cellular processes".

      Line 314: "and sex hormone secretion and signaling"Isn't this expected?

      Yes, it is expected. We have added it to the sentence "and, as expected, sex hormone secretion and signaling".

      Line 325-328: Why is this senescence? Reference?

      We have added “potent” to it.

      Line 360-361: This doesn't show elevated synaptic activity.

      "elevated synaptic activity" was changed to "The elevated expression of synapse-related pathways"

      Line 363-364: "Unfortunately" is not a scientific expression and show bias.

      We have changed it to "Notably".

      Line 376: Similar as above.

      Yes, we have changed it to "in contrast".

      Lines 382-385: This is speculation. Please move to discussion.

      We apologize, but we consider the causal effects derived from the MR results to be evidence. As such, we have not changed it.

      Line 389: Please revise "hormone expressing".

      We have changed it accordingly.

      Line 401: Isn't this effect expected due to feedback inhibition of the biochemical pathway? Please comment.

      The binding capability of 17alpha-estradiol to estrogen receptors and its role in transcriptional activation remain core questions surrounded by controversy. Earlier studies suggest that 17alpha-estradiol exhibits at least 200 times less activity than 17beta-estradiol (PMID: 2249627, PMID: 16024755). However, recent data indicate that 17alpha-estradiol shows comparable genomic binding and transcriptional activation through estrogen receptor α (Esr1) to that of 17beta-estradiol (PMID: 33289482). Additionally, there is evidence that 17alpha-estradiol has anti-estrogenic effects in rats (PMID: 16042770). These findings imply possible feedback inhibition via estrogen receptors. Furthermore, 17alpha-estradiol likely differs from 17beta-estradiol due to its unique metabolic consequences and its potential to slow aging in males, an effect not attributed to 17beta-estradiol. For instance, neurons are also targets of 17alpha-estradiol, with Esr1 not being the sole target (PMID: 38776045). Intriguingly, neurons expressing Ar and Esr1 ranked among the top 20 most perturbed receptor subtypes during aging (O vs Y), but were no longer ranked in this group following treatment (O.T vs Y and O.T vs O comparisons). This indicates that 17α-estradiol administration attenuated age-associated perturbation in these neuronal subtypes, which may be a consequence of potential feedback (Figure 3D). Nevertheless, the precise effective targets of 17alpha-estradiol are still unresolved.

      Line 409: This conclusion cannot be made because the effect is not statistically significant. Can say "trend" etc.

      Thanks for the recommendation. We have added "potential" in front of the conclusion.

      Line 426: "suggesting" please revise.

      Sorry, it is a verb.

      Lines 426-428: This is speculation. Please move to discussion.

      The elevated GnRH levels in O.T., observed through EIA analysis, suggest a deduction regarding the direct causal effects of 17alpha-estradiol on various endocrine factors related to feeding, energy homeostasis, reproduction, osmotic regulation, stress response, and neuronal plasticity through MR analysis. Thus, we have not amended our position. We apologize for any confusion.

      Lines 431-432: improved compared to what?

      The statement has been revised to "The most striking role of 17α-estradiol treatment revealed in this study showed that HPG axis was substantially improved in the levels of serum Gnrh and testosterone".

      Line 435: " Estrogen Receptor Antagonists". Please revise.

      Thanks for the recommendation. We have changed it to "estrogen receptor antagonists".

      Line 438" "Secrete". Please revise

      Sorry, it is "secret".

      Lines 439-449: None of this has been demonstrated. Please remove these conclusions.

      We appreciate the reviewer's scrutiny regarding lines 439-449. While these statements should not be interpreted as definitive conclusions from our current data, we propose they serve as clinically relevant discussion points worthy of exploration. Our findings demonstrate 17α-estradiol's role in modulating testosterone levels in aged males. This mechanistic insight warrants consideration of its therapeutic potential for age-related hypogonadism - a hypothesis we believe merits discussion given the compound's specific endocrine effects.

      Lines 450-457: No females were included in this study. Why? Also, why is this discussed? It is relevant but doesn't belong in this manuscript since it was not studied here.

      Testosterone levels are crucial for male health, while estradiol levels are essential for the health and fertility of females. Previous studies have demonstrated that 17α-estradiol does not contribute to lifespan extension in females. Given the effects of 17α-estradiol on males—specifically, its role in promoting testosterone and reducing estradiol levels—we believe it is important to discuss the potential sex-biased effects of 17α-estradiol, as this could inform future investigations. We have refined this section to clarify that these points represent mechanistic hypotheses derived from our male data and existing literature, not conclusions about unstudied female physiology. This framing maintains the discussion's scientific value while respecting the study's scope.

      Lines 458-459: This was not demonstrated in this article. Please remove.

      We have restricted the claim to "expression level of energy metabolism in hypothalamic neurons".

      Line 464: "Promoted lifespan extension" Not demonstrated. Please remove.

      The end of the sentence was revised to "which may be a contributing factor in promoting lifespan extension".

      Line 466: "Showed" No.

      The whole sentence was deleted in the new version.

      Line 483: "the sex-based effects". Not studied here.

      Since the changes in testosterone levels are significant in this dataset and this hormone has a sex-biased nature, we find it worthwhile to suggest this as a topic for future investigation. We have added "which needs further verification in the future" at the end of this sentence.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      This is a well-designed and very interesting study examining the impact of imprecise feedback on outcomes in decision-making. I think this is an important addition to the literature, and the results here, which provide a computational account of several decision-making biases, are insightful and interesting.

      We thank the reviewer for highlighting the strengths of this work.

      I do not believe I have substantive concerns related to the actual results presented; my concerns are more related to the framing of some of the work. My main concern is regarding the assertion that the results prove that non-normative and non-Bayesian learning is taking place. I agree with the authors that their results demonstrate that people will make decisions in ways that demonstrate deviations from what would be optimal for maximizing reward in their task under a strict application of Bayes' rule. I also agree that they have built reinforcement learning models that do a good job of accounting for the observed behavior. However, the Bayesian models included are rather simple, per the author's descriptions, applications of Bayes' rule with either fixed or learned credibility for the feedback agents. In contrast, several versions of the RL models are used, each modified to account for different possible biases. However, more complex Bayes-based models exist, notably active inference, but even the hierarchical Gaussian filter. These formalisms are able to accommodate more complex behavior, such as affect and habits, which might make them more competitive with RL models. I think it is entirely fair to say that these results demonstrate deviations from an idealized and strict Bayesian context; however, the equivalence here of Bayesian and normative is, I think, misleading or at least requires better justification/explanation. This is because a great deal of work has been done to show that Bayes optimal models can generate behavior or other outcomes that are clearly not optimal to an observer within a given context (consider hallucinations for example), but which make sense in the context of how the model is constructed as well as the priors and desired states the model is given.

      As such, I would recommend that the language be adjusted to carefully define what is meant by normative and Bayesian and to recognize that work that is clearly Bayesian could potentially still be competitive with RL models if implemented to model this task. An even better approach would be to directly use one of these more complex modelling approaches, such as active inference, as the comparator to the RL models, though I would understand if the authors would want this to be a subject for future work.

      We thank the reviewer for raising this crucial and insightful point regarding the framing of our results and the definitions of 'normative' and 'Bayesian' learning. Our primary aim in this work was to characterize specific behavioral signatures that demonstrate deviations from predictions generated by a strict, idealized Bayesian framework when learning from disinformation (which we term “biases”). We deliberately employed relatively simple Bayesian models as benchmarks to highlight these specific biases. We fully agree that more sophisticated Bayes-based models (as mentioned by the reviewer, or others) could potentially offer alternative mechanistic explanations for participant behavior. However, we currently do not have a strong notion about which Bayesian models can encompass our findings, and hence, we leave this important question for future work.

      To enhance clarity within the current manuscript, we now avoid using the term “normative” to refer to our Bayesian models, using the term “ideal” instead. We also define more clearly what exactly we mean by that notion when the ideal model is described:

      “This model is based on an idealized assumptions that during the feedback stage of each trial, the value of the chosen bandit is updated (based on feedback valence and credibility) according to Bayes rule reflecting perfect adherence to the instructed task structure (i.e., how true outcomes and feedback are generated).”

      Moreover, we have added a few sentences in the discussion commenting on how more complex Bayesian models might account for our empirical findings:

      “However, as hypothesized, when facing potential disinformation, we also find that individuals exhibit several important biases i.e., deviations from strictly idealized Bayesian strategies. Future studies should explore if and under what assumptions, about the task’s generative structure and/or learner’s priors and objectives, more complex Bayesian models (e.g., active inference (58)) might account for our empirical findings.”
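
      As an illustration of the kind of "ideal" Bayes-rule update described in the quoted model passage, the following is a minimal sketch rather than the authors' implementation. It assumes (these details are not given in the text above) that a feedback agent with credibility c reports the true outcome with probability c and the opposite outcome otherwise, and that the learner holds a belief over the bandit's reward probability on a discrete grid. The names bayes_update and posterior_mean are hypothetical.

```python
# Minimal sketch (assumed generative model, not the authors' code):
# an agent with credibility c reports the true outcome with probability c
# and the opposite outcome with probability 1 - c.

def bayes_update(grid, posterior, feedback, credibility):
    """One Bayes-rule update of a gridded belief over a bandit's reward probability.

    grid        : candidate reward probabilities p in [0, 1]
    posterior   : current belief weights over the grid (sum to 1)
    feedback    : 1 if the agent reported a reward, 0 otherwise
    credibility : probability that the agent reports the true outcome
    """
    updated = []
    for p, weight in zip(grid, posterior):
        # Probability that the agent *reports* a reward when the true
        # reward probability is p: truthful report of a reward, or a lie
        # about a non-reward.
        p_report_reward = credibility * p + (1.0 - credibility) * (1.0 - p)
        likelihood = p_report_reward if feedback == 1 else 1.0 - p_report_reward
        updated.append(weight * likelihood)
    total = sum(updated)
    return [u / total for u in updated]


def posterior_mean(grid, posterior):
    """Expected reward probability under the current belief."""
    return sum(p * w for p, w in zip(grid, posterior))


# Uniform prior over a grid of reward probabilities.
grid = [i / 100 for i in range(101)]
prior = [1 / len(grid)] * len(grid)

# The same positive feedback moves the belief more when the source is more credible.
mean_full = posterior_mean(grid, bayes_update(grid, prior, 1, 1.0))
mean_partial = posterior_mean(grid, bayes_update(grid, prior, 1, 0.75))
mean_chance = posterior_mean(grid, bayes_update(grid, prior, 1, 0.5))
print(mean_full, mean_partial, mean_chance)
```

      Under these assumptions, feedback from a source at chance credibility (c = 0.5) leaves the belief unchanged, and the size of the update grows with credibility. This is the benchmark pattern against which deviations ("biases") could be assessed.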

      Abstract:

      The abstract is lacking in some detail about the experiments done, but this may be a limitation of the required word count. If word count is not an issue, I would recommend adding details of the experiments done and the results.

      We thank the reviewer for their valuable suggestion. We have now included more details about the experiment in the abstract:

      “In two experiments, participants completed a two-armed bandit task, where they repeatedly chose between two lotteries and received outcome-feedback from sources of varying credibility, who occasionally disseminated disinformation by lying about true choice outcome (e.g., reporting non reward when a reward was truly earned or vice versa).”

      One comment is that there is an appeal to normative learning patterns, but this suggests that learning patterns have a fixed optimal nature, which may not be true in cases where the purpose of the learning (e.g. to confirm the feeling of safety of being in an in-group) may not be about learning accurately to maximize reward. This can be accommodated in a Bayesian framework by modelling priors and desired outcomes. As such, the central premise that biased learning is inherently non-normative or non-Bayesian, I think, would require more justification. This is true in the introduction as well.

      Introduction:

      As noted above, the conceptualization of Bayesian learning being equivalent to normative learning, I think requires further justification. Bayesian belief updating can be biased and non-optimal from an observer perspective, while being optimal within the agent doing the updating if the priors/desired outcomes are set up to advantage these "non-optimal" modes of decision making.

      We appreciate the reviewer's thoughtful comment regarding the conceptualization of "normative" and "Bayesian" learning. We fully agree that the definition of "normative" is nuanced and can indeed depend on whether one considers reward-maximization or the underlying principles of belief updating. As explained above we now restrict our presentation to deviations from “ideal Bayes” learning patterns and we acknowledge the reviewer’s concern in a caveat in our discussion.

      Results:

      I wonder why the agent was presented before the choice, since the agent is only relevant to the feedback after the choice is made. I wonder if that might have induced any false association between the agent identity and the choice itself. This is by no means a critical point, but it would be interesting to get the authors' thoughts.

      We thank the reviewer for raising this interesting point regarding the presentation of the agent before the choice. Our decision to present the agent at this stage was intentional, as our original experimental design aimed to explore the possible effects of "expected source credibility" on participants' choices (e.g., whether knowledge of feedback credibility will affect choice speed and accuracy). However, we found nothing that would be interesting to report.

      The finding that positive feedback increases learning is one that has been shown before and depends on valence, as the authors note. They expanded their reinforcement learning model to include valence, but they did not modify the Bayesian model in a similar manner. This lack of a valence or recency effect might also explain the failure of the Bayesian models in the preceding section, where the contrast effect is discussed. It is not unreasonable to imagine that if humans do employ Bayesian reasoning that this reasoning system has had parameters tuned based on the real world, where recency of information does matter; affect has also been shown to be incorporable into Bayesian information processing (see the work by Hesp on affective charge and the large body of work by Ryan Smith). It may be that the Bayesian models chosen here require further complexity to capture the situation, just like some of the biases required updates to the RL models. This complexity, rather than being arbitrary, may be well justified by decision-making in the real world.

      Thanks for these additional important ideas which speak more to the notion that more complex Bayesian frameworks may account for biases we report.

      The methods mention several symptom scales- it would be interesting to have the results of these and any interesting correlations noted. It is possible that some of the individual variability here could be related to these symptoms, which could introduce precision parameter changes in a Bayesian context and things like reward sensitivity changes in an RL context.

      We included these questionnaires for exploratory purposes, with the aim of generating informed hypotheses for future research into individual differences in learning. Given the preliminary nature of these analyses, we believe further research is required about this important topic.

      Discussion:

      (For discussion, not a specific comment on this paper): One wonders also about participants' beliefs about the experiment or the intent of the experimenters. I have often had participants tell me they were trying to "figure out" a task or find patterns even when this was not part of the experiment. This is not specific to this paper, but it may be relevant in the future to try and model participant beliefs about the experiment especially in the context of disinformation, when they might be primed to try and "figure things out".

      We thank the reviewer for this important recommendation. We agree and this point is included in our caveat (cited above) that future research should address what assumptions about the generative task structure can allow Bayesian models to account for our empirical patterns.

      As a general comment, in the active inference literature, there has been discussion of state-dependent actions, or "habits", which are learned in order to help agents more rapidly make decisions, based on previous learning. It is also possible that what is being observed is that these habits are at play, and that they represent the cognitive biases. This is likely especially true given, as the authors note, the high cognitive load of the task. It is true that this would mean that full-force Bayesian inference is not being used in each trial, or in each experience an agent might have in the world, but this is likely adaptive on the longer timescale of things, considering resource requirements. I think in this case you could argue that we have a departure from "normative" learning, but that is not necessarily a departure from any possible Bayesian framework, since these biases could potentially be modified by the agent or eschewed in favor of more expensive full-on Bayesian learning when warranted.

      Indeed, in their discussion on the strategy of amplifying credible news sources to drown out low-credibility sources, the authors hint at the possibility of longer-term strategies that may produce optimal outcomes in some contexts, but which were not necessarily appropriate to this task. As such, the performance on this task- and the consideration of true departure from Bayesian processing- should be considered in this wider context.

      Another thing to consider is that Bayesian inference is occurring, but that priors present going in produce the biases, or these biases arise from another source, for example, factoring in epistemic value over rewards when the actual reward is not large. This again would be covered under an active inference approach, depending on how the priors are tuned. Indeed, given the benefit of social cohesion in an evolutionary perspective, some of these "biases" may be the result of adaptation. For example, it might be better to amplify people's good qualities and minimize their bad qualities in order to make it easier to interact with them; this entails a cost (in this case, not adequately learning from feedback and potentially losing out sometimes), but may fulfill a greater imperative (improved cooperation on things that matter). Given the right priors/desired states, this could still be a Bayes-optimal inference at a social level and, as such, may be ingrained as a habit that requires effort to break at the individual level during a task such as this.

      We thank the reviewer for these insightful suggestions speaking further to the point about more complex Bayesian models.

      The authors note that this task does not relate to "emotional engagement" or "deep, identity-related issues". While I agree that this is likely mostly true, it is also possible that just being told one is being lied to might elicit an emotional response that could bias responses, even if this is a weak response.

      We agree with the reviewer that a task involving performance-based bonuses, and particularly one where participants are explicitly told they are being lied to, might elicit a weak emotional response. However, our primary point is that such responses are expected to be substantially weaker than those typically observed in the broader disinformation literature, which frequently deals with highly salient political, social, or identity-related topics that inherently carry strong emotional and personal ties for participants, leading to much more pronounced affective engagement and potential biases. Our task deliberately avoids such issues, thus minimizing the potential for significant emotion-driven biases. We have toned down the discussion accordingly:

      “This occurs even when the decision at hand entails minimal emotional engagement or pertinence to deep, identity-related, issues.”

      Reviewer #2 (Public review):

      This valuable paper studies the problem of learning from feedback given by sources of varying credibility. The solid combination of experiment and computational modeling helps to pin down properties of learning, although some ambiguity remains in the interpretation of results.

      Summary:

      This paper studies the problem of learning from feedback given by sources of varying credibility. Two bandit-style experiments are conducted in which feedback is provided with uncertainty, but from known sources. Bayesian benchmarks are provided to assess normative facets of learning, and alternative credit assignment models are fit for comparison. Some aspects of normativity appear, in addition to deviations such as asymmetric updating from positive and negative outcomes.

      Strengths:

      The paper tackles an important topic, with a relatively clean cognitive perspective. The construction of the experiment enables the use of computational modeling. This helps to pinpoint quantitatively the properties of learning and formally evaluate their impact and importance. The analyses are generally sensible, and parameter recovery analyses help to provide some confidence in the model estimation and comparison.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      (1) The approach in the paper overlaps somewhat with various papers, such as Diaconescu et al. (2014) and Schulz et al. (forthcoming), which also consider the Bayesian problem of learning and applying source credibility, in terms of theory and experiment. The authors should discuss how these papers are complementary, to better provide an integrative picture for readers.

      Diaconescu, A. O., Mathys, C., Weber, L. A., Daunizeau, J., Kasper, L., Lomakina, E. I., ... & Stephan, K. E. (2014). Inferring the intentions of others by hierarchical Bayesian learning. PLoS computational biology, 10(9), e1003810.

      Schulz, L., Schulz, E., Bhui, R., & Dayan, P. Mechanisms of Mistrust: A Bayesian Account of Misinformation Learning. https://doi.org/10.31234/osf.io/8egxh

      We thank the reviewers for pointing us to this relevant work. We have updated the introduction, mentioning these precedents in the literature and highlighting our specific contributions:

      “To address these questions, we adopt a novel approach within the disinformation literature by exploiting a Reinforcement Learning (RL) experimental framework (36). While RL has guided disinformation research in recent years (37–41), our approach is novel in using one of its most popular tasks: the “bandit task”.”

      We also explain in the discussion how these papers relate to the current study:

      “Unlike previous studies wherein participants had to infer source credibility from experience (30,37,72), we took an explicit-instruction approach, allowing us to precisely assess source-credibility impact on learning, without confounding it with errors in learning about the sources themselves. More broadly, our work connects with prior research on observational learning, which examined how individuals learn from the actions or advice of social partners (72–75). This body of work has demonstrated that individuals integrate learning from their private experiences with learning based on others’ actions or advice—whether by inferring the value others attribute to different options or by mimicking their behavior (57,76). However, our task differs significantly from traditional observational learning. Firstly, our feedback agents interpret outcomes rather than demonstrating or recommending actions (30,37,72).”

      (2) It isn't completely clear what the "cross-fitting" procedure accomplishes. Can this be discussed further?

      We thank the reviewer for requesting further clarification on the cross-fitting procedure. Our study utilizes two distinct model families: Bayesian models and CA models. The credit assignment parameters from the CA models can be treated as “data/behavioural features” corresponding to how choice feedback affects choice-propensities. The cross-fitting approach allows us, in effect, to examine whether these propensity features are predicted by our Bayesian models. To the extent they are not, we can conclude that empirical behavior is “biased”.

      Thus, in our cross-fitting procedure we compare the CA model parameters extracted from participant data (empirical features) with those that would be expected if our Bayesian agents performed the task. Specifically, we first fit participant behavior with our Bayesian models, then simulate these models using the best-fitted parameters and fit those simulations with our CA models. This generates the set of CA parameters that would be predicted if participants’ behavior were fully described by a Bayesian account. By comparing these predicted Bayesian CA parameters with the actual CA parameters obtained from human participants, the cross-fitting procedure allows us to quantitatively demonstrate that the observed participant parameters are indeed statistically significant deviations from normative Bayesian processing. This provides a robust validation that the biases we identify are not artifacts of the CA model's structure but true departures from normative learning.

      We also note that Reviewer 3 suggested an intuitive way to think about the CA parameters: as analogous to logistic regression coefficients in a “sophisticated regression” of choice on (recency-weighted) choice-feedback. We find this suggestion potentially helpful for readers. Under this interpretation, the purpose of the cross-fitting method can be seen simply as estimating the regression coefficients that would be predicted by our Bayesian agents, and comparing those to the empirical coefficients.

      In our manuscript we now explain this issue more clearly by describing how our model is analogous to a logistic regression:

      “The probability to choose a bandit (say A over B) in this family of models is a logistic function of the contrast in choice-propensities between these two bandits. One interpretation of this model is as a “sophisticated” logistic regression, where the CA parameters take the role of “regression coefficients” corresponding to the change in log odds of repeating the just-taken action in future trials based on the feedback (+/- CA for positive or negative feedback, respectively; the model also includes gradual perseveration which allows for constant log-odds changes that are not affected by choice feedback). The forgetting rate captures the extent to which the effect of each trial on future choices diminishes with time. The Q-values are thus exponentially decaying sums of logistic choice propensities based on the types of feedback a bandit received.”
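
      The logistic-regression reading of the CA model can be illustrated with a minimal Python sketch. It is not the authors' fitting code: the function names are hypothetical, feedback is assumed to be coded +1/-1, only the chosen bandit is updated, and the gradual-perseveration term is left out for brevity.

```python
import math


def choice_prob(q_a, q_b):
    """P(choose A over B): a logistic function of the propensity contrast."""
    return 1.0 / (1.0 + math.exp(-(q_a - q_b)))


def update_propensity(q, feedback, ca, forgetting):
    """One CA-style update of the chosen bandit's propensity.

    q          -- current choice propensity (log-odds scale)
    feedback   -- +1 for positive feedback, -1 for negative (assumed coding)
    ca         -- credit-assignment weight for the agent giving the feedback
    forgetting -- forgetting rate in [0, 1]; older trials decay geometrically
    """
    return (1.0 - forgetting) * q + ca * feedback


def propensity_after(feedback_sequence, ca, forgetting):
    """Propensity after a feedback sequence, starting from zero.

    Unrolling the recursion gives an exponentially decaying sum:
    q = sum_i ca * feedback_i * (1 - forgetting) ** (age of trial i).
    """
    q = 0.0
    for feedback in feedback_sequence:
        q = update_propensity(q, feedback, ca, forgetting)
    return q
```

      Because the propensity enters the choice rule on the log-odds scale, `ca` plays the role of a regression coefficient on recency-weighted feedback, which is the interpretation the quoted passage describes.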

      We also explain our cross-fitting procedure in more detail:

      “To further characterise deviations between behaviour and our Bayesian learning models, we used a “cross-fitting” method. Treating CA parameters as data-features of interest (i.e., feedback dependent changes in choice propensity), our goal was to examine if and how empirical features differ from features extracted from simulations of our Bayesian learning models. Towards that goal, we simulated synthetic data based on Bayesian agents (using participants’ best fitting parameters), but fitted these data using the CA-models, obtaining what we term “Bayesian-CA parameters” (Fig. 2d; Methods). A comparison of these Bayesian-CA parameters, with empirical-CA parameters obtained by fitting CA models to empirical data, allowed us to uncover patterns consistent with, or deviating from, ideal-Bayesian value-based inference. Under the sophisticated logistic-regression interpretation of the CA-model family, the cross-fitting method comprises a comparison between empirical regression coefficients (i.e., empirical CA parameters) and regression coefficients based on simulations of Bayesian models (Bayesian CA parameters).”

      (3) The Credibility-CA model seems to fit the same as the free-credibility Bayesian model in the first experiment and barely better in the second experiment. Why not use a more standard model comparison metric like the Bayesian Information Criterion (BIC)? Even if there are advantages to the bootstrap method (which should be described if so), the BIC would help for comparability between papers.

      We thank the reviewer for this important comment regarding our model comparison approach. We acknowledge that classical information criteria like AIC and BIC are widely used in RL studies. However, we argue our method for model-comparison is superior.

      We conducted a model recovery analysis demonstrating a significant limitation of using AIC or BIC for model-comparison in our data. Both these methods are strongly biased in favor of the Bayesian models. Our parametric bootstrap cross-fitting method (PBCM), on the other hand, is both unbiased and more accurate. We believe this is because “off the shelf” methods like AIC and BIC rely on strong assumptions (such as asymptotic sample size and trial-independence) that are not necessarily met in our tasks (our data are finite, and trials in RL tasks depend on previous trials). PBCM avoids such assumptions to obtain comparison criteria specifically tailored to the structure and size of our empirical data. We have now mentioned this fact in the results section of the main text:

      “We considered using AIC and BIC, which apply “off-the-shelf” penalties for model-complexity. However, these methods do not adapt to features like finite sample size (relying instead on asymptotic assumptions) or temporal dependence (as is common in reinforcement learning experiments). In contrast, the parametric bootstrap cross-fitting method replaces these fixed penalties with empirical, data-driven criteria for model-selection. Indeed, model-recovery simulations confirmed that whereas AIC and BIC were heavily biased in favour of the Bayesian models, the bootstrap method provided excellent model-recovery (See Fig. S20).”
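
      The general logic of a parametric bootstrap cross-fitting comparison can be sketched on a deliberately simple toy problem. The two models below (a fair coin versus a coin with one free bias parameter) are stand-ins chosen for illustration and are not the Bayesian and CA models of the study; all function names, the number of bootstrap samples, and the accuracy-maximising rule for the criterion are assumptions of this sketch.

```python
import math
import random


def loglik_fair(data):
    """Log-likelihood of binary data under toy model A: a fair coin."""
    return len(data) * math.log(0.5)


def fit_biased(data):
    """Fit toy model B (one free bias parameter) by maximum likelihood.

    Returns (p_hat, maximised log-likelihood); p_hat is clipped away from
    0 and 1 so that the logarithms stay finite.
    """
    n, k = len(data), sum(data)
    p_hat = min(max(k / n, 1e-6), 1 - 1e-6)
    return p_hat, k * math.log(p_hat) + (n - k) * math.log(1 - p_hat)


def simulate(p, n, rng):
    """Generate n binary outcomes with success probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def fit_difference(data):
    """Goodness-of-fit difference (model B minus model A) for one data set."""
    return fit_biased(data)[1] - loglik_fair(data)


def pbcm(data, n_boot=500, seed=0):
    """Toy parametric bootstrap cross-fitting comparison of models A and B.

    1. Fit both models to the observed data.
    2. Simulate data sets of the same size from each fitted model.
    3. Fit both models to every simulated data set (the cross-fitting step).
    4. Choose the criterion on the fit difference that best separates the two
       generating models, and classify the observed difference with it.
    """
    rng = random.Random(seed)
    n = len(data)
    p_hat, _ = fit_biased(data)
    observed = fit_difference(data)
    diffs_a = [fit_difference(simulate(0.5, n, rng)) for _ in range(n_boot)]
    diffs_b = [fit_difference(simulate(p_hat, n, rng)) for _ in range(n_boot)]

    def n_correct(criterion):
        # Data generated by A should fall at or below the criterion, data from B above it.
        return sum(d <= criterion for d in diffs_a) + sum(d > criterion for d in diffs_b)

    criterion = max(sorted(diffs_a + diffs_b), key=n_correct)
    winner = "B" if observed > criterion else "A"
    return observed, criterion, winner
```

      The criterion here is derived from data sets that share the size and structure of the observed data, rather than from a fixed complexity penalty, which is the property the quoted passage contrasts with AIC and BIC.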

      We have also included this model-recovery analysis in the SI document (Fig. S20).

      (4) As suggested in the discussion, the updating based on random feedback could be due to the interleaving of trials. If one is used to learning from the source on most trials, the occasional random trial may be hard to resist updating from. The exact interleaving structure should also be clarified (I assume different sources were shown for each bandit pair). This would also relate to work on RL and working memory: Collins, A. G., & Frank, M. J. (2012). How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis. European Journal of Neuroscience, 35(7), 1024-1035.

      We thank the reviewer for this point. The specific interleaved structure of the agents is described in the main text:

      “Each agent provided feedback for 5 trials for each bandit pair (with the agent order interleaved within the bandit pair).”

      As well as in the methods section:

      “Feedback agents were randomly interleaved across trials subject to the constraint that each agent appeared on 5-trials for each bandit pair.”

      We also thank the reviewer for mentioning the relevant work on working memory. We have now added it to our discussion point:

      “In our main study, we show that participants revised their beliefs based on entirely non-credible feedback, whereas an ideal Bayesian strategy dictates such feedback should be ignored. This finding resonates with the “continued-influence effect” whereby misleading information continues to influence an individual's beliefs even after it has been retracted (59,60). One possible explanation is that some participants failed to infer that feedback from the 1-star agent was statistically void of information content, essentially random (e.g., the group-level credibility of this agent was estimated by our free-credibility Bayesian model as higher than 50%). Participants were instructed that this feedback would be “a lie” 50% of the time but were not explicitly told that this meant it was random and should therefore be disregarded. Notably, however, there was no corresponding evidence random feedback affected behaviour in our discovery study. It is possible that an individual’s ability to filter out random information might have been limited due to a high cognitive load induced by our main study task, which required participants to track the values of three bandit pairs and juggle between three interleaved feedback agents (whereas in our discovery study each experimental block featured a single bandit pair). Future studies should explore more systematically how the ability to filter random feedback depends on cognitive load (61).”

      (5) Why does the choice-repetition regression include "only trials for which the last same-pair trial featured the 3-star agent and in which the context trial featured a different bandit pair"? This could be stated more plainly.

      We thank the reviewer for this question. When we previously submitted our manuscript, we thought that finding enhanced credit-assignment for fully credible feedback following potential disinformation from a different context would constitute a striking demonstration of our “contrast effect”. However, upon re-examining this finding we discovered a coding error (affecting how trials were filtered). We have now corrected and rerun this analysis. We assessed the contrast effect for both "same-context" trials (where the contextual trial featured the same bandit pair as the learning trial) and "different-context" trials (where the contextual trial featured a different bandit pair). Our re-analysis reveals a selective, significant contrast effect in the same-context condition, but no significant effect in the different-context condition. We have updated the main text to reflect these corrected findings and provide a clearer explanation of the analysis:

      “A comparison of empirical and Bayesian credit-assignment parameters revealed a further deviation from ideal Bayesian learning: participants showed an exaggerated credit-assignment for the 3-star agent compared with Bayesian models [Wilcoxon signed-rank test, instructed-credibility Bayesian model (median difference=0.74, z=11.14); free-credibility Bayesian model (median difference=0.62, z=10.71), all p’s<0.001] (Fig. 3a). One explanation for enhanced learning for the 3-star agent is a contrast effect, whereby credible information looms larger against a backdrop of non-credible information. To test this hypothesis, we examined whether the impact of feedback from the 3-star agent is modulated by the credibility of the agent in the trial immediately preceding it. More specifically, we reasoned that the impact of a 3-star agent would be amplified by a “low credibility context” (i.e., when it is preceded by a low credibility trial). In a binomial mixed effects model, we regressed choice-repetition on feedback valence from the last trial featuring the same bandit pair (i.e., the learning trial) and the feedback agent on the trial immediately preceding that last trial (i.e., the contextual credibility; see Methods for model-specification). This analysis included only learning trials featuring the 3-star agent, and context trials featuring the same bandit pair as the learning trial (Fig. 4a). We found that feedback valence interacted with contextual credibility (F(2,2086)=11.47, p<0.001) such that the feedback-effect (from the 3-star agent) decreased as a function of the preceding context-credibility (3-star context vs. 2-star context: b=-0.29, F(1,2086)=4.06, p=0.044; 2-star context vs. 1-star context: b=-0.41, t(2086)=-2.94, p=0.003; and 3-star context vs. 1-star context: b=-0.69, t(2086)=-4.74, p<0.001) (Fig. 4b). This contrast effect was not predicted by simulations of our main models of interest (Fig. 4c). No effect was found when focussing on contextual trials featuring a bandit pair different than the one in the learning trial (see SI 3.5). Thus, these results support an interpretation that credible feedback exerts a greater impact on participants’ learning when it follows non-credible feedback, in the same learning context.”

      We have modified the discussion accordingly as well:

      “A striking finding in our study was that for a fully credible feedback agent, credit assignment was exaggerated (i.e., higher than predicted by our Bayesian models). Furthermore, the effect of fully credible feedback on choice was further boosted when it was preceded by a low-credibility context related to current learning. We interpret this in terms of a “contrast effect”, whereby veridical information looms larger against a backdrop of disinformation (21). One upshot is that exaggerated learning might entail a risk of jumping to premature conclusions based on limited credible evidence (e.g., a strong conclusion that a vaccine produces significant side-effect risks based on weak credible information, following non-credible information about the same vaccine). An intriguing possibility, that could be tested in future studies, is that participants strategically amplify the extent of learning from credible feedback to dilute the impact of learning from non-credible feedback. For example, a person scrolling through a social media feed, encountering copious amounts of disinformation, might amplify the weight they assign to credible feedback in order to dilute effects of ‘fake news’. Ironically, these results also suggest that public campaigns might be more effective when embedding their messages in low-credibility contexts, which may boost their impact.”

      And we have included some additional analyses in the SI document:

      “3.5 Contrast effects for contexts featuring a different bandit

      Given that we observed a contrast effect when both the learning trial and the immediately preceding “context trial” involved the same pair of bandits, we next investigated whether this effect persisted when the context trial featured a different bandit pair – a situation where the context would be irrelevant to the current learning. Again, we used a binomial mixed-effects model, regressing choice-repetition on feedback valence in the learning trial and the feedback agent in the context trial. This analysis included only learning trials featuring the 3-star agent, and context trials featuring a different bandit pair than the learning trial (Fig. S22a). We found no significant evidence of an interaction between feedback valence and contextual credibility (F(2,2364)=0.21, p=0.81) (Fig. S22b). This null result was consistent with the range of outcomes predicted by our main computational models (Fig. S22c).

      We aimed to formally compare the influence of two types of contextual trials: those featuring the same bandit pair as the learning trial versus those featuring a different pair. To achieve this, we extended our mixed-effects model by incorporating a new predictor variable, "CONTEXT_TYPE", which coded whether the contextual trial involved the same bandit pair (coded as -0.5) or a different bandit pair (+0.5) compared to the learning trial. The Wilkinson notation for this expanded mixed-effects model is:

      REPEAT ~ CONTEXT_TYPE * FEEDBACK * (CONTEXT_2-star + CONTEXT_3-star) + BETTER + (1 | participant)

      This expanded model revealed a significant three-way interaction between feedback valence, contextual credibility, and context type (F(2,4451) = 7.71, p<0.001). Interpreting this interaction, we found a 2-way interaction between context-source and feedback valence when the context was the same (F(2,4451) = 12.03, p<0.001), but not when context was different (F(2,4451) = 0.23, p = 0.79). Further interpreting the double feedback-valence * context-source interaction (for the same context) we obtained the same conclusions as reported in the main text.”

      (6) Why apply the "Truth-CA" model and not the Bayesian variant that it was motivated by?

      Thanks for this very useful suggestion. We are unsure if we fully understand the question. The Truth-CA model was not motivated by a new Bayesian model. Our Bayesian models were simply used to make the point that participants may partially discriminate between truthful and untruthful feedback (for a given source). This led to the idea that perhaps more credit is assigned for truth (than lie) trials, which is what we found using our Truth-CA model. Note we show that our Bayesian models cannot account for this modulation.

      We have now improved our "Truth-CA" model. Previously, our Truth-CA model considered whether feedback on each trial was true or not based on realized latent true outcomes. However, it is possible that the very same feedback would have had an opposite truth-status if the latent true outcome was different (recall true outcomes are stochastic). This injects noise into the trial classification in our previous model. To avoid this, in our new model feedback is modulated by the probability the reported feedback is true (marginalized over stochasticity of true outcome).

      We have described this new model in the methods section:

      “Additionally, we formulated a “Truth-CA” model, which worked as our Credibility-CA model, but incorporated a free truth-bonus parameter (TB). This parameter modulates the extent of credit assignment for each agent based on the posterior probability of feedback being true (given the credibility of the feedback agent, and the true reward probability of the chosen bandit). The chosen bandit was updated as follows:

      $Q \leftarrow (1 - f_Q) \cdot Q + \left[ CA(\text{agent}) + TB \cdot \left( P(\text{truth}) - 0.5 \right) \right] \cdot F$

      where P(truth) is the posterior probability of the feedback being true in the current trial (for exact calculation of P(truth) see “Methods: Bayesian estimation of posterior belief that feedback is true”).”
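
      A minimal Python sketch of this update is given below. It is an illustration rather than the authors' implementation: the exact calculation of P(truth) is given in their Methods and is not reproduced here, so the Bayes-rule form in `p_truth` is an assumption, as are the function names and the +1/-1 coding of the feedback F.

```python
def p_truth(feedback, credibility, reward_prob):
    """Assumed posterior probability that the reported feedback is true.

    feedback    -- +1 (reward reported) or -1 (no reward reported)
    credibility -- probability that the agent reports the true outcome
    reward_prob -- true reward probability of the chosen bandit
    """
    if feedback > 0:
        reported_truthfully = credibility * reward_prob            # reward occurred, agent honest
        reported_falsely = (1 - credibility) * (1 - reward_prob)   # no reward, agent lied
    else:
        reported_truthfully = credibility * (1 - reward_prob)      # no reward, agent honest
        reported_falsely = (1 - credibility) * reward_prob         # reward occurred, agent lied
    total = reported_truthfully + reported_falsely
    if total == 0:
        raise ValueError("this feedback has zero probability under the given inputs")
    return reported_truthfully / total


def truth_ca_update(q, feedback, ca_agent, tb, p_true, forgetting):
    """Truth-CA update of the chosen bandit's Q-value.

    Implements Q <- (1 - f_Q) * Q + [CA(agent) + TB * (P(truth) - 0.5)] * F.
    """
    return (1 - forgetting) * q + (ca_agent + tb * (p_true - 0.5)) * feedback
```

      With `tb = 0` the rule reduces to the Credibility-CA update, so a positive fitted truth bonus indicates extra credit for feedback that is more likely to be true.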

      All relevant results have been updated accordingly in the main text:

      “To formally address whether feedback truthfulness modulates credit assignment, we fitted a new variant of the CA model (the “Truth-CA” model) to the data. This variant works as our Credibility-CA model but incorporates a truth-bonus parameter (TB) which increases the degree of credit assignment for feedback as a function of the experimenter-determined likelihood the feedback is true (which is read from the curves in Fig. 6a when x is taken to be the true probability the bandit is rewarding). Specifically, after receiving feedback, the Q-value of the chosen option is updated according to the following rule: $Q \leftarrow (1 - f_Q) \cdot Q + \left[ CA(\text{agent}) + TB \cdot \left( P(\text{truth}) - 0.5 \right) \right] \cdot F$, where $TB$ is the free parameter representing the truth bonus, and $P(\text{truth})$ is the probability of the received feedback being true (from the experimenter’s perspective). We acknowledge that this model falls short of providing a mechanistically plausible description of the credit assignment process, because participants have no access to the experimenter’s truthfulness likelihoods (as the true bandit reward probabilities are unknown to them). Nonetheless, we use this ‘oracle model’ as a measurement tool to glean rough estimates for the extent to which credit assignment is boosted as a function of its truthfulness likelihood. Fitting this Truth-CA model to participants' behaviour revealed a significant positive truth-bonus (mean=0.21, t(203)=3.12, p=0.002), suggesting that participants indeed assign greater weight to feedback that is likely to be true (Fig. 6c; see SI 3.3.1 for detailed ML parameter results). Notably, simulations using our other models (Methods) consistently predicted smaller truth biases (compared to the empirical bias) (Fig. 6d). Moreover, truth bias was still detected even in a more flexible model that allowed for both a positivity bias and truth-bias (see SI 3.7). The upshot is that participants are biased to assign higher credit based on feedback that is more likely to be true in a manner that is inconsistent with our Bayesian models and above and beyond the previously identified positivity biases.”

      Finally, the Supplementary Information for the discovery study has also been revised to feature this analysis:

      “We next assessed whether participants infer whether the feedback they received on each trial was true or false and adjust their credit assignment based on this inference. We again used the “Truth-CA” model to obtain estimates for the truth bonus (TB), the increase in credit assignment as a function of the posterior probability of feedback being true. As in our main study, the fitted truth bias parameter was significantly positive, indicating that participants assign greater weight to feedback they believe is likely to be true (Fig. S4a; see SI 3.3.1 for detailed ML parameter results). Strikingly, model-simulations (Methods) predicted a lower truth bonus than the one observed in participants (Fig. S4b).”

      (7) "Overall, the results from this study support the exact same conclusions (See SI section 1.2) but with one difference. In the discovery study, we found no evidence for learning based on 50%-credibility feedback when examining either the feedback effect on choice repetition or CA in the credibility-CA model (SI 1.2.3)" - this seems like a very salient difference, when the paper reports the feedback effect as a primary finding of interest, though I understand there remains a valence-based difference.

      We agree with the reviewer and thank them for this suggestion. We now state explicitly throughout the manuscript that this finding was obtained in only one of our two studies. In the “Discovery study” section of the results, we state explicitly that this effect was not observed in the discovery study:

      “However, we found no evidence for learning based on 50%-credibility feedback when examining either the feedback effect on choice repetition or CA in the credibility-CA model (SI 1.2.3).”

      We also note that, in relation to another concern from R3 (that perseveration may masquerade as positivity bias), we conducted additional analyses (detailed in SI 3.6.2). These analyses revealed that the observed positivity bias for the 1-star agent in the discovery study falls within the range predicted by simple choice-perseveration. Consequently, we have removed the suggestion that participants still learn from the random agent in the discovery study. Furthermore, we have modified the discussion section to include a possible explanation for this discrepancy between the two studies:

      “Notably, however, there was no corresponding evidence random feedback affected behaviour in our discovery study. It is possible that an individual’s ability to filter out random information might have been limited due to a high cognitive load induced by our main study task, which required participants to track the values of three bandit pairs and juggle between three interleaved feedback agents (whereas in our discovery study each experimental block featured a single bandit pair). Future studies should explore more systematically how the ability to filter random feedback depends on cognitive load (61).”

      (8) "Participants were instructed that this feedback would be “a lie” 50% of the time but were not explicitly told that this meant it was random and should therefore be disregarded." - I agree that this is a possible explanation for updating from the random source. It is a meaningful caveat.

      Thank you for this thought. While this can be seen as a caveat—since we don’t know what would have happened with explicit instructions—we also believe it is interesting from another perspective. In many real-life situations, individuals may have all the necessary information to infer that the feedback they receive is uninformative, yet still fail to do so, especially when they are not explicitly told to ignore it.

      In future work, we plan to examine how behaviour changes when participants are given more explicit instructions—for example, that the 50%-credibility agent provides purely random feedback.

      (9) "Future studies should investigate conditions that enhance an ability to discard disinformation, such as providing explicit instructions to ignore misleading feedback, manipulations that increase the time available for evaluating information, or interventions that strengthen source memory." - there is work on some of this in the misinformation literature that should be cited, such as the "continued influence effect". For example: Johnson, H. M., & Seifert, C. M. (1994). Sources of the continued influence effect: When misinformation in memory affects later inferences. Journal of experimental psychology: Learning, memory, and cognition, 20(6), 1420.

      We thank the reviewer for pointing us towards the relevant literature. We have now included citations about the “continued influence effect” of misinformation in the discussion:

      “In our main study, we show that participants revised their beliefs based on entirely non-credible feedback, whereas an ideal Bayesian strategy dictates such feedback should be ignored. This finding resonates with the “continued-influence effect” whereby misleading information continues to influence an individual's beliefs even after it has been retracted (59,60).”

      (10) Are the authors arguing that choice-confirmation bias may be at play? Work on choice-confirmation bias generally includes counterfactual feedback, which is not present here.

      We agree with the reviewer that a definitive test for choice-confirmation bias typically requires counterfactual feedback, which is not present in our current task. In our discussion, we indeed suggest that the positivity bias we observe may stem from a form of choice-confirmation, drawing on the extensive literature on this bias in reinforcement learning (Lefebvre et al., 2017; Palminteri et al., 2017; Palminteri & Lebreton, 2022). However, we fully acknowledge that this link is a hypothesis and that explicitly testing for choice-confirmation bias would necessitate a future study specifically incorporating counterfactual feedback. We have included a clarification of this point in the discussion:

      “Previous reinforcement learning studies report greater credit-assignment based on positive compared to negative feedback, albeit only in the context of veridical feedback (43,44,62). Here, supporting our a-priori hypothesis, we show that this positivity bias is amplified for information of low and intermediate credibility (in absolute terms in the discovery study, and relative to the overall extent of CA in both studies). Of note, previous literature has interpreted enhanced learning for positive outcomes in reinforcement learning as indicative of a confirmation bias (42,44). For example, positive feedback may confirm, to a greater extent than negative feedback, one’s choice as superior (e.g., “I chose the better of the two options”). Leveraging the framework of motivated cognition (35), we posited that feedback of uncertain veracity (e.g., low credibility) amplifies this bias by incentivising individuals to self-servingly accept positive feedback as true (because it confers positive, desirable outcomes), and explain away undesirable, choice-disconfirming, negative feedback as false. This could imply an amplified confirmation bias on social media, where content from sources of uncertain credibility, such as unknown or unverified users, is more easily interpreted in a self-serving manner, disproportionately reinforcing existing beliefs (63). In turn, this could contribute to an exacerbation of the negative social outcomes previously linked to confirmation bias such as polarization (64,65), the formation of ‘echo chambers’ (19), and the persistence of misbelief regarding contemporary issues of importance such as vaccination (66,67) and climate change (68–71). We note, however, that further studies are required to determine whether positivity bias in our task is indeed a form of confirmation bias.”

      Reviewer #3 (Public review):

      Summary

      This paper investigates how disinformation affects reward learning processes in the context of a two-armed bandit task, where feedback is provided by agents with varying reliability (with lying probability explicitly instructed). They find that people learn more from credible sources, but also deviate systematically from optimal Bayesian learning: They learned from uninformative random feedback, learned more from positive feedback, and updated too quickly from fully credible feedback (especially following low-credibility feedback). Overall, this study highlights how misinformation could distort basic reward learning processes, without appeal to higher-order social constructs like identity.

      Strengths

      (1) The experimental design is simple and well-controlled; in particular, it isolates basic learning processes by abstracting away from social context.

      (2) Modeling and statistics meet or exceed the standards of rigor.

      (3) Limitations are acknowledged where appropriate, especially those regarding external validity.

      (4) The comparison model, Bayes with biased credibility estimates, is strong; deviations are much more compelling than e.g., a purely optimal model.

      (5) The conclusions are interesting, in particular the finding that positivity bias is stronger when learning from less reliable feedback (although I am somewhat uncertain about the validity of this conclusion)

      We deeply thank the reviewer for highlighting the strengths of this work.

      Weaknesses

      (1) Absolute or relative positivity bias?

      In my view, the biggest weakness in the paper is that the conclusion of greater positivity bias for lower credible feedback (Figure 5) hinges on the specific way in which positivity bias is defined. Specifically, we only see the effect when normalizing the difference in sensitivity to positive vs. negative feedback by the sum. I appreciate that the authors present both and add the caveat whenever they mention the conclusion (with the crucial exception of the abstract). However, what we really need here is an argument that the relative definition is the right way to define asymmetry....

      Unfortunately, my intuition is that the absolute difference is a better measure. I understand that the relative version is common in the RL literature; however previous studies have used standard TD models, whereas the current model updates based on the raw reward. The role of the CA parameter is thus importantly different from a traditional learning rate - in particular, it's more like a logistic regression coefficient (as described below) because it scales the feedback but not the decay. Under this interpretation, a difference in positivity bias across credibility conditions corresponds to a three-way interaction between the exponentially weighted sum of previous feedback of a given type (e.g., positive from the 75% credible agent), feedback positivity, and condition (dummy coded). This interaction corresponds to the nonnormalized, absolute difference.

      Importantly, I'm not terribly confident in this argument, but it does suggest that we need a compelling argument for the relative definition.

      We thank the reviewer for raising this important point about the definition of positivity bias, and for their thoughtful discussion of the absolute versus relative measures. We believe that the relative valence bias offers a distinct and valuable perspective on positivity bias. Conceptually, this measure describes positivity bias in a manner akin to a “percentage difference” relative to the overall level of learning, which allows us to control for the overall decrease in credit assignment as feedback becomes less credible. We are unsure whether one measure is better or more correct than the other; we believe that reporting both enriches the understanding of positivity bias and allows a more comprehensive characterization of this phenomenon (as long as the measures are interpreted carefully). We have stated the significance of the relative measure in the results section:

      “Following previous research, we quantified positivity bias in 2 ways: 1) as the absolute difference between credit-assignment based on positive or negative feedback, and 2) as the same difference but relative to the overall extent of learning. We note that the second, relative definition is more akin to “percentage change” measurements, providing a control for the overall lower levels of credit-assignment for less credible agents.”
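
      To make the two definitions concrete, the following is a minimal sketch rather than the analysis code used in the study. It assumes the relative index divides the absolute difference by the plain sum of the two credit-assignment parameters (as the reviewer describes); the function names and the numeric values are hypothetical and chosen only to show that the two indices can move in opposite directions across credibility levels.

```python
def absolute_valence_bias(ca_pos, ca_neg):
    """Absolute index: CA for positive feedback minus CA for negative feedback."""
    return ca_pos - ca_neg


def relative_valence_bias(ca_pos, ca_neg):
    """Relative index: the same difference divided by the sum of both CA values.

    Assumption: normalization by the plain sum, so the index is undefined
    when the two parameters cancel out.
    """
    total = ca_pos + ca_neg
    if total == 0:
        raise ValueError("relative index is undefined when CA+ + CA- equals 0")
    return (ca_pos - ca_neg) / total


# Hypothetical (CA+, CA-) pairs, not fitted values from the study.
high_credibility = (2.0, 1.0)
low_credibility = (0.75, 0.25)

for label, (ca_pos, ca_neg) in [("high", high_credibility), ("low", low_credibility)]:
    print(label,
          absolute_valence_bias(ca_pos, ca_neg),
          relative_valence_bias(ca_pos, ca_neg))
```

      With these illustrative numbers the absolute index is smaller for the low-credibility source while the relative index is larger, which is why conclusions about credibility modulation can depend on the definition chosen.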

      We also wish to point out that in our discovery study we had some evidence for amplification of positivity bias in an absolute sense.

      (2) Positivity bias or perseveration?

      A key challenge in interpreting many of the results is dissociating perseveration from other learning biases. In particular, a positivity bias (Figure 5) and perseveration will both predict a stronger correlation between positive feedback and future choice. Crucially, the authors do include a perseveration term, so one would hope that perseveration effects have been controlled for and that the CA parameters reflect true positivity biases. However, with finite data, we cannot be sure that the variance will be correctly allocated to each parameter (cf. collinearity in regressions). The fact that CA- is fit to be negative for many participants (a pattern shown more strongly in the discovery study) suggests that this might be happening. A priori, the idea that you would ever increase your value estimate after negative feedback is highly implausible, which suggests that the parameter might be capturing variance other than what it is intended to capture.

      The best way to resolve this uncertainty would involve running a new study in which feedback was sometimes provided in the absence of a choice - this would isolate positivity bias. Short of that, perhaps one could fit a version of the Bayesian model that also includes perseveration. If the authors can show that this model cannot capture the pattern in Figure 5, that would be fairly convincing.

      We thank the reviewer for this very insightful and crucial point regarding the potential confound between positivity bias and perseveration. We entirely agree that distinguishing these effects can be challenging. To rigorously address this concern and ascertain that our observed positivity bias, particularly its inflation for low-credibility feedback, is not merely an artifact of perseveration, we conducted additional analyses as suggested.

      First, following the reviewer’s suggestion we simulated our Bayesian models, including a perseveration term, for both our main and discovery studies. Crucially, none of these simulations predicted the specific pattern of inflated positivity bias for low-credibility feedback that we identified in participants.

      Additionally, taking a “devil’s advocate” approach, we tested whether our Credibility-CA model (which includes perseveration but not a feedback-valence bias) can predict our positivity bias findings. To do so, we simulated 100 datasets using our Credibility-CA model (based on empirical best-fitting parameters) and fitted each simulated dataset with our Credibility-Valence CA model. By examining the distribution of results across the fits to these synthetic datasets and comparing them to the actual results from participants, we found that while perseveration could indeed lead (as the reviewer suspected) to an artifactual positivity bias, it could not predict the magnitude of the observed inflation of positivity bias for low-credibility feedback (whether measured in absolute or relative terms).

      Based on these comprehensive analyses, we are confident that our main results concerning the modulation of a valence bias as a function of source-credibility cannot be accounted for by simple choice-perseveration. We have briefly explained these analyses in the main results section:

      “Previous research has suggested that positivity bias may spuriously arise from pure choice-perseveration (i.e., a tendency to repeat previous choices regardless of outcome) (49,50). While our models included a perseveration-component, this control may not be perfect. Therefore, in additional control analyses, we generated synthetic datasets using models including choice-perseveration but devoid of feedback-valence bias, and fitted them with our credibility-valence model (see SI 3.6.1). These analyses confirmed that perseveration can masquerade as an apparent positivity bias. Critically, however, these analyses also confirmed that perseveration cannot account for our main finding of increased positivity bias, relative to the overall extent of CA, for low-credibility feedback.”

      Additionally, we have added a detailed description of these additional analyses and their findings to the Supplementary Information document:

      “3.6 Positivity bias results cannot be explained by pure perseveration

      3.6.1 Main study

      Previous research has suggested it may be challenging to dissociate between a feedback-valence positivity bias and perseveration (i.e., a tendency to repeat previous choices regardless of outcome). While our Credit Assignment (CA) models already include a perseveration mechanism to account for this, this control may not be perfect. We thus conducted several tests to examine if our positivity-bias related results could be accounted for by perseveration.

      First, we examined whether our Bayesian models, augmented by a perseveration mechanism (as in our CA model), can generate predictions similar to our empirical results. We repeated our cross-fitting procedure with these extended Bayesian models. To briefly recap, this involved fitting participant behavior with them, generating synthetic datasets based on the resulting maximum likelihood (ML) parameters, and then fitting these simulated datasets with our Credibility-Valence CA model (which is designed to detect positivity bias). This test revealed that adding perseveration to our Bayesian models did not predict a positivity bias in learning. In absolute terms there was a small negativity bias (instructed-credibility Bayesian: b=−0.19, F(1,1218)=17.78, p<0.001, Fig. S23a-b; free-credibility Bayesian: b=−0.17, F(1,1218)=13.74, p<0.001, Fig. S23d-e). In relative terms we detected no valence-related bias (instructed-credibility Bayesian: b=−0.034, F(1,609)=0.45, p=0.50, Fig. S22c; free-credibility Bayesian: b=−0.04, F(1,609)=0.51, p=0.47, Fig. S23f). More critically, these simulations also did not predict a change in the level of positivity bias as a function of feedback credibility, neither at an absolute level (instructed-credibility Bayesian: F(2,1218)=0.024, p=0.98, Fig. S23b; free-credibility Bayesian: F(2,1218)=0.008, p=0.99, Fig. S23e), nor at a relative level (instructed-credibility Bayesian: F(2,609)=1.57, p=0.21, Fig. S23c; free-credibility Bayesian: F(2,609)=0.13, p=0.88, Fig. S23f). The upshot is that our positivity-bias findings cannot be accounted for by our Bayesian models even when these are augmented with perseveration.

      However, it is still possible that the empirical CA parameters from our Credibility-Valence CA model (reported in main text Fig. 5) were distorted, absorbing variance from perseveration. To address this, we took a “devil's advocate” approach, testing the assumption that CA parameters are not truly affected by feedback valence and that there is only perseveration in our data. Towards that goal, we simulated data using our Credibility-CA model (which includes perseveration but does not contain a valence bias in its learning mechanism) and then fitted these synthetic datasets using our Credibility-Valence CA model to see if the observed positivity bias could be explained by perseveration alone. Specifically, we generated 101 “group-level” synthetic datasets (each including one simulation for each participant, based on their empirical ML parameters), and fitted each dataset with our Credibility-Valence CA model. We then analysed the resulting ML parameters in each dataset using the same mixed-effects models as described in the main text, examining the distribution of effects of interest across these simulated datasets. Comparing these simulation results to the data from participants revealed a nuanced picture. While the positivity bias observed in participants is within the range predicted by a pure perseveration account when measured in absolute terms (Fig. S24a), it is much higher than predicted by pure perseveration when measured relative to the overall level of learning (Fig. S24c). More importantly, the inflation in positivity bias for lower credibility feedback is substantially higher in participants than what would be predicted by a pure perseveration account, a finding that holds true for both absolute (Fig. S24b) and relative (Fig. S24d) measures.”

      “3.6.2 Discovery study

      We then replicated these analyses in our discovery study to confirm our findings. We again checked whether extended versions of the Bayesian models (including perseveration) predicted the positivity bias results observed. Our cross-fitting procedure showed that the instructed-credibility Bayesian model with perseveration did predict a positivity bias for all credibility levels in this discovery study, both when measured in absolute terms [50% credibility (b=1.74,t(824)=6.15), 70% credibility (b=2.00,F(1,824)=49.98), 85% credibility (b=1.81,F(1,824)=40.78), 100% credibility (b=2.42,F(1,824)=72.50), all p's<0.001], and in relative terms [50% credibility (b=0.25,t(412)=3.44), 70% credibility (b=0.31,F(1,412)=17.72), 85% credibility (b=0.34,F(1,412)=21.06), 100% credibility (b=0.42,F(1,412)=31.24), all p's<0.001]. However, importantly, these simulations did not predict a change in the level of positivity bias as a function of feedback credibility, neither at an absolute level (F(3,412)=1.43,p=0.24), nor at a relative level (F(3,412)=2.06,p=0.13) (Fig. S25a-c). In contrast, simulations of the free-credibility Bayesian model (with perseveration) predicted a slight negativity bias when measured in absolute terms (b=−0.35,F(1,824)=5.14,p=0.024), and no valence bias when measured relative to the overall degree of learning (b=0.05,F(1,412)=0.55,p=0.46). Crucially, this model also did not predict a change in the level of positivity bias as a function of feedback credibility, neither at an absolute level (F(3,824)=0.27,p=0.77), nor at a relative level (F(3,412)=0.76,p=0.47) (Fig. S25d-f).

      As in our main study, we next assessed whether our Credibility-CA model (which includes perseveration but no valence bias) predicted the positivity bias results observed in participants in the discovery study. This analysis revealed that the average positivity bias in participants is higher than predicted by a pure perseveration account, both when measured in absolute terms (Fig. S26a) and in relative terms (Fig. S26c). Specifically, only the aVBI for the 70% credibility agent was above what a perseveration account would predict, while the rVBI for all agents except the completely credible one exceeded that threshold. Furthermore, the inflation in positivity bias for lower credibility feedback (compared to the 100% credibility agent) is significantly higher in participants than would be predicted by a pure perseveration account, in both absolute (Fig. S26b) and relative (Fig. S26d) terms.

      Together, these results show that the general positivity bias observed in participants could be predicted by an instructed-credibility Bayesian model with perseveration, or by a CA model with perseveration. Moreover, we find that these two models can predict a positivity bias for the 50% credibility agent, raising a concern that our positivity bias findings for this source may be an artefact of not-fully controlled for perseveration. However, the credibility modulation of this positivity bias, where the bias is amplified for lower credibility feedback, is consistently not predicted by perseveration alone, regardless of whether perseveration is incorporated into a Bayesian or a CA model. This finding suggests that participants are genuinely modulating their learning based on feedback credibility, and that this modulation is not merely an artifact of choice perseveration.”

      (3) Veracity detection or positivity bias?

      The "True feedback elicits greater learning" effect (Figure 6) may be simply a re-description of the positivity bias shown in Figure 5. This figure shows that people have higher CA for trials where the feedback was in fact accurate. But assuming that people tend to choose more rewarding options, true-feedback cases will tend to also be positive-feedback cases. Accordingly, a positivity bias would yield this effect, even if people are not at all sensitive to trial-level feedback veracity. Of course, the reverse logic also applies, such that the "positivity bias" could actually reflect discounting of feedback that is less likely to be true. This idea has been proposed before as an explanation for confirmation bias (see Pilgrim et al, 2024 https://doi.org/10.1016/j.cognition.2023.105693 and much previous work cited therein). The authors should discuss the ambiguity between the "positivity bias" and "true feedback" effects within the context of this literature....

      Before addressing these excellent comments, we first note that we have now improved our "Truth-CA" model. Previously, our Truth-CA model classified the feedback on each trial as true or false based on the realized latent true outcome. However, the very same feedback would have had the opposite truth-status if the latent true outcome had been different (recall that true outcomes are stochastic). This injected noise into the trial classification in our former model. To avoid this, in our new model feedback is modulated by the probability that the reported feedback is true (marginalized over the stochasticity of the true outcome). Please note in our responses below that we conducted extensive analyses to confirm that positivity bias does not in fact predict the truth bias we detect using our Truth-CA model.

      We have described this new model in the methods section:

      “Additionally, we formulated a “Truth-CA” model, which worked as our Credibility-CA model, but incorporated a free truth-bonus parameter (TB). This parameter modulates the extent of credit assignment for each agent based on the posterior probability of feedback being true (given the credibility of the feedback agent, and the true reward probability of the chosen bandit). The chosen bandit was updated as follows:

      $Q \leftarrow (1 - f_Q) \cdot Q + \left[\mathrm{CA}(\mathrm{agent}) + \mathrm{TB} \cdot (P(\mathrm{truth}) - 0.5)\right] \cdot F$

      where P(truth) is the posterior probability of the feedback being true in the current trial (for exact calculation of P(truth) see “Methods: Bayesian estimation of posterior belief that feedback is true”).”
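
      The update rule can be sketched in code as follows. This is an illustrative sketch under stated assumptions, not the fitting code used in the study: we assume binary feedback coded as F = +1 or −1, and we assume that a lying agent reports the opposite of the true outcome, so that P(truth) follows from Bayes' rule given the agent's credibility and the true reward probability of the chosen bandit. The function names `p_truth` and `truth_ca_update` are hypothetical.

```python
def p_truth(feedback, credibility, p_reward):
    """Posterior probability that the reported feedback is true.

    feedback:    +1 (reward reported) or -1 (no reward reported)
    credibility: probability that the agent reports the true outcome
    p_reward:    true reward probability of the chosen bandit
    Assumption: a lying agent reports the opposite of the true outcome.
    """
    # Probability that the true outcome matches what was reported.
    p_outcome = p_reward if feedback > 0 else 1.0 - p_reward
    truthful = credibility * p_outcome                # agent honest, outcome as reported
    lying = (1.0 - credibility) * (1.0 - p_outcome)   # agent lied about the opposite outcome
    if truthful + lying == 0.0:
        raise ValueError("this feedback cannot occur under the given parameters")
    return truthful / (truthful + lying)


def truth_ca_update(q, feedback, ca_agent, tb, p_true, f_q):
    """One Truth-CA update of the chosen bandit's Q-value.

    Implements Q <- (1 - f_Q) * Q + [CA(agent) + TB * (P(truth) - 0.5)] * F.
    """
    return (1.0 - f_q) * q + (ca_agent + tb * (p_true - 0.5)) * feedback


# Hypothetical example: 75%-credible agent, bandit rewarding 75% of the time.
posterior = p_truth(+1, credibility=0.75, p_reward=0.75)
print(posterior)  # 0.9
print(truth_ca_update(q=0.0, feedback=+1, ca_agent=1.0, tb=0.5,
                      p_true=posterior, f_q=0.1))
```

      Under these assumptions, a 50%-credible agent yields a posterior equal to the prior probability of the reported outcome, and negative feedback from a fairly credible agent about a mostly rewarding bandit is comparatively likely to be false.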

      All relevant results have been updated accordingly in the main text:

      “To formally address whether feedback truthfulness modulates credit assignment, we fitted a new variant of the CA model (the “Truth-CA” model) to the data. This variant works as our Credibility-CA model, but incorporates a truth-bonus parameter (TB) which increases the degree of credit assignment for feedback as a function of the experimenter-determined likelihood that the feedback is true (which is read from the curves in Fig 6a when x is taken to be the true probability the bandit is rewarding). Specifically, after receiving feedback, the Q-value of the chosen option is updated according to the following rule:

      $Q \leftarrow (1 - f_Q) \cdot Q + \left[\mathrm{CA}(\mathrm{agent}) + \mathrm{TB} \cdot (P(\mathrm{truth}) - 0.5)\right] \cdot F$

      where TB is the free parameter representing the truth bonus, and P(truth) is the probability of the received feedback being true (from the experimenter’s perspective). We acknowledge that this model falls short of providing a mechanistically plausible description of the credit assignment process, because participants have no access to the experimenter’s truthfulness likelihoods (as the true bandit reward probabilities are unknown to them). Nonetheless, we use this ‘oracle model’ as a measurement tool to glean rough estimates of the extent to which credit assignment is boosted as a function of its truthfulness likelihood.

      Fitting this Truth-CA model to participants' behaviour revealed a significant positive truth-bonus (mean=0.21, t(203)=3.12, p=0.002), suggesting that participants indeed assign greater weight to feedback that is likely to be true (Fig. 6c; see SI 3.3.1 for detailed ML parameter results). Notably, simulations using our other models (Methods) consistently predicted smaller truth biases (compared to the empirical bias) (Fig. 6d). Moreover, truth bias was still detected even in a more flexible model that allowed for both a positivity bias and truth-bias (see SI 3.7). The upshot is that participants are biased to assign higher credit based on feedback that is more likely to be true, in a manner that is inconsistent with our Bayesian models and above and beyond the previously identified positivity biases.”

      Finally, the Supplementary Information for the discovery study has also been revised to feature this analysis:

      “We next assessed whether participants infer whether the feedback they received on each trial was true or false and adjust their credit assignment based on this inference. We again used the “Truth-CA” model to obtain estimates for the truth bonus (TB), the increase in credit assignment as a function of the posterior probability of feedback being true. As in our main study, the fitted truth bias parameter was significantly positive, indicating that participants assign greater weight to feedback they believe is likely to be true (Fig. S4a; see SI 3.3.1 for detailed ML parameter results). Strikingly, model-simulations (Methods) predicted a lower truth bonus than the one observed in participants (Fig. S4b).”

      Additionally, we thank the reviewer for pointing us to the relevant work by Pilgrim et al. (2024). We agree that the relationship between the "true feedback" and "positivity bias" effects is nuanced, and that their potential overlap warrants careful consideration. However, our analyses suggest that the truth bonus is not solely a by-product of positivity bias. Firstly, simulations of our Credibility-Valence CA model predict only a small "truth bonus" effect, which is notably smaller than what we observed in participants. Secondly, we formulated an extension of our "Truth-CA" model that includes a valence bias in credit assignment. If our truth bonus results were merely an artifact of positivity bias, this extended model should absorb that variance, producing a null truth bonus parameter. However, fitting this model to participant data still revealed a significant positive truth bonus, which again exceeds the range predicted by simulations of our Credibility CA model:

      “3.7 Truth inference is still detected when controlling for valence bias

      Given that participants frequently select bandits that are, on average, mostly rewarding, it is reasonable to assume that positive feedback is more likely to be objectively true than negative feedback. This raises the question of whether the "truth inference" effect we observed in participants might simply be an alternative description of a positivity bias in learning. To directly test this idea, we extended our Truth-CA model to explicitly account for a valence bias in credit assignment. This extended model features separate CA parameters for positive and negative feedback for each agent. When we fitted this new model to participant behavior, it still revealed a significant truth bonus in both the main study (Wilcoxon’s signed-rank test: median = 0.09, z(202)=2.12, p=0.034; Fig. S27a) and the discovery study (median = 3.52, z(102)=7.86, p<0.001; Fig. S27c). Moreover, in the main study, this truth bonus remained significantly higher than what was predicted by all the alternative models, with the exception of the instructed-credibility Bayesian model (Fig. S27b). In the discovery study, the truth bonus was significantly higher than what was predicted by all the alternative models (Fig. S27d).”

      Together, these findings suggest that our truth inference results are not simply a re-description of a positivity bias.

      Conversely, we acknowledge the reviewer's point that our positivity bias results could potentially stem from a more general truth inference mechanism. We believe that this possibility should be addressed in a future study where participants rate their belief that received feedback is true (rather than a lie). We have extended our discussion to clarify this possibility and to include the suggested citation:

      “Our findings show that individuals increase their credit assignment for feedback in proportion to the perceived probability that the feedback is true, even after controlling for source credibility and feedback valence. Strikingly, this learning bias was not predicted by any of our Bayesian or credit-assignment (CA) models. Notably, our evidence for this bias is based on an “oracle model” that incorporates the probability of feedback truthfulness from the experimenter's perspective, rather than the participant’s. This raises an important open question: how do individuals form beliefs about feedback truthfulness, and how do these beliefs influence credit assignment? Future research should address this by eliciting trial-by-trial beliefs about feedback truthfulness. Doing so would also allow for testing the intriguing possibility that an exaggerated positivity bias for non-credible sources reflects, to some extent, a truth-based discounting of negative feedback (i.e., participants may judge such feedback as less likely to be true). However, it is important to note that the positivity bias observed for fully credible sources (here and in other literature) cannot be attributed to a truth bias, unless participants were, against instructions, distrustful of that source.”

      The authors get close to this in the discussion, but they characterize their results as differing from the predictions of rational models, the opposite of my intuition. They write:

      “Alternative "informational" (motivation-independent) accounts of positivity and confirmation bias predict a contrasting trend (i.e., reduced bias in low- and medium credibility conditions) because in these contexts it is more ambiguous whether feedback confirms one's choice or outcome expectations, as compared to a full-credibility condition.”

      I don't follow the reasoning here at all. It seems to me that the possibility for bias will increase with ambiguity (or perhaps will be maximal at intermediate levels). In the extreme case, when feedback is fully reliable, it is impossible to rationally discount it (illustrated in Figure 6A). The authors should clarify their argument or revise their conclusion here.

      We apologize for the lack of clarity in our previous explanation. We removed the sentence you cited (it was intended to make a different point which we now consider non-essential). Our current narration is consistent with the point you are making.

      (4) Disinformation or less information?

      Zooming out, from a computational/functional perspective, the reliability of feedback is very similar to reward stochasticity (the difference is that reward stochasticity decreases the importance/value of learning in addition to its difficulty). I imagine that many of the effects reported here would be reproduced in that setting. To my surprise, I couldn't quickly find a study asking that precise question, but if the authors know of such work, it would be very useful to draw comparisons. To put a finer point on it, this study does not isolate which (if any) of these effects are specific to disinformation, rather than simply less information. I don't think the authors need to rigorously address this in the current study, but it would be a helpful discussion point.

      We thank the reviewer for highlighting the parallel (and difference) between feedback reliability and reward stochasticity. However, we have not found any comparable results in the literature. We also note that our discussion includes a paragraph addressing the locus of our effects making the point that more studies are necessary to determine whether our findings are due to disinformation per se or sources being less informative. While this paragraph was included in the previous version it led us to infer our Discussion was too long and we therefore shortened it considerably:

      “An important question arises as to the psychological locus of the biases we uncovered. Because we were interested in how individuals process disinformation (deliberately false or misleading information intended to deceive or manipulate), we framed the feedback agents in our study as deceptive, who would occasionally “lie” about the true choice outcome. However, statistically (though not necessarily psychologically), these agents are equivalent to agents who mix truth-telling with random “guessing” or “noise” where inaccuracies may arise from factors such as occasionally lacking access to true outcomes, simple laziness, or mistakes, rather than an intent to deceive. This raises the question of whether the biases we observed are driven by the perception of potential disinformation as deceitful per se or simply as deviating from the truth. Future studies could address this question by directly comparing learning from statistically equivalent sources framed as either lying or noisy. Unlike previous studies wherein participants had to infer source credibility from experience (30,37,72), we took an explicit-instruction approach, allowing us to precisely assess source-credibility impact on learning, without confounding it with errors in learning about the sources themselves. More broadly, our work connects with prior research on observational learning, which examined how individuals learn from the actions or advice of social partners (72–75). This body of work has demonstrated that individuals integrate learning from their private experiences with learning based on others’ actions or advice, whether by inferring the value others attribute to different options or by mimicking their behavior (57,76). However, our task differs significantly from traditional observational learning. Firstly, our feedback agents interpret outcomes rather than demonstrating or recommending actions (30,37,72). Secondly, participants in our study lack private experiences unmediated by feedback sources. Finally, unlike most observational learning paradigms, we systematically address scenarios with deliberately misleading social partners. Future studies could bridge this by incorporating deceptive social partners into observational learning, offering a chance to develop unified models of how individuals integrate social information when credibility is paramount for decision-making.”

      (5) Over-reliance on analyzing model parameters

      Most of the results rely on interpreting model parameters, specifically, the "credit assignment" (CA) parameter. Exacerbating this, many key conclusions rest on a comparison of the CA parameters fit to human data vs. those fit to simulations from a Bayesian model. I've never seen anything like this, and the authors don't justify or even motivate this analysis choice. As a general rule, analyses of model parameters are less convincing than behavioral results because they inevitably depend on arbitrary modeling assumptions that cannot be fully supported. I imagine that most or even all of the results presented here would have behavioral analogues. The paper would benefit greatly from the inclusion of such results. It would also be helpful to provide a description of the model in the main text that makes it very clear what exactly the CA parameter is capturing (see next point).

      We thank the reviewer for this important suggestion which we address together with the following point.

      (6) RL or regression?

      I was initially very confused by the "RL" model because it doesn't update based on the TD error. Consequently, the "Q values" can go beyond the range of possible reward (SI Figure 5). These values are therefore not Q values, which are defined as expectations of future reward ("action values"). Instead, they reflect choice propensities, which are sometimes notated $h$ in the RL literature. This misuse of notation is unfortunately quite common in psychology, so I won't ask the authors to change the variable. However, they should clarify when introducing the model that the Q values are not action values in the technical sense. If there is precedent for this update rule, it should be cited.

      Although the change is subtle, it suggests a very different interpretation of the model.

      Specifically, I think the "RL model" is better understood as a sophisticated logistic regression, rather than a model of value learning. Ignoring the decay term, the CA term is simply the change in log odds of repeating the just-taken action in future trials (the change is negated for negative feedback). The PERS term is the same, but ignoring feedback. The decay captures that the effect of each trial on future choices diminishes with time. Importantly, however, we can re-parameterize the model such that the choice at each trial is a logistic regression where the independent variables are an exponentially decaying sum of feedback of each type (e.g., positive-cred50, positive-cred75, ... negative-cred100). The CA parameters are simply coefficients in this logistic regression.

      Critically, this is not meant to "deflate" the model. Instead, it clarifies that the CA parameter is actually not such an assumption-laden model estimate. It is really quite similar to a regression coefficient, something that is usually considered "model agnostic". It also recasts the non-standard "cross-fitting" approach as a very standard comparison of regression coefficients for model simulations vs. human data. Finally, using different CA parameters for true vs false feedback is no longer a strange and implausible model assumption; it's just another (perfectly valid) regression. This may be a personal thing, but after adopting this view, I found all the results much easier to understand.

      We thank the reviewer for their insightful and illuminating comments, particularly concerning the interpretation of our model parameters and the nature of our credit-assignment model. We believe your interpretation of the model is accurate and we now narrate it to readers in the hope that our modelling will become clearer and more intuitive. We also present to readers how this recasts our “cross-fitting” approach in the way you suggested (we return to this point below).

      Broadly, while we agree that modelling results depend on underlying assumptions, we believe that “model-agnostic” approaches also have important limitations—especially in reinforcement learning (RL), where choices are shaped by histories of past events, which such approaches often fail to fully account for. As students of RL, we are frequently struck by how careful modelling demonstrates that seemingly meaningful “model-agnostic” patterns can emerge as artefacts of unaccounted-for variables. We also note that the term “model-agnostic” is difficult to define—after all, even regression models rely on assumptions, and some computational models make richer or more transparent assumptions than others. Ideally, we aim to support our findings using converging methods wherever possible.

      We want to clarify that many of our reported findings indeed stem from straightforward behavioral analyses (e.g., simple regressions of choice-repetition), which do not rely on complex modeling assumptions. The two key results that primarily depend on the analysis of model parameters are our findings related to positivity bias and truth inference.

      Regarding the positivity bias, identifying truly model-agnostic behavioral signatures, distinct from effects like choice-perseveration, has historically been a significant challenge in the literature. Classical research on this bias rests on the interpretation of model parameters (Lefebvre et al., 2017; Palminteri et al., 2017), or at least on the use of models to assess what an “unbiased learner” baseline should look like (Palminteri & Lebreton, 2022). Some researchers have suggested possible regressions incorporating history effects to detect positivity bias from choice-repetition behavior, but these regressions (like our model) rely on subtle assumptions about forgetting and history effects (Toyama et al., 2019). Specifically, in our case, this issue is also demonstrated by the analysis we conducted in relation to the reviewer's previous point (about perseveration masquerading as positivity bias). We believe that clearly dissociating positivity bias from perseveration is an important challenge for the field going forward.

      For our truth inference results, obtaining purely behavioral signatures is similarly challenging due to the intricate interdependencies (which the reviewer has identified in previous points) between agent credibility, feedback valence, feedback truthfulness, and choice accuracy within our task design.

      Finally, we agree with the reviewer that regression coefficients are often interpreted as a “model-agnostic” pattern. From this perspective, even our findings regarding positivity and truth bias are not a case of over-reliance on complex model assumptions but are rather a way to expose deviations between empirical “sophisticated” regression coefficients and coefficients predicted from Bayesian models.

      We have now described the main learning rule of our model in the main text to ensure that the meaning of the CA parameters is clearer for readers:

      “Next, we formulated a family of non-Bayesian computational RL models. Importantly, these models can flexibly express non-Bayesian learning patterns and, as we show in following sections, can serve to identify learning biases deviating from an idealized Bayesian strategy. Here, an assumption is that during feedback, the choice propensity for the chosen bandit (which here is represented by a point estimate, “Q value“, rather than a distribution) either increases or decreases (for positive or negative feedback, respectively) according to a magnitude quantified by the free “Credit-Assignment (CA)” model parameters (47):

      $$Q(\text{chosen}) \leftarrow (1 - f_Q) \cdot Q(\text{chosen}) + \mathit{CA}(\text{agent}, \text{valence}) \cdot F$$

      where F is the feedback received from the agents (coded as 1 for reward feedback and -1 for non-reward feedback), while $f_Q$ (∈[0,1]) is the free parameter representing the forgetting rate of the Q-value (Fig. 2a, bottom panel; Fig. S5b; Methods). The probability of choosing a bandit (say A over B) in this family of models is a logistic function of the contrast in choice propensities between these two bandits. One interpretation of this model is as a “sophisticated” logistic regression, where the CA parameters take the role of “regression coefficients” corresponding to the change in log odds of repeating the just-taken action in future trials based on the feedback (+/- CA for positive or negative feedback, respectively; the model also includes gradual perseveration which allows for constant log-odds changes that are not affected by choice feedback; see “Methods: RL models”). The forgetting rate captures the extent to which the effect of each trial on future choices diminishes with time. The Q-values are thus exponentially decaying sums of logistic choice propensities based on the types of feedback a bandit received.”
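
      The equivalence between the recursive update and an exponentially decaying sum can be checked numerically. The following is a minimal sketch rather than the authors' implementation: the function names, the parameter values, and the simplification that a single bandit is updated on every trial are assumptions made purely for illustration.

```python
import math


def ca_update(q, ca, feedback, f_q):
    """One credit-assignment step: decay the propensity, then add CA * F."""
    return (1.0 - f_q) * q + ca * feedback


def decaying_sum(cas, feedbacks, f_q):
    """Closed form: each past trial contributes CA * F, discounted by its age."""
    n = len(feedbacks)
    return sum(
        ca * fb * (1.0 - f_q) ** (n - 1 - k)
        for k, (ca, fb) in enumerate(zip(cas, feedbacks))
    )


def p_choose_a(q_a, q_b):
    """Logistic choice rule on the contrast between two choice propensities."""
    return 1.0 / (1.0 + math.exp(-(q_a - q_b)))


# Hypothetical sequence: feedback coded +1 / -1, and one CA value per trial
# standing in for the agent- and valence-specific CA parameter.
feedbacks = [1, 1, -1, 1, -1]
cas = [0.2, 1.5, 0.8, 1.5, 0.2]
f_q = 0.3  # assumed forgetting rate

# Recursive form: apply the update trial by trial.
q = 0.0
for ca, fb in zip(cas, feedbacks):
    q = ca_update(q, ca, fb, f_q)

# Regression-like form: weighted sum of past feedback.
q_closed = decaying_sum(cas, feedbacks, f_q)
```

      Under these assumptions the two forms give the same propensity, which is why the CA parameters can be read as coefficients on decaying feedback regressors. Nothing in the update keeps the propensity within the range of possible rewards.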

      We also explain the implications of this perspective for our cross-fitting procedure:

      “To further characterise deviations between behaviour and our Bayesian learning models, we used a “cross-fitting” method. Treating CA parameters as data-features of interest (i.e., feedback dependent changes in choice propensity), our goal was to examine if and how empirical features differ from features extracted from simulations of our Bayesian learning models. Towards that goal, we simulated synthetic data based on Bayesian agents (using participants’ best fitting parameters), but fitted these data using the CA-models, obtaining what we term “Bayesian-CA parameters” (Fig. 2d; Methods). A comparison of these Bayesian-CA parameters, with empirical-CA parameters obtained by fitting CA models to empirical data, allowed us to uncover patterns consistent with, or deviating from, ideal-Bayesian value-based inference. Under the sophisticated logistic-regression interpretation of the CA-model family, the cross-fitting method comprises a comparison between empirical regression coefficients (i.e., empirical CA parameters) and regression coefficients based on simulations of Bayesian models (Bayesian CA parameters). Using this approach, we found that both the instructed-credibility and free-credibility Bayesian models predicted increased Bayesian-CA parameters as a function of agent credibility (Fig. 3c; see SI 3.1.1.2 Tables S8 and S9). However, an in-depth comparison between Bayesian and empirical CA parameters revealed discrepancies from ideal Bayesian learning, which we describe in the following sections.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      (1) Keep terms consistent, e.g., follow-up vs. main; hallmark vs. traditional.

      We have now changed the text to keep terms consistent.

      (2) CA model is like a learning rate; but it's based on the raw reward, not the TD error - this seems strange.

      We thank the reviewer for this comment. We understand that the use of a CA model instead of a TD error model may seem unusual at first glance. However, the CA model offers an important advantage: it more easily accommodates what we term "negative learning rates". This means that some participants may treat certain agents (especially the random one) as consistently deceitful, leading them to effectively increase/reduce choice tendencies following negative/positive feedback. A CA model handles this naturally by allowing negative CA parameters as a simple extension of positive ones. In contrast, adapting a TD error model to account for this is more complex. For instance, attempting to introduce a "negative learning rate" makes the RW model behave in a non-stable manner (e.g., Q values become <0 or >1). At the initial stages of our project, we explored different approaches to dealing with this issue and we found the CA model provides the best approach. For these reasons, we decided to proceed with our CA model.
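
      The instability of a Rescorla-Wagner update under a negative learning rate can be illustrated with a short simulation. This is a sketch under assumed parameter values (a learning rate of ±0.3, a CA of -0.3, a forgetting rate of 0.2, and five consecutive positive feedbacks); the helper names are hypothetical and are not taken from the authors' code.

```python
def rw_update(q, reward, alpha):
    """Rescorla-Wagner step: move Q toward the reward by a fraction alpha."""
    return q + alpha * (reward - q)


def ca_update(q, feedback, ca, f_q=0.0):
    """Credit-assignment step: decay Q, then add CA * F (F coded +1 / -1)."""
    return (1.0 - f_q) * q + ca * feedback


def run(update, q0, inputs, **params):
    """Apply an update rule over a sequence of inputs and return the Q trace."""
    q, trace = q0, [q0]
    for x in inputs:
        q = update(q, x, **params)
        trace.append(q)
    return trace


positive_feedback = [1] * 5

# Positive learning rate: Q approaches the reward and stays within [0, 1].
rw_pos = run(rw_update, 0.5, positive_feedback, alpha=0.3)

# Negative learning rate: the prediction error grows by a factor (1 - alpha)
# on every trial, so Q runs away from the reward and drops below 0.
rw_neg = run(rw_update, 0.5, positive_feedback, alpha=-0.3)

# Negative CA: the propensity simply moves in the opposite direction and,
# with forgetting, remains bounded in magnitude by |CA| / f_q.
ca_neg = run(ca_update, 0.0, positive_feedback, ca=-0.3, f_q=0.2)
```

      In this sketch the negative learning rate turns the error-correcting Rescorla-Wagner step into an error-amplifying one, whereas a negative CA parameter only flips the sign of an otherwise well-behaved update.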

      Additionally, we used the CA model in previous studies (e.g., Moran, Dayan & Dolan (2021)), where we included (in the SI) a detailed discussion of the similarities and differences between credit-assignment and Rescorla-Wagner models.

      (3) Why was the follow-up study not pre-registered?

      We appreciate the reviewer's comment regarding preregistration, which we should have done. Unfortunately, this is now “water under the bridge” but going forward we hope to pre-register increasing parts of our work.

      (4) Other work looking at reward stochasticity?

      As noted in point 4 of the main weaknesses, previous work on reward stochasticity primarily focused on explaining the increase/decrease in learning and its mechanistic bases under varying stochasticity levels. In our study, we uniquely characterize several specific learning biases that are modulated by source credibility, a topic not extensively explored within the existing reward stochasticity framework, as far as we know.

      (5) Equation 1 is different from the one in the figure?

      The reviewer is completely correct. The figure provides a simplified visual representation, primarily focusing on the feedback-based update of the Q-value, and for simplicity, it omits the forgetting term present in the full Equation 1. To ensure complete clarity and prevent any misunderstanding, we have now incorporated a more detailed explanation of the model, including the complete Equation 1 and its components, directly within the main text. This comprehensive description will ensure that readers are fully aware of how the model operates.

      “Next, we formulated a family of non-Bayesian computational RL models. Importantly, these models can flexibly express non-Bayesian learning patterns and, as we show in following sections, can serve to identify learning biases deviating from an idealized Bayesian strategy. Here, an assumption is that during feedback, the choice propensity for the chosen bandit (which here is represented by a point estimate, “Q value“, rather than a distribution) either increases or decreases (for positive or negative feedback, respectively) according to a magnitude quantified by the free “Credit-Assignment (CA)” model parameters (47):

      $$Q(\text{chosen}) \leftarrow (1 - f_Q) \cdot Q(\text{chosen}) + \mathit{CA}(\text{agent}, \text{valence}) \cdot F$$

      where F is the feedback received from the agents (coded as 1 for reward feedback and -1 for non-reward feedback), while $f_Q$ (∈[0,1]) is the free parameter representing the forgetting rate of the Q-value (Fig. 2a, bottom panel; Fig. S5b; Methods).”

      (6) Please describe/plot the distribution of all fitted parameters in the supplement. I would include the mean and SD in the main text (methods) as well.

      Following the reviewer’s suggestions, we have included in the Supplementary Document tables displaying the mean and SD of fitted parameters from participants for our main models of interest. We have also plotted the distributions of such parameters. Both for the main study:

      (7) "A novel approach within the disinformation literature by exploiting a Reinforcement Learning (RL) experimental framework".

      The idea of applying RL to disinformation is not new. Please tone down novelty claims. It would be nice to cite/discuss some of this work as well.

      https://arxiv.org/abs/2106.05402?utm_source=chatgpt.com https://www.scirp.org/pdf/jbbs_2022110415273931.pdf https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4173312

      We thank the reviewer for pointing us towards relevant literature. We have now toned down the sentence in the introduction and cited the references provided:

      “To address these questions, we adopt a novel approach within the disinformation literature by exploiting a Reinforcement Learning (RL) experimental framework (36). While RL has guided disinformation research in recent years (37–40), our approach is novel in using one of its most popular tasks: the “bandit task”.”

      (8) Figure 3a - The figures should be in the order that they're referenced (3 is referenced before 2).

      We generally try to stick to this important rule but, in this case, we believe that our ordering better serves the narrative and hope the reviewer will excuse this small violation.

      (9) "Additionally, we found a positive feedback-effect for the 3-star agent"

      What is the analysis here? To avoid confusion with the "positive feedback" effect, consider using "positive effect of feedback". The dash wasn't sufficient to avoid confusion in my case.

      We have now updated the terms in the text to avoid confusion.

      (10) The discovery study revealed even stronger results supporting a conclusion that the credibility-CA model was superior to both Bayesian models for most subjects

      This is very subjective, but I'll just mention that my "cherry-picking" flag was raised by this sentence. Are you only mentioning cases where the discovery study was consistent with the main study? Upon a closer read, I think the answer is most likely "no", but you might consider adopting a more systematic (perhaps even explicit) policy on when and how you reference the discovery study to avoid creating this impression in a more casual reader.

      We thank the reviewer for this valuable suggestion. To prevent any impression of "cherry-picking", we have removed specific references to the discovery study from the main body of the text. Instead, all discussions regarding the convergence and divergence of results between the two studies are now in the dedicated section focusing on the discovery study:

      “The discovery study (n=104) used a disinformation task structurally similar to that used in our main study, but with three notable differences: 1) it included 4 feedback agents, with credibilities of 50%, 70%, 85% and 100%, represented by 1, 2, 3, and 4 stars, respectively; 2) each experimental block consisted of a single bandit pair, presented over 16 trials (with 4 trials for each feedback agent); and 3) in certain blocks, unbeknownst to participants, the two bandits within a pair were equally rewarding (see SI section 1.1). Overall, this study's results supported similar conclusions as our main study (see SI section 1.2) with a few differences. We found convergent support for increased learning from more credible sources (SI 1.2.1), superior fit for the CA model over Bayesian models (SI 1.2.2) and increased learning from feedback inferred to be true (SI 1.2.6). Additionally, we found an inflation of positivity bias for low-credibility sources, whether measured relative to the overall level of credit assignment (as in our main study) or in absolute terms (unlike in our main study) (Fig. S3; SI 1.2.5). Moreover, choice-perseveration could not predict an amplification of positivity bias for low-credibility sources (see SI 3.6.2). However, we found no evidence for learning based on 50%-credibility feedback when examining either the feedback effect on choice repetition or CA in the credibility-CA model (SI 1.2.3).”

      (11) An in-depth comparison between Bayesian and empirical CA parameters revealed discrepancies from normative Bayesian learning.

      Consider saying where this in-depth comparison can be found (based on my reading, I think you're referring to the next section?).

      We have now modified the sentence for better clarity:

      “However, an in-depth comparison between Bayesian and empirical CA parameters revealed discrepancies from ideal Bayesian learning, which we describe in the following sections.”

      (12) "which essentially provides feedback" Perhaps you meant "random feedback"?

      We have modified the text as suggested by the reviewer.

      (13) Essentially random

      Why "essentially"? Isn't it just literally random?

      We have modified the text as suggested by the reviewer.

      (14) Both Bayesian models predicted an attenuated credit-assignment for the 3-star agent

      Attenuated relative to what? I wouldn't use this word if you mean weaker than what we see in the human data. Instead, I would say people show an exaggerated credit-assignment, since Bayes is the normative baseline.

      We changed the text according to the reviewer’s suggestion:

      “A comparison of empirical and Bayesian credit-assignment parameters revealed a further deviation from ideal Bayesian learning: participants showed an exaggerated credit-assignment for the 3-star agent compared with Bayesian models.”

      (15) "there was no difference between 2-star and 3-star agent contexts (b=0.051, F(1,2419)=0.39, p=0.53)"

      You cannot confirm the null hypothesis! Instead, you can write "The difference between 2-star and 3-star agent contexts was not significant". Although even with this language, you should be careful that your conclusions don't rest on the lack of a difference (the next sentence is somewhat ambiguous on this point).

      Additionally, the reported b coefs do not match the figure, which if anything, suggests a larger drop from 0.75 (2-star) to 1 (3-star). Is this a mixed vs fixed effects thing? It would be helpful to provide an explanation here.

      We thank the reviewer for this question. When we previously submitted our manuscript, we thought that finding enhanced credit-assignment for fully credible feedback following potential disinformation from a DIFFERENT context would constitute a striking demonstration of our “contrast effect”. However, upon reexamining this finding we found out we had a coding error (affecting how trials were filtered). We have now rerun and corrected this analysis. We have assessed the contrast effect for both "same-context" trials (where the contextual trial featured the same bandit pair as the learning trial) and "different-context" trials (where the contextual trial featured a different bandit pair). Our re-analysis reveals a selective significant contrast effect in the same-context condition, but no significant effect in the different-context condition. We have updated the main text to reflect these corrected findings and provide a clearer explanation of the analysis:

      “A comparison of empirical and Bayesian credit-assignment parameters revealed a further deviation from ideal Bayesian learning: participants showed an exaggerated credit-assignment for the 3-star agent compared with Bayesian models [Wilcoxon signed-rank test, instructed-credibility Bayesian model (median difference=0.74, z=11.14); free-credibility Bayesian model (median difference=0.62, z=10.71), all p’s<0.001] (Fig. 3a). One explanation for enhanced learning for the 3-star agent is a contrast effect, whereby credible information looms larger against a backdrop of non-credible information. To test this hypothesis, we examined whether the impact of feedback from the 3-star agent is modulated by the credibility of the agent in the trial immediately preceding it. More specifically, we reasoned that the impact of a 3-star agent would be amplified by a “low credibility context” (i.e., when it is preceded by a low credibility trial). In a binomial mixed effects model, we regressed choice-repetition on feedback valence from the last trial featuring the same bandit pair (i.e., the learning trial) and the feedback agent on the trial immediately preceding that last trial (i.e., the contextual credibility; see Methods for model-specification). This analysis included only learning trials featuring the 3-star agent, and context trials featuring the same bandit pair as the learning trial (Fig. 4a). We found that feedback valence interacted with contextual credibility (F(2,2086)=11.47, p<0.001) such that the feedback-effect (from the 3-star agent) decreased as a function of the preceding context-credibility (3-star context vs. 2-star context: b=-0.29, F(1,2086)=4.06, p=0.044; 2-star context vs. 1-star context: b=-0.41, t(2086)=-2.94, p=0.003; and 3-star context vs. 1-star context: b=-0.69, t(2086)=-4.74, p<0.001) (Fig. 4b). This contrast effect was not predicted by simulations of our main models of interest (Fig. 4c). No effect was found when focussing on contextual trials featuring a bandit pair different than the one in the learning trial (see SI 3.5). Thus, these results support an interpretation that credible feedback exerts a greater impact on participants’ learning when it follows non-credible feedback, in the same learning context.”

      We have modified the discussion accordingly as well:

      “A striking finding in our study was that for a fully credible feedback agent, credit assignment was exaggerated (i.e., higher than predicted by our Bayesian models). Furthermore, the effect of fully credible feedback on choice was further boosted when it was preceded by a low-credibility context related to current learning. We interpret this in terms of a “contrast effect”, whereby veridical information looms larger against a backdrop of disinformation (21). One upshot is that exaggerated learning might entail a risk of jumping to premature conclusions based on limited credible evidence (e.g., a strong conclusion that a vaccine produces significant side-effect risks based on weak credible information, following non-credible information about the same vaccine). An intriguing possibility, that could be tested in future studies, is that participants strategically amplify the extent of learning from credible feedback to dilute the impact of learning from non-credible feedback. For example, a person scrolling through a social media feed, encountering copious amounts of disinformation, might amplify the weight they assign to credible feedback in order to dilute effects of ‘fake news’. Ironically, these results also suggest that public campaigns might be more effective when embedding their messages in low-credibility contexts, which may boost their impact.”

      And we have included some additional analyses in the SI document:

      “3.5 Contrast effects for contexts featuring a different bandit

      Given that we observed a contrast effect when both the learning and the immediately preceding "context trial” involved the same pair of bandits, we next investigated whether this effect persisted when the context trial featured a different bandit pair – a situation where the context would be irrelevant to the current learning. Again, we used a binomial mixed effects model, regressing choice-repetition on feedback valence in the learning trial and the feedback agent in the context trial. This analysis included only learning trials featuring the 3-star agent, and context trials featuring a different bandit pair than the learning trial (Fig. S22a). We found no significant evidence of an interaction between feedback valence and contextual credibility (F(2,2364)=0.21, p=0.81) (Fig. S22b). This null result was consistent with the range of outcomes predicted by our main computational models (Fig. S22c).”

      We aimed to formally compare the influence of two types of contextual trials: those featuring the same bandit pair as the learning trial versus those featuring a different pair. To achieve this, we extended our mixed-effects model by incorporating a new predictor variable, "CONTEXT_TYPE", which coded whether the contextual trial involved the same bandit pair (coded as -0.5) or a different bandit pair (+0.5) compared to the learning trial. The Wilkinson notation for this expanded mixed-effects model is:

      $$\mathit{REPEAT} \sim \mathit{CONTEXT\_TYPE} * \mathit{FEEDBACK} * (\mathit{CONTEXT}_{\text{2-star}} + \mathit{CONTEXT}_{\text{3-star}}) + \mathit{BETTER} + (1 \mid \mathit{participant})$$

      This expanded model revealed a significant three-way interaction between feedback valence, contextual credibility, and context type (F(2,4451) = 7.71, p<0.001). Interpreting this interaction, we found a 2-way interaction between context-source and feedback valence when the context was the same (F(2,4451) = 12.03, p<0.001), but not when context was different (F(2,4451) = 0.23, p = 0.79). Further interpreting the double feedback-valence * context-source interaction (for the same context) we obtained the same conclusions as reported in the main text.”

      (16) "Strikingly, model-simulations (Methods) showed this pattern is not predicted by any of our other models"

      Why doesn't the Bayesian model predict this?

      Thanks for the comment. Overall, Bayesian models do predict a slight truth inference effect (see Figure 6d). However, these effects are not as strong as the ones observed in participants, suggesting that our results go beyond what would be predicted by a Bayesian model.

      Conceptually, it's important to note that the Bayesian model can infer (after controlling for source credibility and feedback valence) whether feedback is truthful based solely on prior beliefs about the chosen bandit. Using this inferred truth to amplify the weight of truthful feedback would effectively amount to “bootstrapping on one’s own beliefs.” This is most clearly illustrated with the 50% agent: if one believes that a chosen bandit yields rewards 70% of the time, then positive feedback is more likely to be truthful than negative feedback. However, a Bayesian observer would also recognize that, given the agent’s overall unreliability, such feedback should be ignored regardless.
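
      The 50%-agent example can be worked through with Bayes' rule. The sketch below is an illustration only: it assumes a point belief about the bandit's reward probability (rather than a full posterior distribution) and assumes that an agent with credibility c reports the true outcome with probability c and the opposite outcome otherwise. The function names are hypothetical.

```python
def p_feedback_true(p_reward, credibility, positive):
    """Posterior probability that the agent's report is truthful."""
    # Probability of the outcome that would make this report truthful.
    p_outcome = p_reward if positive else 1.0 - p_reward
    true_path = credibility * p_outcome
    false_path = (1.0 - credibility) * (1.0 - p_outcome)
    return true_path / (true_path + false_path)


def p_reward_given_feedback(p_reward, credibility, positive):
    """Posterior probability that the trial was actually rewarded."""
    p_true = p_feedback_true(p_reward, credibility, positive)
    # Positive feedback is truthful when there was a reward;
    # negative feedback is truthful when there was none.
    return p_true if positive else 1.0 - p_true


belief = 0.7  # assumed belief that the chosen bandit yields a reward

# 50% agent: positive feedback looks more likely to be truthful ...
truth_pos = p_feedback_true(belief, 0.5, positive=True)
truth_neg = p_feedback_true(belief, 0.5, positive=False)

# ... yet the belief about the reward is unchanged by either report.
reward_pos = p_reward_given_feedback(belief, 0.5, positive=True)
reward_neg = p_reward_given_feedback(belief, 0.5, positive=False)

# 75% agent: here the same positive report is informative.
reward_pos_75 = p_reward_given_feedback(belief, 0.75, positive=True)
```

      Under these assumptions, inferred truthfulness for the 50% agent is driven entirely by the prior belief, so the posterior over the reward equals the prior whichever report is received. This is the sense in which weighting such feedback by its inferred truth would amount to bootstrapping on one's own beliefs.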

      (17) "A striking finding in our study was that for a fully credible feedback agent, credit assignment was exaggerated (i.e., higher than predicted by a Bayesian strategy)".

      "Since we did not find any significant interactions between BETTER and the other regressors, we decided to omit it from the model formulation".

      Was this decision made after seeing the data? If so, please report the original analysis as well.

      We have included the BETTER regressor again, and we have re-run the analyses. We now report the results of such regression. We have also changed the methods section accordingly:

      “We used a different mixed-effects binomial regression model to test whether value learning from the 3-star agent was modulated by contextual credibility. We focused this analysis on instances where the previous trial with the same bandit pair featured the 3-star agent. We regressed the variable REPEAT, which indicated whether the current trial repeated the choice from the previous trial featuring the same bandit-pair (repeated choice=1, non-repeated choice=0). We included the following regressors: FEEDBACK coding the valence of feedback in the previous trial with the same bandit pair (positive=0.5, negative=-0.5), $\mathit{CONTEXT}_{\text{2-star}}$ indicating whether the trial immediately preceding the previous trial with the same bandit pair (context trial) featured the 2-star agent (feedback from 2-star agent=1, otherwise=0), and $\mathit{CONTEXT}_{\text{3-star}}$ indicating whether the trial immediately preceding the previous trial with the same bandit pair featured the 3-star agent. We also included a regressor (BETTER) coding whether the bandit chosen in the learning trial was the better (mostly rewarding) or the worse (mostly unrewarding) bandit within the pair. We included in this analysis only current trials where the context trial featured a different bandit pair. The model in Wilkinson’s notation was:

      $$\mathit{REPEAT} \sim \mathit{FEEDBACK} * (\mathit{CONTEXT}_{\text{2-star}} + \mathit{CONTEXT}_{\text{3-star}}) + \mathit{BETTER} + (1 \mid \mathit{participant}) \tag{13}$$

      In Figure 4c, we independently calculated the repeat-probability difference for the better (mostly rewarding) and worse (mostly non-rewarding) bandits and averaged across them. This calculation was done at the participant level and then averaged across participants.”
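
      The regressor coding described in this Methods excerpt can be sketched as a small helper. The function and argument names are hypothetical, and the ±0.5 coding of BETTER is an assumption, since the excerpt specifies the coding only for REPEAT, FEEDBACK and the two context indicators.

```python
def code_trial(repeated, feedback_positive, context_agent_stars, chose_better):
    """Build one row of regressors for the contextual-credibility model."""
    return {
        # Outcome: did the current trial repeat the learning-trial choice?
        "REPEAT": 1 if repeated else 0,
        # Valence of the 3-star agent's feedback on the learning trial.
        "FEEDBACK": 0.5 if feedback_positive else -0.5,
        # Dummy-coded context agent; the 1-star agent is the reference level.
        "CONTEXT_2star": 1 if context_agent_stars == 2 else 0,
        "CONTEXT_3star": 1 if context_agent_stars == 3 else 0,
        # Assumed effect coding for the better vs. worse bandit.
        "BETTER": 0.5 if chose_better else -0.5,
    }


# Example: positive feedback on the learning trial, preceded by a 1-star
# (random) agent, after which the participant repeated the choice.
row = code_trial(
    repeated=True,
    feedback_positive=True,
    context_agent_stars=1,
    chose_better=True,
)
```

      With this coding, the FEEDBACK coefficient reflects the feedback effect in the 1-star context, and the FEEDBACK-by-context interaction terms capture how that effect changes when the context trial featured the 2-star or 3-star agent.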

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Recommendations for the authors):

      Comments on revised version:

      The authors addressed the concerns adequately. The three remaining concerns are:

      (1) The use of one-way ANOVA is not well justified.

      The statement about statistical test in “Statistical analysis” section is as follows in the revised manuscript, “Data sets were tested for normality and direct comparisons between two groups were made using two-tailed Student’s t test (t test, for normally distributed data) as indicated. To evaluate statistical significance of three or more groups of samples, one-way ANOVA analysis with a Tukey test was used or repeated measures ANOVA analysis with a Tukey test was used in behavior assays. Statistical parameters are reported in the figures and the corresponding legends”.

      We used a one-way ANOVA for data with one categorical independent variable, comprising at least three groups or categories, and one quantitative dependent variable. For the behavioral test data, we conducted repeated-measures ANOVA in the revised manuscript, as suggested by Reviewer #1 (Point 18).

      (2) The use of superplots to show culture to culture variability would make it more transparent.

      Thanks for the nice suggestion. While superplots could more transparently show culture to culture variability, it is difficult to add more colors or even shades to the scatterplots in the current form, which have already been color coded for multiple groups of samples. The scatterplots we used effectively illustrate the variability across all collected data and do not affect the conclusions of our study. Therefore, we prefer not to change the way of data presentation in the revised manuscript.

      (3) Change EEN1 in Figure 8B to EndoA1.

      Thanks a lot for the sharp eye. Corrected.

      Reviewer #3 (Recommendations for the authors):

      Specific comments:

      The authors have made a substantial effort to improve their manuscript. A number of issues, related to numbers of observations mentioned by the reviewers, are clarified in the revised manuscript. The authors have also clarified some of the other questions from the reviewers. The long list of issues brought up by the reviewers and the many corrections needed still raise questions about data quality in this manuscript.

      In response to my comments (Point 2), the added experiment with PSD95.FingR and GPN.FingR in cultured neurons (Fig. S5A-D) is a good addition; the in vivo data using FingRs in Figure S3 look less convincing however. In response to my Point 5, the authors have added a cell-free binding assay (Figure 5I). This is a useful addition, but to convincingly make the point of interaction between Gephyrin and EndoA1, more rigorous biophysical quantitation of binding is needed. The legend in Figure 5I states that 4 independent experiments were performed, but the graph only shows 3 dots. This needs to be corrected.

      We sincerely appreciate your comments and apologize for any concerns raised. As suggested (Point 2), we made many efforts to visualize endogenous postsynaptic proteins using recombinant probes. However, due to much lower expression of GPN.FingR compared with PSD95.FingR in P21 brain slices following viral infection (Figure S3), we were unable to obtain better imaging results. To strengthen our data and conclusions, we additionally performed experiments with PSD95.FingR and GPN.FingR in cultured neurons (Fig. S5A-D) in the revised manuscript.

      Regarding the biophysical quantification of gephyrin–endophilin A1 binding, we do not have the equipment for this type of experiment (surface plasmon resonance or isothermal titration calorimetry). Instead, we performed a pull-down assay as an alternative to confirm their interaction (Figure 5I). We also apologize for the error in the number of independent experiments stated in the figure legend and have corrected it in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary: 

      This manuscript uses a diverse isolate collection of Streptococcus pneumoniae from hospital patients in the Netherlands to understand the population-level genetic basis of growth rate variation in this pathogen, which is a key determinant of S. pneumoniae within-host fitness. Previous efforts have studied this phenomenon in strain-specific comparisons, which can lack the statistical power and scope of population-level studies. The authors collected a rigorous set of in vitro growth data for each S. pneumoniae isolate and subsequently paired growth curve analysis with whole-genome analyses to identify how phylogenetics, serotype, and specific genetic loci influence in vitro growth. While there were noticeable correlations between capsular serotype and phylogeny with growth metrics, they did not identify specific loci associated with altered in vitro growth, suggesting that these phenotypes are controlled by the collective effect of the entire genetic background of a strain. This is an important finding that lays the foundation for additional, more highly-powered studies that capture more S. pneumoniae genetic diversity to identify these genetic contributions.

      Thank you for an excellent summary of our manuscript.

      Strengths: 

      (1) The authors were able to completely control the experimental and genetic analyses to ensure all isolates underwent the same analysis pipeline to enhance the rigor of their findings.

      (2) The isolate collection captures an appreciable amount of S. pneumoniae diversity and, importantly, enables disentangling the contributions of the capsule and phylogenetic background to growth rates.

      (3) This study provides a population-level, rather than strain-specific, view of how genetic background influences the growth rate in S. pneumoniae. This is an advance over previous studies that have only looked at smaller sets of strains.

      (4) The methods used are well-detailed and robust to allow replication and extension of these analyses. Moreover, the manuscript is very well written and includes a thoughtful and thorough discussion of the strengths and limitations of the current study.

      Thank you for excellently summarising the strengths of our manuscript.

      Weaknesses: 

      (1) As acknowledged by the authors, the genetic diversity and sample size of this newly collected isolate set are still limited relative to the known global diversity of S. pneumoniae, which evidently limits the power to detect loci with smaller/combinatorial contributions to growth rate (and ultimately infection). 

      Indeed, while larger pneumococcal datasets exist globally, most of these datasets do not have reliable metadata on in vitro growth rates and other phenotypes, as the intention, for the most part, is to conduct population-level surveillance to track the changes in the serotype distribution to assess the impact of introducing pneumococcal conjugate vaccines. In this study, we adopted a different approach to phenotypically characterising the samples collected from these surveillance studies to understand the genetic features that influence the intrinsic growth characteristics of the isolates. While our dataset size is modest, it exemplifies how we can combine whole-genome sequencing and phenotypic characterisation of bacterial isolates to understand the genetic determinants that may drive intrinsic phenotypic differences between strains.

      (2) The in vitro growth data is limited to a single type of rich growth medium, which may not fully reflect the nutritional and/or selective pressures present in the host.

      We agree that our study focused on a single type of rich growth medium, which may not fully reflect the nutritional or selective pressures present in the host. The rationale and the representativeness of the selected culture conditions were more extensively discussed in Arends et al. (10.1128/spectrum.00050-22). Considering that this was a proof-of-concept study to assess the feasibility of our approach, future studies by us and others will evaluate the impact of using different media. Besides the media, complementary techniques such as transcriptome sequencing will help uncover additional insights into potential factors that influence differences in pneumococcal growth kinetics. 

      (3) The current study does not use genetic manipulation or in vitro/in vivo infection models to experimentally test whether alteration of growth rates as observed in this study is linked to virulence or successful infection. The availability of a naturally diverse collection with phylogenetic and serotype combinations already identified as interesting by the authors provides a strong rationale for wet-lab studies of these phenotypes.

      We concur that additional genetic manipulation studies to assess the impact of altering growth rates on virulence and infection would have provided further insights. While this was beyond the scope of this study, we plan to conduct follow-up work to assess this using carefully selected strains from our pneumococcal collection. Because our current study demonstrates that genetic determinants of pneumococcal growth features are not simply confined to single loci, such experimental validation would require novel wet-lab approaches that consider epistatic interactions. In addition, in vivo infection models that allow the study of dissemination from the bloodstream are not yet well established.

      Reviewer #2 (Public review): 

      Summary: 

      The study by Chaguza et al. presents a novel perspective on pneumococcal growth kinetics, suggesting that the overall genetic background of Streptococcus pneumoniae, rather than specific loci, plays a more dominant role in determining growth dynamics. Through a genome-wide association study (GWAS) approach, the authors propose a shift in how we understand growth regulation, differing from earlier findings that pinpointed individual genes, such as wchA or cpsE, as key regulators of growth kinetics. This study highlights the importance of considering the cumulative impact of the entire genetic background rather than focusing solely on individual genetic loci.

      The study emphasizes the cumulative effects of genetic variants, each contributing small individual impacts, as the key drivers of pneumococcal growth. This polygenic model moves away from the traditional focus on single-gene influences. Through rigorous statistical analyses, the authors persuasively advocate for a more holistic approach to understanding bacterial growth regulation, highlighting the complex interplay of genetic factors across the entire genome. Their findings open new avenues for investigating the intricate mechanisms underlying bacterial growth and adaptation, providing fresh insights into bacterial pathogenesis.

      Thank you for an excellent summary of our manuscript.

      Strengths: 

      This study exemplifies a holistic approach to unraveling key factors in bacterial pathogenesis. By analyzing a large dataset of whole-genome sequences and employing robust statistical methodologies, the authors provide strong evidence to support their main findings, which is a leap forward from previous studies focused on a relatively small number of strains. Their integration of genome-wide association studies (GWAS) highlights the cumulative, polygenic influences on pneumococcal growth kinetics, challenging the traditional focus on individual loci. This comprehensive strategy not only advances our understanding of bacterial growth regulation but also establishes a foundation for future research into the genetic underpinnings of bacterial pathogenesis and adaptation. The amount of data generated and the corresponding approaches to analyze the data are impressive as well as convincing. The figures are convincing and comprehensible too.

      Thank you for excellently pointing out the strengths of our manuscript.

      Weaknesses: 

      Despite the strong outcomes of the GWAS approach, this study leaves room for differing interpretations. A key point of contention lies in the title, which initially gives the impression that the research addresses growth kinetics under both in vitro and in vivo conditions. However, the study is limited to in vitro growth kinetics, with the assumption that these findings are equally applicable to in vivo scenarios, a premise that is not universally valid. To more accurately reflect the study's scope and avoid potential misrepresentation, the title should explicitly specify "in vitro" growth kinetics. This clarification would better align the title with the study's actual focus and findings.

      Thank you for these suggestions. We have updated the title to include "in vitro" to avoid confusion. The new title now reads, “The capsule and genetic background, rather than specific loci, strongly influence in vitro pneumococcal growth kinetics.” While our study used in vitro data, our goal is to highlight that such in vitro differences in pneumococcal growth may influence in vivo dynamics, as highlighted in several papers referenced in the introduction and discussion. 

      This study suggests that the entire genetic background significantly influences bacterial growth kinetics. However, to transform these predictions into established facts, extensive experimental validation is necessary. This would involve "bench experiments" focusing on generating and studying mutant variants of serotypes or strains with diverse genomic variations, such as targeted deletions. The growth phenotypes of these mutants should be analyzed, complemented by complementation assays to confirm the specific roles of the deleted regions. These efforts would provide critical empirical evidence to support the findings from the GWAS approach and enhance understanding of the genetic basis of bacterial growth kinetics.

      We fully agree with this assessment. As reviewer #1 similarly highlighted, additional genetic manipulation studies would provide further helpful information to assess the impact of altering growth rates on virulence and infection. These experimental studies were, however, beyond the scope of this study due to several factors beyond our control. Nevertheless, we intend to conduct follow-up experimental work to provide additional insights into how the combination of serotypes and genetic background influences pneumococcal growth in vitro and virulence in vivo. Because our current study demonstrates that genetic determinants of pneumococcal growth features are not simply confined to single loci, such experimental validation would require novel wet-lab approaches that consider epistatic interactions.

      In the discussion section, the authors state that "the influence of serotype appeared to be higher than the genetic background for the average growth rate" (lines 296-298). Alongside references 13-15, this emphasizes the important role of capsular variability, which is a key determinant of serotypes, in influencing growth kinetics. However, this raises the question: why isn't a specific locus like cps, which is central to capsule biogenesis, considered a strong influencer of growth kinetics in this study?

      Thank you for highlighting the point above. Indeed, the capsule biosynthesis (cps) locus is associated with pneumococcal growth kinetics, as seen in the analysis of individual serotypes. However, the cps locus does not come up as a hit in the GWAS because we controlled for the population structure of the pneumococcal strains. The absence of the hits in the cps locus is because serotypes, hence cps loci, tend to be tightly associated with lineages despite occasional capsule switches, which introduce serotypes to different lineages. Therefore, controlling for population structure, which is critical for GWAS analyses, virtually eliminates the detection of potential hits within the cps locus. However, detecting such hits with larger datasets may still be possible. For this reason, we performed a separate analysis of the individual serotypes and lineages shown in Figure 3.

      One plausible explanation could be the absence of "elevated signals" for cps in the GWAS analysis. GWAS relies on identifying loci with statistically significant associations to phenotypes. The lack of such signals for cps may indicate that its contribution, while biologically important, does not stand out genome-wide. This might be due to the polygenic nature of growth kinetics, where the overall genetic background exerts a cumulative effect, potentially diluting the apparent influence of individual loci like cps in statistical analyses. 

      We fully agree with this point. We mentioned in the abstract and discussion that the absence of the signals for specific individual loci within the pneumococcal genome may imply that the growth kinetics are polygenic. We have edited the discussion to emphasise the suggested point.
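
      The confounding described above can be illustrated with a minimal Python sketch. It is not the authors' analysis: the lineage labels, effect size, noise level, and the use of lineage indicator covariates as a crude stand-in for a population-structure correction are all assumptions chosen for illustration. A capsule variant that tracks lineage almost perfectly, apart from one simulated capsule switch per lineage, gives a strong signal in an unadjusted regression and a much weaker one once lineage is included.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: 4 lineages with 100 isolates each.
n_per_lineage = 100
lineage = np.repeat([0, 1, 2, 3], n_per_lineage)

# Hypothetical capsule (cps) variant carried by lineages 2 and 3 only,
# so serotype is tightly associated with lineage.
cps_variant = (lineage >= 2).astype(float)

# Simulate one capsule switch per lineage (first isolate of each block).
switch_idx = np.arange(0, lineage.size, n_per_lineage)
cps_variant[switch_idx] = 1.0 - cps_variant[switch_idx]

# Assumed truth: the variant shifts growth rate by 0.5 units; noise SD is 0.3.
growth = 0.5 * cps_variant + rng.normal(0.0, 0.3, lineage.size)


def variant_t_stat(y, x, covariates=None):
    """t-statistic for x in an ordinary least-squares fit of y on [1, x, covariates]."""
    cols = [np.ones_like(y), x]
    if covariates is not None:
        cols.extend(covariates.T)  # one column per covariate
    X = np.column_stack(cols)
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = y.size - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])


# Lineage indicator columns (lineage 0 is the reference level).
lineage_dummies = np.eye(4)[lineage][:, 1:]

t_unadjusted = variant_t_stat(growth, cps_variant)
t_adjusted = variant_t_stat(growth, cps_variant, lineage_dummies)

print(f"t without lineage adjustment: {t_unadjusted:.1f}")
print(f"t with lineage adjustment:    {t_adjusted:.1f}")
```

      Because nearly all variation in the variant is explained by lineage, the adjusted estimate rests only on the few switched isolates, so its standard error grows and the test statistic shrinks. Real bacterial GWAS tools use more refined corrections than lineage indicators, so this sketch only conveys the direction of the effect.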

      Reviewer #3 (Public review): 

      This study provides insights into the growth kinetics of a diverse collection of Streptococcus pneumoniae, identifying capsule and lineage differences. It was not able to identify any specific loci from the genome-wide association studies (GWAS) that were associated with the growth features. It does provide a useful study linking phenotypic data with large-scale genomic population data. The methods for the large part were appropriately written in sufficient detail, and data analysis was performed with rigour. The interpretation of the results was supported by the data, although some additional explanation of the significance of e.g. ancestral state reconstruction would be useful. Efforts were made to make the underlying data fully accessible to the readers although some of the supplementary material could be formatted and explained a bit better. 

      Thank you for the excellent summary of the manuscript. We have added some text to clarify the significance of some approaches, including ancestral state reconstruction and supplementary material.

      Reviewer #1 (Recommendations for the authors): 

      (1) Since the PCBN was collected pre and post-vaccine introduction, did the authors stratify their analyses other than Figure 7 (disease correlations) to assess how vaccine status may influence growth rates? Is the assertion in Lines 238-239 supported by the in vitro data? 

      We have done this analysis. Overall, there was no association between vaccine introduction and pneumococcal growth rates. In lines 238-239, we assumed that in vaccinated populations, the host may be more capable of suppressing bacterial replication due to vaccination. However, there was no in vitro data to back this statement. Therefore, we have edited the statement to remove the text regarding vaccination policy. 

      We considered vaccination status when analysing the data presented in Figure 7. As mentioned in the legend, we only analysed the dataset collected before vaccine introduction to avoid confounding due to vaccination status. To fully assess the impact of vaccination, we would need additional information besides the date of isolation, including vaccine doses and time since vaccination, which was not available for our study.

      (2) Similarly, do any of the growth rate metrics correlate with other aspects of the clinical dataset, like the year of isolation or the sex/age of the patient?

      We did not include these assessments in the manuscript, as these aspects of the clinical dataset are mostly related to the patient and not necessarily the intrinsic characteristics of the pneumococcus. However, upon revising the manuscript, we compared the growth characteristics against the vaccination period, and we did not find any statistically significant association. The relationship between pneumococcal growth features of the isolates used in the current study and their corresponding clinical manifestations of invasive disease was described in Arends et al. (10.1128/spectrum.00050-22).

      (3) When evaluating the impact of serotype on growth rates, did the directionality of some of the described impacts match with those previously reported in other studies?

      We were unable to assess the directionality of the serotype’s impact on growth rates. In part, we did not conduct this analysis because our study used different strains from those used in other studies. Such differences in the genetic backgrounds, growth media, and analytical approaches made assessing the consistencies between the studies difficult.

      (4) Did the authors expect that a specific growth metric would be more likely to correlate with specific genetic variants? The reader would benefit from a brief discussion of how the metrics (e.g., maximum growth or lag phase duration) are biologically meaningful beyond the overall growth rate. 

      We indeed expected that specific growth metrics might correlate with certain genetic variants based on their distinct biological roles. The lag phase duration can potentially reflect the ability of the pneumococcus to adapt to environmental conditions, such as nutrient availability or stress, and may be more influenced by regulatory genes involved in sensing and responding to environmental cues (PMID: 30642990, PMID: 22139505). In contrast, maximum growth rate is more likely to be impacted by core metabolic or biosynthetic genes that control the rate of cell division under optimal conditions (PMID: 31053828). Maximum optical density, which reflects the final cell density, might be shaped by factors related to nutrient utilization efficiency, waste tolerance, or quorum sensing. The duration of the stationary phase is related to the switch from lipoteichoic acids to wall teichoic acids, permitting the initiation of the lytic growth phase (PMID: 239401). It is unclear whether this switch is mediated by external triggers or also by intrinsic features of the pneumococcus. Including this type of analysis allows for a more nuanced understanding of how genetic variants contribute to different physiological aspects of microbial growth. The relevance of the lag phase and the stationary phase in relation to the clinical phenotypes of invasive disease (such as pleural empyema and meningitis) of our pneumococcal isolates has been studied and discussed in Arends et al. (PMID: 35678554). The observed associations are summarized in Table 2 of that article. We have added some text in the discussion on the biological relevance of each bacterial growth metric.
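
      The growth metrics discussed above can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the fitting procedure used in the study: the synthetic logistic curve, its parameter values, and the tangent-based definition of lag are hypothetical choices.

```python
import numpy as np


def growth_metrics(time_h, od):
    """Estimate lag duration, maximum specific growth rate, and maximum OD.

    Assumed definitions: the maximum rate is the steepest slope of ln(OD)
    over time, and the lag is where the tangent at that steepest point
    crosses the initial ln(OD) level.
    """
    log_od = np.log(od)
    slopes = np.gradient(log_od, time_h)  # d ln(OD) / dt by finite differences
    i = int(np.argmax(slopes))
    max_rate = float(slopes[i])
    lag = float(time_h[i] - (log_od[i] - log_od[0]) / max_rate)
    return {"lag_h": lag, "max_rate_per_h": max_rate, "max_od": float(od.max())}


# Synthetic 15-hour curve sampled every 15 minutes (all values hypothetical):
# OD stays at od0 during a 2-hour lag, then follows logistic growth.
time_h = np.linspace(0.0, 15.0, 61)
od0, capacity, rate, lag_true = 0.01, 1.0, 1.0, 2.0
t_eff = np.clip(time_h - lag_true, 0.0, None)
od = capacity / (1.0 + ((capacity - od0) / od0) * np.exp(-rate * t_eff))

metrics = growth_metrics(time_h, od)
print(metrics)
```

      Measured optical density curves are noisy, so in practice the slope would usually be taken from a smoothed or model-fitted curve rather than from raw finite differences.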

      (5) For the GWAS analyses, have similar analyses been performed for other S. pneumoniae collections? Are there known "control" loci that the authors could replicate in the current collection to verify the robustness of the approach?

      Others have undertaken GWAS analyses of other S. pneumoniae collections. However, none of those analyses focused on bacterial growth kinetics. Therefore, considering that this is the first GWAS in the pneumococcus, and in bacteria in general, to focus on growth kinetics, we do not have “control” loci that we could replicate to verify the robustness of the approach. However, we hope that future studies will be able to use our findings for comparison as more analyses of in vitro growth data become available.

      (6) Is there a statistical method that could predict the sample size necessary to detect the proposed combinatorial or small contributions from various genetic loci to growth rate? This reviewer is not an expert in statistical genetics but would appreciate an indication of the scale required by future studies to identify these regions.

      We are unaware of a statistical approach that could predict sample sizes to detect small or combinatorial effect sizes. However, we intend to conduct simulations in future studies to gain insights into the required sample sizes.
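
      One way such a simulation might look is sketched below in Python. It is a hedged illustration rather than a planned or published analysis: the allele frequency, effect size, significance threshold, and function name are hypothetical, the test uses a normal approximation to the correlation t-statistic, and population structure and epistasis are ignored.

```python
import numpy as np
from statistics import NormalDist


def simulated_power(n_isolates, effect_sd, allele_freq=0.3,
                    alpha=1e-6, n_sims=500, seed=0):
    """Fraction of simulated datasets in which a single variant is detected.

    effect_sd is the assumed shift in the phenotype, in units of the
    residual standard deviation, caused by carrying the variant.
    alpha is a hypothetical multiple-testing-corrected threshold.
    """
    rng = np.random.default_rng(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    hits = 0
    for _ in range(n_sims):
        x = rng.binomial(1, allele_freq, n_isolates).astype(float)
        y = effect_sd * x + rng.normal(0.0, 1.0, n_isolates)
        r = np.corrcoef(x, y)[0, 1]
        # t-statistic of the correlation, compared against a normal quantile.
        z = r * np.sqrt((n_isolates - 2) / (1.0 - r ** 2))
        hits += int(abs(z) > z_crit)
    return hits / n_sims


for n in (200, 500, 1000, 2000):
    print(n, simulated_power(n, effect_sd=0.3))
```

      Scanning over sample sizes in this way gives a rough sense of how many isolates a future study might need for a given effect size. A realistic version would simulate phenotypes on the observed genotypes so that linkage and lineage structure are preserved.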

      (7) WGS and genome assembly metrics should be provided for each sequenced genome especially since only short-read assemblies were performed. If not already deposited, the assemblies should be deposited for data sharing as well.

      We have deposited the sequence reads to the European Nucleotide Archive (ENA) and provided the accession numbers, WGS, and assembly metrics in Supplementary Data 1. We have described the tools used to generate the assemblies from the reads.

      (8) Please include the specific ethics approval numbers for the sample collection protocol.

      Study procedures were approved by the Medical Ethical committees of the participating hospitals, including a waiver for individual informed consent (file number 2020–6644 Radboudumc).  

      Reviewer #3 (Recommendations for the authors): 

      Certain aspects of the manuscript could be clarified and extended to improve the manuscript.

      (1) Introduction 

      a) The authors assume knowledge by the reader on Streptococcus pneumoniae, specifically the genetic diversity of lineages and capsules. This diversity is highlighted in the discussion L368 that there are >100 serotypes. The authors should consider backgrounding the number of serotypes and the importance of serotype switching in these bacteria, as well as explaining the diversity of the lineages (GPSC) that are increasingly used as standard nomenclature for Streptococcus pneumoniae.

      Thank you for bringing this to our attention. We have included a brief description of the GPSC lineages and capsule switching in the introduction.

      b) The last paragraph of the introduction is lengthy and gets into the methods and results of the manuscript. These could be edited down.

      We have revised the paragraph to remove the methods and results.

      (2) Methods 

      a) The authors should provide details on the QC undertaken and any exclusion criteria of genomes based on the QC. The supplementary material has tabs, e.g., read and assembly metrics, but it is unclear how these were determined and how they impacted the study.

      We utilised all the genomes available for this study, which had in vitro phenotypic data available. We excluded no genomes due to poor sequence quality.

      Additional information about the genomes is available from previous studies, which are referenced in the methods section.

      b) Why did the authors map draft assemblies to the reference genome for the SNP alignment (from which the ML tree was inferred)? Draft genome assemblies usually contain errors so there is potential for false positive SNPs. Further, there is a lack of per-base quality information when using the draft genome assemblies. Given the short read data are available, why were the reads not used as input for snippy (which is the standard input for snippy)? This may have impacted the results reliant on the SNP calls.

      We mapped a combination of reads and draft assemblies to the reference genome to generate the SNP alignment using Snippy (https://github.com/tseemann/snippy). For the pneumococcal isolates, we mapped the reads, while for the included outgroup, we mapped the assembly as we did not have sequence reads available. We have edited the methods section to clarify this.

      c) SNP alignment. The authors explain the decision to not undertake recombination detection later in the discussion. Did the authors mask any phage or repeat regions? And how was the outgroup S. oralis included in the analyses, e.g., what genome was used?

      We included the outgroup genome in the alignment generated by Snippy, which involved generating aligned consensus sequences for each isolate after mapping the reads to the pneumococcal ATCC 700669 reference genome (GenBank accession: NC_011900), as described in the methods. We have now included the accession number for the S. oralis genome, which was used as an outgroup in our phylogenetic analysis. Phages are not typically common in pneumococcal genomes compared to other species. Similarly, although repeats are present in the pneumococcal genome, the consensus in the field is that these do not particularly bias the pneumococcal phylogeny. Therefore, the consensus in the field has been not to explicitly mask these regions as done for highly clonal bacterial pathogens, such as Mycobacterium tuberculosis. Overall, our approach to building the phylogenetic tree is robust compared to alternative methods (PMID: 29774245).

      d) Should the presence/absence of unitigs that were used as the input for the GWAS be included as a supp dataset?

      We have now provided the presence/absence matrix for the unitigs used in the analysis as a supplementary dataset available at GitHub (https://github.com/ChrispinChaguza/SpnGrowthKinetics). We have revised the methods section to include a section on data availability.

      e) For the annotation of unitigs, the authors used their bespoke script with features from complete public genomes. Please provide accession/ identifying information of the complete genomes (not only the ATCC 700669) reference in the methods. Also, why did the authors choose not to annotate with annotate_hits_pyseer from pyseer? 

      We annotated the hits using our bespoke script because we understood our approach better and could control the information generated from the script. Annotating with “annotate_hits_pyseer” from pyseer would produce similar results, as both approaches compare the unitigs to annotated reference genomes.

      (3) Results 

      a) The authors could consider providing an overview of the diversity (e.g. lineages and capsules) in the study and contextualising it in the broader context of Streptococcus pneumoniae population genomics. This would help readers who are less familiar with this pathogen to understand the diversity included in this study. 

      We included this information in the first paragraph of the results section. Considering that population-level analyses based on this dataset have already been published, we have referenced the corresponding papers to provide additional information to readers.

      b) Did the timespan of the study pre and post-PCV7 introduction need to be briefly touched on in the results? For example, did the serotypes and lineages vary over the two collection periods and does this need to be considered in the interpretation of the results at all? 

      The prevalence of serotypes and lineages varied over time, partly due to the introduction of vaccines and random temporal fluctuations in the distribution of strains. We did not explicitly adjust for time, as this is not likely to influence the intrinsic biology of the strains. However, we adjusted for the population structure of the strains, whose changes would most likely affect the distribution of strains in the population. For other analyses, including that in Figure 7, we considered the vaccination status by restricting the analysis to the isolates collected before vaccine introduction.

      c) Figures. Some of the figures had very small text (especially Figure 1) that was difficult to read, and Figure 2 and Figure 4 were mentioned once, while several paragraphs of results were used to discuss Figure 3. Is Figure 1 required as a main figure? Could Figure 3 be split? e.g. one with the chord diagram, one with panels b-e, and one with panels j-q? Figure 4 - the ancestral state reconstruction analyses could be expanded upon in the results.

      We have increased the text size in some figures where possible. However, for figures that show more information, smaller text is more suitable.

      Figure 1 is essential to the manuscript as it provides a visual overview of the approach used in this study. Without this figure, it may be difficult for some readers, especially those unfamiliar with bacterial genomic analyses, to understand our study approach and how we estimated the pneumococcal growth parameters used for the GWAS. 

      For Figure 4, we prefer to keep it as it is, to have the information in one place, as splitting it will mean including some of the panels in the supplementary material, considering that we already have seven figures in the manuscript. 

      We have added additional text to the results regarding the ancestral reconstruction analyses. We included them mainly to demonstrate the correlation between the pneumococcal growth rates and the phylogeny.

      (4) Discussion 

      a) Why was culture undertaken for 15 hours and not 24? The authors discuss the impact that this may have had on their results.

      The 15-hour incubation period was deliberately chosen, as the growth curves indicate that most isolates had reached the stationary phase by that time. Extending the culture duration would likely not have yielded additional meaningful data. As is well established, Streptococcus pneumoniae undergoes autolysis upon reaching a certain cell density, which could distort growth measurements and complicate interpretation if incubation were prolonged. For clarification, we have changed the sentences related to this topic in the Discussion.

      b) Some paragraphs in the discussion were very long e.g. L347-381. The authors could consider breaking long paragraphs down into shorter ones to improve the readability of the manuscript.

      We agree with this assessment. We initially wanted to include all the information on the study’s limitations in the same paragraph. However, as suggested, we have now split the highlighted paragraph into two shorter paragraphs. 

      (5) Supplementary Data 

      a) Providing information in each tab of each supp data file would be useful. For example - including a table header that explained what was in each sheet rather than relying on the tab names. Formatting for some of the underlying supplementary data could be improved e.g. in supplementary data 2 no explanation is given to interpret the data included in these files.

      Thank you for the suggestions. For clarity, we have included a header in each tab of the spreadsheet that describes what is included in each dataset. We have also removed the previous Supplementary Data 2. We realised that the information presented in this spreadsheet was redundant, as it was already available in Supplementary Data 1.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This work by Govorunova et al. identified three naturally blue-shifted channelrhodopsins (ChRs) from ancyromonads, namely AnsACR, FtACR, and NlCCR. The phylogenetic analysis places the ancyromonad ChRs in a distinct branch, highlighting their unique evolutionary origin and potential for novel applications in optogenetics. Further characterization revealed the spectral sensitivity, ionic selectivity, and kinetics of the newly discovered AnsACR, FtACR, and NlCCR. This study also offers valuable insights into the molecular mechanism underlying the function of these ChRs, including the roles of specific residues in the retinal-binding pocket. Finally, this study validated the functionality of these ChRs in both mouse brain slices (for AnsACR and FtACR) and in vivo in Caenorhabditis elegans (for AnsACR), demonstrating the versatility of these tools across different experimental systems.

      In summary, this work provides a potentially valuable addition to the optogenetic toolkit by identifying and characterizing novel blue-shifted ChRs with unique properties.

      Strengths:

      This study provides a thorough characterization of the biophysical properties of the ChRs and demonstrates the versatility of these tools in different ex vivo and in vivo experimental systems. The mutagenesis experiments also revealed the roles of key residues in the photoactive site that can affect the spectral and kinetic properties of the channel.

      We thank the Reviewer for his/her positive evaluation of our work.

      Weaknesses:

      While the novel ChRs identified in this work are spectrally blue-shifted, there still seems to be some spectral overlap with other optogenetic tools. The authors should provide more evidence to support the claim that they can be used for multiplex optogenetics and help potential end-users assess if they can be used together with other commonly applied ChRs. Additionally, further engineering or combination with other tools may be required to achieve truly orthogonal control in multiplexed experiments.

      To demonstrate the usefulness of ancyromonad ChRs for multiplex optogenetics as a proof of principle, we co-expressed AnsACR with the red-shifted cation-conducting ChR Chrimson and measured the net photocurrent generated by this combination as a function of wavelength. We found that it is hyperpolarizing in the blue region of the spectrum and depolarizing in the red region. In the revision, we added a new panel (Figure 1D) showing these results and the following paragraph to the main text:

      “To test the possibility of using AnsACR in multiplex optogenetics, we co-expressed it with the red-shifted CCR Chrimson (Klapoetke et al., 2014) fused to an EYFP tag in HEK293 cells. We measured the action spectrum of the net photocurrents with 4 mM Cl<sup>-</sup> in the pipette, matching the conditions in the neuronal cytoplasm (Doyon, Vinay et al. 2016). Figure 1D, black shows that the direction of photocurrents was hyperpolarizing upon illumination with λ<500 nm and depolarizing at longer wavelengths. A shoulder near 520 nm revealed a FRET contribution from EYFP (Govorunova, Sineshchekov et al. 2020), which was also observed upon expression of the Chrimson construct alone (Figure 1D, red)”.

      In the C. elegans experiments, partial recovery of pharyngeal pumping was observed after prolonged illumination, indicating potential adaptation. This suggests that the effectiveness of these ChRs may be limited by cellular adaptation mechanisms, which could be a drawback in long-term experiments. A thorough discussion of this challenge in the application of optogenetics tools would prove very valuable to the readership.

      We added the following paragraph to the revised Discussion:

      “One possible explanation of the partial recovery of pharyngeal pumping that we observed after 15-s illumination, even at the highest tested irradiance, is continued attenuation of photocurrent during prolonged illumination (desensitization). However, the rate of AnsACR desensitization (Figure 1 – figure supplement 4A and Figure 1 – figure supplement 5A) is much faster than the rate of the pumping recovery, reducing the likelihood that desensitization is driving this phenomenon. Another possible reason for the observed adaptation is an increase in the cytoplasmic Cl<sup>-</sup> concentration owing to AnsACR activity and hence a breakdown of the Cl<sup>-</sup> gradient on the neuronal membrane. The C. elegans pharynx is innervated by 20 neurons, 10 of which are cholinergic (Pereira, Kratsios et al. 2015). A pair of MC neurons is the most important for regulation of pharyngeal pumping, but other pharyngeal cholinergic neurons, including I1, M2, and M4, also play a role (Trojanowski, Padovan-Merhar et al. 2014). Moreover, the pharyngeal muscles generate autonomous contractions in the presence of acetylcholine tonically released from the pharyngeal neurons (Trojanowski, Raizen et al. 2016). Given this complexity, further elucidation of pharyngeal pumping adaptation mechanisms is beyond the scope of this study.”

      Reviewer #2 (Public review):

      Summary:

      Govorunova et al present three new anion opsins that have potential applications in silencing neurons. They identify new opsins by scanning numerous databases for sequence homology to known opsins, focusing on anion opsins. The three opsins identified are uncommonly fast, potent, and are able to silence neuronal activity. The authors characterize numerous parameters of the opsins.

      Strengths:

      This paper follows the tradition of the Spudich lab, presenting and rigorously characterizing potentially valuable opsins. Furthermore, they explore several mutations of the identified opsin that may make these opsins even more useful for the broader community. The opsins AnsACR and FtACR are particularly notable, having extraordinarily fast onset kinetics that could have utility in many domains. Furthermore, the authors show that AnsACR is usable in multiphoton experiments, with a peak photocurrent at a commonly used wavelength. Overall, the authors' detailed measurements and characterization make for an important resource, both presenting new opsins that may be important for future experiments, and providing characterizations to expand our understanding of opsin biophysics in general.

      We thank the Reviewer for his/her positive evaluation of our work.

      Weaknesses:

      First, while the authors frequently reference GtACR1, a well-used anion opsin, there is no side-by-side data comparing these new opsins to the existing state-of-the-art. Such comparisons are very useful to adopt new opsins.

      GtACR1 exhibits peak sensitivity at 515 nm and is therefore poorly suited for combination with red-shifted CCRs or fluorescent sensors, unlike blue-light-absorbing ancyromonad ACRs. Nevertheless, we conducted a side-by-side comparison of ancyromonad ChRs, GtACR1, and GtACR2, the latter of which has its spectral maximum at 470 nm. The results are shown in the new Figures 1E and F, and the new multipanel Figure 1 – figure supplement 4 added in the revision. We also added the following text, describing these results, to the revised Results section:

      “Figures 1E and F show the dependence of the peak photocurrent amplitude and reciprocal peak time, respectively, on the photon flux density for ancyromonad ChRs and GtACRs. The current amplitude saturated earlier than the time-to-peak for all tested ChRs. Figure 1 – figure supplement 4A-E shows normalized photocurrent traces recorded at different photon densities. Quantitation of desensitization at the end of 1-s illumination revealed a complex light dependence (Figure 1 – figure supplement 4F). Figure 1 – figure supplement 5 shows normalized photocurrent traces recorded in response to a 5-s light pulse of the maximal available intensity and the magnitude of desensitization at its end.”

      Next, multiphoton optogenetics is a promising emerging field in neuroscience, and I appreciate that the authors began to evaluate this approach with these opsins. However, a few additional comparisons are needed to establish the user viability of this approach, principally the photocurrent evoked using the 2P process, for given power densities. Comparison across the presented opsins and GtACR1 would allow readers to assess if these opsins are meaningfully activated by 2P.

      We carried out additional 2P experiments in ancyromonad ChRs, GtACR1 and GtACR2 and added their results to a new main-text Figure 6 and Figure 6 – figure supplement 1. We added the new section describing these results, “Two-photon excitation”, to the main text in the revision:

      “To determine the 2P activation range of AnsACR, FtACR, and NlCCR, we conducted raster scanning using a conventional 2P laser, varying the excitation wavelength between 800 and 1,080 nm (Figure 6 – figure supplement 1). All three ChRs generated detectable photocurrents with action spectra showing maximal responses at ~925 nm for AnsACR, 945 nm for FtACR, and 890 nm for NlCCR (Figure 6A). These wavelengths fall within the excitation range of common Ti:Sapphire lasers, which are widely used in neuroscience laboratories and can be tuned between ~700 nm and 1,020-1,300 nm. To assess desensitization, cells expressing AnsACR, FtACR, or NlCCR were illuminated at the respective peak wavelength of each ChR at 15 mW for 5 seconds. GtACR1 and GtACR2, previously used in 2P experiments (Forli, Vecchia et al. 2018, Mardinly, Oldenburg et al. 2018), were included for comparison. The normalized photocurrent traces recorded under these conditions are shown in Figure 6B-F. The absolute amplitudes of 2P photocurrents at the peak time and at the end of illumination are shown in Figure 6G and H, respectively. All five tested variants exhibited comparable levels of desensitization at the end of illumination (Figure 6I).”

      Reviewer #3 (Public review):

      Summary:

      The authors aimed to develop Channelrhodopsins (ChRs), light-gated ion channels, with high potency and blue action spectra for use in multicolor (multiplex) optogenetics applications. To achieve this, they performed a bioinformatics analysis to identify ChR homologues in several protist species, focusing on ChRs from ancyromonads, which exhibited the highest photocurrents and the most blue-shifted action spectra among the tested candidates. Within the ancyromonad clade, the authors identified two new anion-conducting ChRs and one cation-conducting ChR. These were characterized in detail using a combination of manual and automated patch-clamp electrophysiology, absorption spectroscopy, and flash photolysis. The authors also explored sequence features that may explain the blue-shifted action spectra and differences in ion selectivity among closely related ChRs.

      Strengths:

      A key strength of this study is the high-quality experimental data, which were obtained using well-established techniques such as manual patch-clamp and absorption spectroscopy, complemented by modern automated patch-clamp approaches. These data convincingly support most of the claims. The newly characterized ChRs expand the optogenetics toolkit and will be of significant interest to researchers working with microbial rhodopsins, those developing new optogenetic tools, as well as neuro- and cardioscientists employing optogenetic methods.

      We thank the Reviewer for his/her positive evaluation of our work.

      Weaknesses:

      This study does not exhibit major methodological weaknesses. The primary limitation of the study is that it includes only a limited number of comparisons to known ChRs, which makes it difficult to assess whether these newly discovered tools offer significant advantages over currently available options.

      We conducted a side-by-side comparison of ancyromonad ChRs and GtACRs, which are widely used for optical inhibition of neuronal activity. The results are shown in the new Figures 1E and F, and the new multipanel Figure 1 – figure supplement 4 and Figure 1 – figure supplement 5 added in the revision. We also added the following text, describing these results, to the revised Results section:

      “Figures 1E and F show the dependence of the peak photocurrent amplitude and reciprocal peak time, respectively, on the photon flux density for ancyromonad ChRs and GtACRs. The current amplitude saturated earlier than the time-to-peak for all tested ChRs. Figure 1 – figure supplement 4A-E shows normalized photocurrent traces recorded at different photon densities. Quantitation of desensitization at the end of 1-s illumination revealed a complex light dependence (Figure 1 – figure supplement 4F). Figure 1 – figure supplement 5 shows normalized photocurrent traces recorded in response to a 5-s light pulse of the maximal available intensity and the magnitude of desensitization at its end.”

      Additionally, although the study aims to present ChRs suitable for multiplex optogenetics, the new ChRs were not tested in combination with other tools. A key requirement for multiplexed applications is not just spectral separation of the blue-shifted ChR from the red-shifted tool of interest but also sufficient sensitivity and potency under low blue-light conditions to avoid cross-activation of the respective red-shifted tool. Future work directly comparing these new ChRs with existing tools in optogenetic applications and further evaluating their multiplexing potential would help clarify their impact.

      As a proof of principle, we co-expressed AnsACR with the red-shifted cation-conducting ChR Chrimson and demonstrated that the net photocurrent generated by this combination is hyperpolarizing in the blue region of the spectrum and depolarizing in the red region. In the revision, we added a new panel (Figure 1D) showing these results and the following paragraph to the main text:

      “To test the possibility of using AnsACR in multiplex optogenetics, we co-expressed it with the red-shifted CCR Chrimson (Klapoetke et al., 2014) fused to an EYFP tag in HEK293 cells. We measured the action spectrum of the net photocurrents with 4 mM Cl<sup>-</sup> in the pipette, matching the conditions in the neuronal cytoplasm (Doyon, Vinay et al. 2016). Figure 1D, black shows that the direction of photocurrents was hyperpolarizing upon illumination with λ<500 nm and depolarizing at longer wavelengths. A shoulder near 520 nm revealed a FRET contribution from EYFP (Govorunova, Sineshchekov et al. 2020), which was also observed upon expression of the Chrimson construct alone (Figure 1D, red)”.

      Reviewing Editor Comments:

      The reviewers suggest that direct comparison to GtACR1 is the most important step to make this work more useful to the community.

      We followed the Reviewers’ recommendations and carried out a side-by-side comparison of ancyromonad ChRs with GtACR1 and GtACR2 (Figure 1E and F, Figure 1 – figure supplement 4, Figure 1 – figure supplement 5, and Figure 6). Note, however, that GtACR1’s spectral maximum is at 515 nm, which makes it poorly suited for blue light excitation. Also, ChRs are known to perform very differently in different cell types and upon expression of their genes in different vector backbones, so our results cannot be generalized to all experimental systems. Each ChR user needs to select the most appropriate tool for his/her purpose by testing several candidates in his/her own experimental setting.

      Reviewer #1 (Recommendations for the authors):

      (1) The figure legend for Figure 2D-I appears to be incomplete. Please provide a detailed explanation of the panels.

      In the revision, we have expanded the legend of Figure 2 to explain all individual panels.

      (2) The meaning of the Vr shift (Y-axis in Figure 2H-I) should be clarified in the main text to aid reader understanding.

      In the revision, we added the phrase “which indicated higher relative permeability to NO<sub>3</sub><sup>-</sup> than to Cl<sup>-</sup>” to explain the meaning of the Vr shift upon replacement of Cl<sup>-</sup> with NO<sub>3</sub><sup>-</sup>.

      (3) Adding statistical analysis for the peak and end photocurrent values in Figure 2D-F would strengthen the claim that there is minimal change in relative permeability during illumination.

      In the revision, we added the V<sub>r</sub> values for the peak photocurrent to Figure 2H-I, which already contained the V<sub>r</sub> values for the end photocurrent, and carried out a statistical analysis of their comparison. The following sentence was added to the text in the revision:

      “The V<sub>r</sub> values of the peak current and that at the end of illumination were not significantly different by the two-tailed Wilcoxon signed-rank test (Fig. 2G), indicating no change in the relative permeability during illumination.”

      (4) Figure 4H and I seem out of place in Figure 4, as the title suggests a focus on wild-type proteins and AnsACR mutants. The authors could consider moving these panels to Figure 3 for better alignment with the content.

      As noted below, we changed the panel order in Figure 4 upon the Reviewer’s request. In particular, former Figure 4I is Figure 4C in the revision, and former Figure 4H is now panel C in Figure 3 – figure supplement 1 in the revision. We rearranged the corresponding section of the text (highlighted yellow in the manuscript).

      (5) The characterization section could be strengthened by including data on the pH sensitivity of FtACR, which is currently missing from the main figures.

      Upon the Reviewer’s request, we carried out pH titration of FtACR absorbance and added the results as Figure 4B in the revision.

      (6) The logic in Figure 4A-G appears somewhat disjointed. For example, Figure 4A shows pH sensitivity for WT AnsACR and the G86E mutant, while Figure 4 B-D shifts to WT AnsACR and the D226N mutant, and Figure 4E returns to the G86E mutant. Reorganizing or clarifying the flow would improve readability.

      We followed the Reviewer’s advice and changed the panel order in Figure 4. In the revised version, the upper row (panels A-C) shows the pH titration data of the three WTs, the middle row (panels D-F) shows analysis of the AnsACR_D226N mutant, and the lower row (panels G-I) shows analysis of the AnsACR_G88E mutant. We also rearranged accordingly the description of these panels in the text.

      (7) In Figure 5A, "NIACR" should likely be corrected to "NlCCR".

      We corrected the typo in the revision.

      (8) The statistical significance in Figure 6C and D is somewhat confusing. Clarifying which groups are being compared and using consistent symbols would improve interpretability.

      In the revision, we improved the figure panels and legend to clarify that the comparisons are between the dark and light stimulation groups within the same current injection.

      (9) The authors pointed out that at rest or when a small negative current was injected, the neurons expressing Cl- permeable ChRs could generate a single action potential at the beginning of photostimulation, as has been reported before. The authors could help by further discussing if and how this phenomenon would affect the applicability of such tools.

      We mentioned in the revised Discussion section that activation of ACRs in the axons could depolarize the axons and trigger synaptic transmission at the onset of light stimulation, and that this undesired excitatory effect needs to be taken into consideration when using ACRs.

      Reviewer #2 (Recommendations for the authors):

      Govorunova et al present three new anion opsins that have potential applications in silencing neurons. This paper follows the tradition of the Spudich lab, presenting and rigorously characterizing potentially valuable opsins. Furthermore, they explore several mutations of the identified opsin that may make these opsins even more useful for the broader community. In general, I feel positively about this manuscript. It presents new potentially useful opsins and provides characterization that would enable its use. I have a few recommendations below, mostly centered around side-by-side comparisons to existing opsins.

      (1) My primary concern is that while there is a reference to GtACR1, a highly used opsin first described by this team, they do not present any of this data side by side.

      When evaluating opsins to use, it is important to compare them to the existing state of the art. As a potential user, I need to know where these opsins differ. Citing other papers does not solve this as, even within the same lab, subtle methodological differences or data plotting decisions can obscure important differences.

      As we explained in the response to the public comments, we carried out a side-by-side comparison of ancyromonad ChRs and GtACRs as requested by the Reviewer. The results are shown in the new Figures 1E and F, and the new multipanel Figure 1 – figure supplement 4 and Figure 1 – figure supplement 5, added in the revision. However, we would like to emphasize the limited usefulness of such comparative analysis, as ChRs are known to perform very differently in different cell types and upon expression of their genes in different vector backbones, so our results cannot be generalized to all experimental systems. Each ChR user needs to select the most appropriate tool for his/her purpose by testing several candidates in his/her own experimental setting.

      (2) Multiphoton optogenetics is an emerging field of optogenetics, and it is admirable that the authors address it here. The authors should present more 2p characterization, so that it can be established if these new opsins are viable for use with 2P methods, the way GtACR1 is. The following would be very useful for 2P characterization:

      Photocurrents for a given power density, compared to GtACR1 and GtACR2.

      The new Figure 6 (B-F) added in the revision shows photocurrent traces recorded from the three ancyromonad ChRs and two GtACRs upon 2P excitation at a given power density.

      Comparing NlCCR and FtACR's wavelength specificity and photocurrent. If these opsins are too weak to create reasonable 2P spectra, this difference should be discussed.

      The new Figure 6A shows the 2P action spectra of all three ancyromonad ChRs.

      A trace and calculated photocurrent kinetics to compare 1P and 2P. This need not be the flash-based absorption characterization of Figure 3, but a side-by-side photocurrent as in Figure 2.

      As mentioned above, photocurrent traces recorded from ancyromonad ChRs and GtACRs upon 2P excitation are shown in the new Figure 6 (B-F). However, direct comparison of the 2P data with the 1P data is not possible, as we used laser scanning illumination for the former and wide-field illumination for the latter.

      Characterization of desensitization. As the authors mention, many opsins undergo desensitization, presenting the ratio of peak photocurrent vs that at multiple time points (probably up to a few seconds) would provide evidence for how effectively these constructs could be used in different scenarios.

      We conducted a detailed analysis of desensitization under both 1P and 2P excitation. The new Figure 1 – figure supplement 4 and Figure 1 – figure supplement 5 show the data obtained under 1P excitation, and the new Figure 6 shows the data for 2P conditions.
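
      As an illustration only, desensitization at the end of a light pulse can be expressed as the fractional decrease of the current from its peak. The sketch below assumes this definition and uses a made-up trace; the function name `desensitization_percent` and the averaging window are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def desensitization_percent(current_pa, end_window=3):
    """Percent decrease from the peak amplitude to the end-of-pulse amplitude.

    current_pa: baseline-subtracted photocurrent samples recorded during illumination.
    end_window: number of final samples averaged to estimate the end-of-pulse current.
    """
    if end_window < 1 or len(current_pa) < end_window:
        raise ValueError("trace is shorter than the end window")
    magnitudes = [abs(i) for i in current_pa]  # ignore current polarity
    peak = max(magnitudes)
    if peak == 0:
        raise ValueError("trace contains no photocurrent")
    end = mean(magnitudes[-end_window:])
    return 100.0 * (1.0 - end / peak)


# Hypothetical trace (pA): fast rise to a peak, then decay to a plateau.
trace = [0.0, 400.0, 1000.0, 800.0, 650.0, 600.0, 600.0, 600.0]
print(f"{desensitization_percent(trace):.1f}%")
```

      For the made-up trace, the peak is 1000 pA and the end-of-pulse current is 600 pA, which corresponds to 40% desensitization. Evaluating the same ratio at several time points would give the time course that the Reviewer asked about.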

      I have to admit, that by the end of the paper, I was getting confused as to which of the three original constructs had which property, and how that was changing with each mutation. I would suggest that a table summarizing each opsin and mutation with its onset and offset kinetics, peak wavelength, photocurrent, and ion selectivity would greatly increase the ability to select and use opsins in the future.

      In the revision, we added a table of the spectroscopic properties of all tested mutants as Supplementary File 2. This study did not aim to analyze other parameters listed by the Reviewer. We added the following sentence referring to this table to the main text:

      “Supplementary File 2 contains the λ values of the half-maximal amplitude of the long-wavelength slope of the spectrum, which can be estimated more accurately from the action spectra than the λ of the maximum.”
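
      A minimal sketch of how the half-maximal wavelength on the long-wavelength slope could be estimated from a sampled action spectrum is given below. It assumes linear interpolation between neighboring points, which may differ from the procedure actually used for Supplementary File 2; the function name and the example spectrum are hypothetical.

```python
def half_max_long_wavelength(wavelengths_nm, amplitudes):
    """Wavelength on the long-wavelength slope where the amplitude is half-maximal.

    wavelengths_nm must be sorted in ascending order. Linear interpolation is
    used between the two samples that bracket the half-maximal level.
    """
    if len(wavelengths_nm) != len(amplitudes) or len(amplitudes) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    peak_index = max(range(len(amplitudes)), key=lambda i: amplitudes[i])
    half = amplitudes[peak_index] / 2.0
    # Walk from the peak towards longer wavelengths until the level is crossed.
    for i in range(peak_index, len(amplitudes) - 1):
        a_left, a_right = amplitudes[i], amplitudes[i + 1]
        if a_left >= half > a_right:
            fraction = (a_left - half) / (a_left - a_right)
            step = wavelengths_nm[i + 1] - wavelengths_nm[i]
            return wavelengths_nm[i] + fraction * step
    raise ValueError("spectrum does not fall below half-maximum on the long-wavelength side")


# Hypothetical normalized action spectrum sampled every 20 nm.
wavelengths = [400, 420, 440, 460, 480, 500]
spectrum = [0.2, 0.6, 1.0, 0.8, 0.4, 0.1]
print(round(half_max_long_wavelength(wavelengths, spectrum), 1))
```

      In this example the half-maximal level is crossed between 460 and 480 nm, so the estimate lies on the steep part of the slope, where a small amplitude error translates into a small wavelength error. Near the maximum the spectrum is flat, which is why the position of the maximum itself is harder to determine.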

      It may be out of the scope of this manuscript, but if a soma localization sequence can be shown to remove the 'axonal spiking' (as described in line 441), this would be a significant addition to the paper.

      Our previous study (Messier et al., 2018, doi: 10.7554/eLife.38506) showed that a soma localization sequence can reduce, but not eliminate, the axonal spiking. We plan to test these new ACRs with the trafficking motifs in the future.

      NlCCR appears to have the best photocurrents of all tested opsins in this paper. It seems odd that it was omitted from the mouse cortical neurons experiments.

      We have not included analysis of NlCCR behavior in neurons because we are preparing a separate manuscript on this ChR.

      Figure 6 would benefit from more gradation in the light powers used to silence and would benefit from comparison to GtACR. I suggest using a fixed current with a series of illumination intensities to see which of the three opsins (or GtACR) is most effective at silencing. At present, it looks binary, and a user cannot evaluate if any of these opsins would be better than what is already available.

      In the revision, we added the data comparing the light sensitivity of AnsACR and FtACR with previously identified GtACR1 and GtACR2 (new Figure 1E and F) to help users compare these ACRs. Although they are less sensitive to light than GtACR1 and GtACR2, they could still be activated by commercially available light sources if the expression levels are similar. Less sensitive ACRs may show less unwanted activation when used with other optogenetic tools.

      Reviewer #3 (Recommendations for the authors):

      Suggested Improvements to Experiments, Data, or Analyses:

      (1) Line 25: "significantly exceeding those by previously known tools" and Line 408: "NlCCR is the most blue-shifted among ancyromonad ChRs and generates larger photocurrents than the earlier known CCRs with a similar absorption maximum." As noted in the public review, this statement applies only to a very specific subgroup of ChRs with spectral maxima below 450 nm. If the goal was to claim that NlCCR is a superior tool among a broader range of blue-light-activated ChRs, direct comparisons with state-of-the-art ChRs such as ChR2 T159C (Berndt et al., 2011), CatCh (Kleinlogel et al., 2014), CoChR (Klapoetke et al., 2014), CoChR-3M (Ganjawala et al., 2019), or XXM 2.0 (Ding et al., 2022) would be beneficial. If the goal was to demonstrate superiority among tools with spectra below 450 nm, I suggest explicitly stating this in the paper.

      The Reviewer correctly inferred that we emphasized the superiority of NlCCR among tools with similar spectral maxima, not all blue-light-activated ChRs available for neuronal photoexcitation, most of which exhibit absorption maxima at longer wavelengths. To clarify this, we added “with similar spectral maxima” to the sentence in the original Line 25. The sentence in Line 408 already contains this clarification: “with a similar absorption maximum”.

      (2) Lines 111-113: "The absorption spectra of the purified proteins were slightly blue-shifted from the respective photocurrent action spectra (Figure 1D), likely due to the presence of non-electrogenic cis-retinal-bound forms." I would be skeptical of this statement. The spectral shifts in NlCCR and AnsACR are small and may fall within the range of experimental error. The shift in FtACR is more apparent; however, if two forms coexist in purified protein, this should be reflected as two Gaussian peaks in the absorption spectrum (or at least as a broader total peak reflecting two states with close maxima and similar populations). On the contrary, the action spectrum appears to have two peaks, one potentially below 465 nm. Generally, neither spectrum appears significantly broader than a typical microbial rhodopsin spectrum. This question could be clarified by quantifying the widths of the absorption and action spectra or by overlaying them on the same axis. In my opinion, the two spectra seem very similar, and just the appearance of the "bump" in the action spectrum shifts the apparent maximum of the action spectrum to the red. If there were two states, then they should both be electrogenic, and the slight difference in spectra might be explained by something else (e.g. by a slight difference in the quantum yields of the two states).

      As the Reviewer suggested, in the revision we added a new figure (Figure 1 – figure supplement 2), showing the overlay of the absorption and action spectra of each ancyromonad ChR. This figure shows that the absorption spectra are wider than the action spectra (especially in AnsACR and FtACR), which confirms our interpretation (contribution of the non-electrogenic blue-shifted cis-retinal-bound forms to the absorption spectrum). Note that the presence of such forms explaining a blue shift of the absorption spectrum has been experimentally verified in HcKCR1 (doi: 10.1016/j.cell.2023.08.009; 10.1038/s41467-025-56491-9). Therefore, we revised the text as follows:

      “The absorption spectra of the purified proteins (Figure 1C) were slightly blue-shifted from the respective photocurrent action spectra (Figure 1 – figure supplement 3), likely due to the presence of non-electrogenic cis-retinal-bound forms. The presence of such forms, explaining the discrepancy between the absorption and the action spectra, was verified by HPLC in KCRs (Tajima et al. 2023, Morizumi et al., 2025).”

      (3) Lines 135-136: "The SyncroPatch enables unbiased estimation of the photocurrent amplitude because the cells are drawn into the wells without considering their tag fluorescence." While SyncroPatch does allow unbiased selection of patched cells, it does not account for the fraction of transfected cells. Without a method to exclude non-transfected cells, which are always present in transient transfections, the comparison of photocurrents may be affected by the proportion of untransfected cells, which could vary between constructs. To clarify whether the statistically significant difference in the Kolmogorov-Smirnov test could indicate that the fraction of transfected cells after 48-72h differs between constructs, I suggest analyzing only transfected cells or reporting fractions of transfected cells by each construct.

      The Reviewer correctly states that non-transfected cells are always present in transiently transfected cell populations. However, his/her suggestion to “exclude non-transfected cells” is not feasible in the absence of a criterion for such exclusion. As is evident from our data, transient transfection results in a continuum of amplitude values, and it is not possible to distinguish a small photocurrent from no photocurrent, considering the noise level. We would like, however, to emphasize that not excluding any cells provides an estimate of the overall potency of each ChR variant, which depends on both the fraction of transfected cells and their photocurrents. This approach mimics the conditions of in vivo experiments, in which non-expressing cells also cannot be excluded.

      (4) Line 176: "AnsACR and FtACR photocurrents exhibited biphasic rise." The fastest characteristic time is very close to the typical resolution of a patch-clamp experiment (RC = 50 μs for a 10 pF cell with a 5 MΩ series resistance). Thus, I am skeptical that the faster time constant of the biphasic opening represents a protein-specific characteristic time. It may not be fully resolved by patch-clamp and could simply result from low-pass filtering of a specific cell. I suggest clarifying this for the reader.

      The Reviewer is right that the patch clamp setup acts as a lowpass filter. Earlier, we directly measured its time resolution (~15 μs) by recording the ultrafast (occurring on the ps time scale) charge movements related to the trans-cis isomerization (doi: 10.1111/php.12558). However, the lowpass filter of the setup can only slow the entire signal, but cannot lead to the appearance of a separate kinetic component (i.e. a monophasic process cannot become biphasic). Therefore, we believe that the biphasic photocurrent rise reflects biphasic channel opening rather than a measurement artifact. Two phases in the channel opening have also been detected in GtACR1 (doi: 10.1073/pnas.1513602112) and CrChR2 (doi: 10.1073/pnas.1818707116).
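
      The time constants mentioned in this exchange can be checked with a short calculation. The sketch below treats the recording as an idealized first-order RC filter, which is an assumption made only for illustration; the function names are hypothetical. It reproduces the Reviewer's estimate of 50 μs for a 10 pF cell with a 5 MΩ series resistance and converts both that value and the ~15 μs resolution quoted above into equivalent cutoff frequencies.

```python
import math


def rc_time_constant_us(series_resistance_mohm, capacitance_pf):
    """tau = R * C in microseconds, for R in megaohms and C in picofarads."""
    resistance_ohm = series_resistance_mohm * 1e6
    capacitance_f = capacitance_pf * 1e-12
    return resistance_ohm * capacitance_f * 1e6  # seconds -> microseconds


def cutoff_frequency_khz(tau_us):
    """-3 dB cutoff of a first-order low-pass filter, f_c = 1 / (2 * pi * tau)."""
    return 1.0 / (2.0 * math.pi * tau_us * 1e-6) / 1e3  # Hz -> kHz


tau = rc_time_constant_us(5.0, 10.0)  # Reviewer's example: 5 MOhm, 10 pF
print(f"tau = {tau:.0f} us, f_c = {cutoff_frequency_khz(tau):.2f} kHz")
print(f"15 us resolution corresponds to f_c = {cutoff_frequency_khz(15.0):.2f} kHz")
```

      Under this idealization, a 50 μs time constant corresponds to a cutoff near 3.2 kHz and a 15 μs time constant to a cutoff near 10.6 kHz.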

      (5) Line 516: "The forward LED current was 900 mA." It would be more informative to report the light intensity rather than the forward current, as many readers may not be familiar with the specific light output of the used LED modules at this forward current.

      We have added the light intensity value in the revision:

      “The forward LED current was 900 mA (which corresponded to the irradiance of ~2 mW mm<sup>-2</sup>)…”
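
      For readers who prefer photon flux density, the unit used for the light-dependence data, an irradiance can be converted with the photon energy hc/λ. The sketch below is illustrative only: the 470 nm wavelength is an assumed value, since the LED wavelength is not stated in this response, and the light is treated as monochromatic.

```python
PLANCK_J_S = 6.62607015e-34       # Planck constant, J s
LIGHT_SPEED_M_S = 299_792_458.0   # speed of light in vacuum, m/s


def photon_flux_density(irradiance_mw_mm2, wavelength_nm):
    """Photons per second per mm^2 for monochromatic light of the given irradiance."""
    photon_energy_j = PLANCK_J_S * LIGHT_SPEED_M_S / (wavelength_nm * 1e-9)
    return irradiance_mw_mm2 * 1e-3 / photon_energy_j  # mW -> W, then W / J


# ~2 mW mm^-2 is the irradiance quoted above; 470 nm is an assumption.
flux = photon_flux_density(2.0, 470.0)
print(f"{flux:.2e} photons s^-1 mm^-2")
```

      With these inputs the result is on the order of 5 × 10<sup>15</sup> photons s<sup>-1</sup> mm<sup>-2</sup>. For a fixed irradiance, the photon flux scales linearly with wavelength.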

      (6) Lines 402-403: "The NlCCR ... contains a neutral residue in the counterion position (Asp85 in BR), which is typical of all ACRs. Yet, NlCCR does not conduct anions, instead showing permeability to Na+." This is not atypical for CCRs and has been demonstrated in previous works of the authors (CtCCR in Govorunova et al. 2021, ChvCCR1 in Govorunova et al. 2022). What is unique is the absence of negatively charged residues in TM2, as noted later in the current study. However, the absence of negatively charged residues in TM2 appears to be rare for ACRs as well. Not as a strong point of criticism, but to enhance clarity, I suggest analyzing the frequency of carboxylate residues in TM2 of ACRs to determine whether the unique finding is relevant to ion selectivity or to another property.

      The Reviewer is correct that some CCRs lack a carboxylate residue in the D85 position, so this feature alone cannot be considered as a differentiating criterion. However, the complete absence of glutamates in TM2 is not rare in ACRs and is found, for example, in HfACR1 and CarACR2. We have discussed this issue in our earlier review (doi: 10.3389/fncel.2021.800313) and do not think that repeating this discussion in this manuscript is appropriate.

      Recommendations for Writing and Presentation:

      (1) Some figures contain incomplete or missing labels:

      Figure 2: Panels D to I lack labels.

      In the revision, we have expanded the legend of Figure 2 to explain all individual panels.

      Figure 3 - Figure Supplement 1: Missing explanations for each panel.

      In the revision, we changed the order of panels and explained all individual panels in the legend.

      Figure 5 - Figure Supplement 1: Missing explanations for each panel.

      No further explanation for individual panels in this Figure is needed because all panels show the action spectra of various mutants, the names of which are provided in the panels themselves. Repeating this information in the figure legend would be redundant.

      (2) In Figure 2, "sem" is written in lowercase, whereas "SEM" is capitalized in other figures. Standardizing the format would improve consistency.

      In the revision, we changed the SEM abbreviation to uppercase in all instances.

      (3) Line 20: "spectrally separated molecules must be found in nature." There is no proof that they cannot be developed synthetically; rather, it is just difficult. I suggest softening this statement, as the findings of this study, together with others, will probably allow designing molecules with specified spectral properties in the future.

      In the revision, we changed the cited sentence to the following:

      “Multiplex optogenetic applications require spectrally separated molecules, which are difficult to engineer without disrupting channel function”.

      (4) Line 216-219: "Acidification increased the amplitude of the fast current ~10-fold (Figure 4F) and shifted its Vr ~100 mV (Figure 3 - figure supplement 1D), as expected of passive proton transport. The number of charges transferred during the fast peak current was >2,000 times smaller than during the channel opening, from which we concluded that the fast current reflects the movement of the RSB proton." The claim about passive transport of the RSB proton should be clarified, as typically, passive transport is not limited to exactly one proton per photocycle, and the authors observe the increase in the fast photocurrents upon acidification.

      We thank the Reviewer for pointing out the confusing character of our description. To clarify the matter, we added a new photocurrent trace to Figure 4I in the revision recorded from AnsACR_G86E at 0 mV and pH 7.4. We have rewritten the corresponding section of Results as follows:

      “Its rise and decay τ corresponded to the rise and decay τ of the fast positive current recorded from AnsACR_G86E at 0 mV and neutral pH, superimposed on the fast negative current reflecting the chromophore isomerization (Figure 4I, upper black trace). We interpret this positive current as an intramolecular proton transfer to the mutagenetically introduced primary acceptor (Glu86), which was suppressed by negative voltage (Figure 4I, lower black trace). Acidification increased the amplitude of the fast negative current ~10-fold (Figure 4I, black arrow) and shifted its V<sub>r</sub> ~100 mV to more depolarized values (Figure 4 – figure supplement 2A). This can be explained by passive inward movement of the RSB proton along the large electrochemical gradient.”

      Minor Corrections:

      (1) Line 204: Missing bracket in "phases in the WT (Figure 4D."

      The quoted sentence was deleted during the revision.

      (2) Line 288: Typo-"This Ala is conserved" should probably be "This Met is conserved."

      We mean here the Ala four residues downstream from the first Ala. To avoid confusion, we changed the cited sentence to the following:

      “The Ala corresponding to BR’s Gly122 is also found in AnsACR and NlCCR (Figure 5A)…”

      (3) Lines 702-704: Missing Addgene plasmid IDs in "(plasmids #XXX and #YYY, respectively)."

      In the revision, we added the missing plasmid IDs.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Minor Issues:

      (1) As the authors mention, MKs have been suggested to mature rapidly at the sinusoids, and both integrin KO and laminin KO MKs appear mislocalized away from the sinusoids. Additionally, average MK distances from the sinusoid may also help determine whether the maturation defects could be in part due to impaired migration towards CXCL12 at the sinusoid. Presumably, MKs could appear mislocalized away from the sinusoid given the data presented suggesting they are leaving the BM and entering circulation. Additional commentary on intrinsic (ex-vivo) MK maturation phenotypes may help strengthen the authors' conclusions.

      Thank you for your insightful suggestion regarding intrinsic MK maturation defects in integrin KO and laminin KO mice. This indeed could be the case. We have now addressed this possibility in the revised discussion section (page 14; lines 14-15), acknowledging intrinsic maturation defects as a potential contributor to observed maturation issues.

      (2) It would be helpful if the authors could comment as to whether MKs are detectable in blood.

      We appreciate the opportunity to clarify this point. Intact Itgb1<sup>-/-</sup>/Itgb3<sup>-/-</sup> MKs were not detected in the peripheral blood by either flow cytometry or blood smear analysis. This indicates that megakaryocytes do not normally circulate in the systemic bloodstream. Instead, we observed large MK nuclei trapped specifically within the lung capillaries, consistent with their known physical retention in the pulmonary circulation during platelet release. This point is now explained more fully on page 10, lines 14-19.

      (3) Supplementary Figure 6 shows no effect on in vitro MK maturation, proplatelet formation, or MK area, but Figures 6B/6C demonstrate an increase in total MK number in MMP-inhibitor-treated mice compared to control. This discrepancy should be better discussed.

      We have now expanded the discussion in the revised manuscript to address the different results obtained in vitro and in vivo, emphasizing that the in vitro model may not fully recapitulate the complex and dynamic bone marrow ECM niche. Additionally, differences in the source and regulation of MMPs likely contribute to the differing outcomes, underlining the importance of studying these processes within their physiological context. For instance, non-megakaryocytic sources of MMPs and paracrine regulatory mechanisms may play a critical role within the physiological microenvironment, ultimately affecting MK proliferation and maturation in a manner not observed in simplified culture systems. These clarifications can be found on page 12, lines 6-17.

      (4) A function of the ECM discussed relates to MK maturation but in the B1/3 integrin KO mice, the presence of the ECM cage is reduced but there appears to be no significant impact upon maturation (Supplementary Figure 4). By contrast, MMP inhibition in vivo (but not in vitro) reduces MK maturation. These data could be better clarified in the text.

      Thank you for raising this important point. While Suppl. Figure 4 shows normal size and ploidy in DKO MKs, a critical defect is revealed at the ultrastructural level. Mature DKO MKs exhibit severe dysplasia of the demarcation membrane system (DMS), characterized by extensive membrane accumulation and abnormal architecture, with no typical platelet territories visible. This DMS defect directly impairs MK maturation and explains the thrombocytopenia observed in these mice. Increased emperipolesis further indicates disrupted maturation processes. These observations confirm the essential role of the ECM cage in supporting proper DMS organization and overall MK maturation in vivo, consistent with findings from MMP inhibition experiments. We have clarified and emphasized the significance of these DMS abnormalities in the revised manuscript, including updated results (Page 9, lines 17-21) and a new EM image in Suppl. Figure 4.

      Reviewer #1 (Public review):

      The authors report on a thorough investigation of the interaction of megakaryocytes (MK) with their associated ECM during maturation. They report convincing evidence to support the existence of a dense cage-like pericellular structure containing laminin γ1 and α4 and collagen IV, which interacts with integrins β1 and β3 on MK and serves to fix the perisinusoidal localization of MK and prevent their premature intravasation. As with everything in nature, the authors support a Goldilocks range of MK-ECM interactions - inability to digest the ECM via inhibition of MMPs leads to insufficient MK maturation and development of smaller MK. This important work sheds light on the role of cell-matrix interactions in MK maturation, and suggests that higher-dimensional analyses are necessary to capture the full scope of cellular biology in the context of their microenvironment. The authors have responded appropriately to the majority of my previous comments.

      We sincerely thank the reviewer for their insightful comments.

      Some remaining points:

      In a previous critique, I had suggested that "it is unclear how activation of integrins allows the MK to become "architects for their ECM microenvironment" as the authors posit. A transcriptomic analysis of control and DKO MKs may help elucidate these effects". The authors pointed out the technical difficulty of obtaining sufficient numbers of MK for such analysis, which I accept, and instead analyzed mature platelets, finding no difference between control and DKO platelets. This is not necessarily surprising, since mature circulating platelets have no need to engage an ECM microenvironment, and for the same reason I would suggest that mature platelet analyses are not representative of MK behavior as regards ECM interactions.

      We fully agree with the reviewer that platelet analyses do not accurately reflect the behavior of MKs in the context of interactions with the ECM. This understanding is also one of the reasons why we chose not to include RT-PCR data on platelets in our manuscript. Instead, we emphasize the role of integrins as essential regulators of ECM remodeling, as they transmit traction forces that can significantly influence this process. We also report reduced RhoA activation in DKO MK, which is likely to affect ECM organization. We believe that these explanations contribute to a clearer understanding of how integrin activation enables megakaryocytes to act as "architects" of their ECM microenvironment.

      Reviewer #2 (Public review):

      This study makes a significant contribution to understanding the microenvironment of megakaryocytes (MKs) in the bone marrow, identifying an extracellular matrix (ECM) cage structure that influences MK localization and maturation. The authors provide compelling evidence for the presence of this ECM cage and its role in MK homeostasis, employing an array of sophisticated imaging techniques and molecular analyses. The authors have addressed most of the concerns raised in the previous review, providing clarifications and additional data that strengthen their conclusion.

      More broadly, this work adds to a growing recognition of the ECM as an active participant in haematopoietic cell regulation in the bone marrow microenvironment. This work could pave the way to future studies investigating how the megakaryocytes' ECM cage affects their function as part of the haematopoietic stem cell niche, and by extension, influences global haematopoiesis.

      We thank this reviewer for providing such constructive feedback.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      There is growing appreciation for the importance of luminal (apical) ECM in tube development, but such matrices are much less well understood than basal ECMs. Here the authors provide insights into the aECM that shapes the Drosophila salivary gland (SG) tube and the importance of PAPSS-dependent sulfation in its organization and function.

      The first part of the paper focuses on careful phenotypic characterization of papss mutants, using multiple markers and TEM. This revealed reduced markers of sulfation and defects in both apical and basal ECM organization, Golgi (but not ER) morphology, number and localization of other endosomal compartments, plus increased cell death. The authors focus on the fact that papss mutants have an irregular SG lumen diameter, with both narrowed regions and bulged regions. They address the pleiotropy, showing that preventing the cell death and resultant gaps in the tube did not rescue the SG luminal shape defects and discussing similarities and differences between the papss mutant phenotype and those caused by more general trafficking defects. The analysis uses a papss nonsense mutant from an EMS screen - I appreciate the rigorous approach the authors took to analyze transheterozygotes (as well as homozygotes) plus rescued animals in order to rule out effects of linked mutations. Importantly, the rescue experiments also demonstrated that sulfation enzymatic activity is important.

      The 2nd part of the paper focuses on the SG aECM, showing that Dpy and Pio ZP protein fusions localize abnormally in papss mutants and that these ZP mutants (and Np protease mutants) have similar SG lumen shaping defects to the papss mutants. A key conclusion is that SG lumen defects correlate with loss of a Pio+Dpy-dependent filamentous structure in the lumen. These data suggest that ZP protein misregulation could explain this part of the papss phenotype.

      Overall, the text is very well written and clear. Figures are clearly labeled. The methods involve rigorous genetic approaches, microscopy, and quantifications/statistics and are documented appropriately. The findings are convincing.

      Significance:

      This study will be of interest to researchers studying developmental morphogenesis in general and specifically tube biology or the aECM. It should be particularly of interest to those studying sulfation or ZP proteins (which are broadly present in aECMs across organisms, including humans).

      This study adds to the literature demonstrating the importance of luminal matrix in shaping tubular organs and greatly advances understanding of the luminal matrix in the Drosophila salivary gland, an important model of tubular organ development and one that has key matrix differences (such as no chitin) compared to other highly studied Drosophila tubes like the trachea.

      The detailed description of the defects resulting from papss loss suggests that there are multiple different sulfated targets, with a subset specifically relevant to aECM biology. A limitation is that specific sulfated substrates are not identified here (e.g. are these the ZP proteins themselves or other matrix glycoproteins or lipids?); therefore, it's not clear how direct or indirect the effects of papss are on ZP proteins. However, this is clearly a direction for future work and does not detract from the excellent beginning made here.

      Comments on revised version:

      Overall, I am pleased with the authors' revisions in response to my original comments and those of the other reviewers.

      Reviewer #2 (Public review):

      Summary

      This study provides new insights into organ morphogenesis using the Drosophila salivary gland (SG) as a model. The authors identify a requirement for sulfation in regulating lumen expansion, which correlates with several effects at the cellular level, including regulation of intracellular trafficking and the organization of Golgi, the aECM and the apical membrane. In addition, the authors show that the ZP proteins Dumpy (Dpy) and Pio form an aECM regulating lumen expansion. Previous reports already pointed to a role for Papss in sulfation in SG and the presence of Dpy and Pio in the SG. Now this work extends these previous analyses and provides more detailed descriptions that may be relevant to the fields of morphogenesis and cell biology (with particular focus on ECM research and tubulogenesis). This study nicely presents valuable information regarding the requirements of sulfation and the aECM in SG development.

      Strengths

      -The results supporting a role for sulfation in SG development are strong. In addition, the results supporting the involvement of Dpy and Pio in the aECM of the SG, their role in lumen expansion, and their interactions, are also strong.

      -The authors have done an excellent job in revising and clarifying the many different issues raised by the reviewers, particularly with the addition of new experiments and quantifications. I consider that the manuscript has improved considerably.

      -The authors generated a catalytically inactive Papss enzyme, which is not able to rescue the defects in Papss mutants, in contrast to wild type Papss. This result clearly indicates that the sulfation activity of Papss is required for SG development.

      Weaknesses

      -The main concern is the lack of a clear connection between sulfation and the phenotypes observed at the cellular level, and, importantly, the lack of connection between sulfation and the Pio-Dpy matrix. Indeed, the mechanism/s by which sulfation affects lumen expansion are not elucidated and no targets of this modification are identified or investigated. A direct (or instructive) role for sulfation in aECM organization is not clearly supported by the results, and the connection between sulfation and Pio/Dpy roles seems correlative rather than causative. As it is presented, the mechanisms by which sulfation regulates SG lumen expansion remain elusive in this study.

      -In my opinion the authors overestimate their findings with several conclusions, as exemplified in the abstract:

      "In the absence of Papss, Pio is gradually lost in the aECM, while the Dpy-positive aECM structure is condensed and dissociates from the apical membrane, leading to a thin lumen. Mutations in dpy or pio, or in Notopleural, which encodes a matriptase that cleaves Pio to form the luminal Pio pool, result in a SG lumen with alternating bulges and constrictions, with the loss of pio leading to the loss of Dpy in the lumen. Our findings underscore the essential role of sulfation in organizing the aECM during tubular organ formation and highlight the mechanical support provided by ZP domain proteins in maintaining luminal diameter."

      The findings leading to the conclusion that sulfation organizes the aECM and that the absence of Papss leads to a thin lumen due to defects in Dpy/Pio are not strong. The authors certainly show that Papss is required for proper Pio and Dpy accumulation. They also show that Pio is required for Dpy accumulation, and that Pio and Dpy form an aECM required for lumen expansion. However, the absence of Pio and Dpy does not fully recapitulate Papss mutant defects (thin lumen). I wonder whether other hypotheses and models could account for the observed results. For instance, a role for Papss affecting secretion, in which case sulfation would have an indirect role in aECM organization. This study does not address the mechanical properties of Dpy in normal and mutant salivary glands.

      -Minor issues relate to the genotype/phenotype analysis. It is surprising that the authors detect only mild effects on sulfation in Papss mutants using an anti-sulfoTyr antibody, as Papss is the only PAPS synthetase. Generating germ line clones (which is a feasible experiment) would have helped to prove that this minor effect is due to the contribution of maternal product. The loss of function allele used in this study seems problematic, as it produces effects in heterozygous conditions that are difficult to interpret. Cleaning the chromosome or using an alternative loss of function condition (another allele, RNAi, etc...) would have helped to present a more reliable explanation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Overall, I am pleased with the authors' revisions in response to my original comments and those of the other reviewers. The addition of the sulfation(-) mutant to Fig. 1 is particularly nice. I have just a few additional suggestions for text changes to improve clarity/precision.

      (1) The current title of this manuscript is quite broad, making it sound like a review article. I recommend adding sulfation and salivary gland to the title to convey the main points more clearly. e.g. Sulfation affects apical extracellular matrix organization during development of the Drosophila salivary gland tube.

      Thank you for the suggestion. We agree and have changed the title of the paper as suggested.

      (2) Figure 1B shows very striking enrichment of papss expression in the salivary gland compared to other tubes like the trachea that also contain Pio and Dpy. To me, this implies that the key substrate(s) of Papss are likely to be unique, or at least more highly enriched, in the salivary gland aECM compared to the tracheal aECM (e.g. probably not Pio or Dpy themselves). I suggest that the authors address the implications of this apparent SG specificity in the discussion (paragraph beginning on p. 21, line 559).

      Yes, we agree that there may be other key substrates of Papss in the SG, such as mucins, which play an important role in organizing the aECM and expanding the lumen. We have added this point to the discussion.

      (3) p. 15, lines 374-376 "The Pio protein is known to be cleaved, at one cleavage site after the ZP domain by the furin protease and at another cleavage site within the ZP domain by the matriptase Notopleural (Np) (Drees et al., 2019; Drees et al., 2023; Figure 5B)." As far as I can see, the Drees papers show that Pio is cleaved somewhere in the vicinity of a consensus furin cleavage site, but do not actually establish that the cleavage happens at this exact site or is done by a furin protease (this is just an assumption). Please word more carefully, e.g. "at one cleavage site after the ZP domain, possibly by a furin protease".

      Thank you for pointing this out. We have edited the text.

      Reviewer #2 (Recommendations for the authors):

      Throughout the paper, I find the description of the lumen phenotypes and their interpretation a bit confusing.

      Papss mutants produce SG that are either "thin" or show "irregular lumen with bulges". Do the authors think that these are two different manifestations of the same effect? Or do they think that there are different causes behind them?

      The thin lumen phenotype appears to occur when the Pio-Dpy matrix is significantly condensed. When this matrix is less condensed in one region of the lumen than in other regions, the lumen appears irregular with bulges.

      Are the defects in Grasp65 mutants categorized as "irregular lumen with bulges" similar to those in Papss mutants? Why don't these mutants show a "thin lumen" defect?

      Grasp65 mutant phenotypes are milder than those of Papss mutants. Multiple mutations in several Golgi components that more significantly disrupt Golgi structures and function may cause more severe defects in lumen expansion and shape.

      How do the defects described for Pio ("multiple constrictions with a slight expansion between constrictions") and Dpy mutants ("lumen with multiple bulges and constrictions") relate to the "irregular lumen with bulges" in Papss mutants?

      pio and dpy mutants show more stereotypical phenotypes, while Papss mutants exhibit more irregular and random phenotypes. The irregular lumen phenotypes in Papss mutants are associated with a condensed Pio-Dpy matrix.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors of this study propose a model in which NPY family regulators antagonize the activity of the pid mutation in the context of floral development and other auxin-related phenotypes. This is hypothesized to occur through regulation of or by PID and its action on the PIN1 auxin transporter.

      Strengths:

      The findings are intriguing.

      We are pleased that the reviewer found the work interesting!

      Weaknesses and Major Comments:

      (1) While the findings are indeed intriguing, the mechanism of action and interaction among these components remains poorly understood. The study would benefit from significantly more thorough and focused experimental analyses to truly advance our understanding of pid phenotypes and the interplay among PID, NPYs, and PIN1.

      Elucidating the mechanism of action and interaction among these components will require years of additional research. As key steps toward these goals, our work clearly established that 1) NPY1 functions downstream of PID, as overexpression of NPY1 completely suppressed pid phenotypes. This is surprising because the predominant model is that PID functions by directly phosphorylating and activating PINs without the need for NPY1 involvement. 2) In the absence of PID, NPY1 protein accumulated less in the NPY1 OE lines, suggesting that PID plays a role in affecting NPY1 stability/degradation/accumulation. We are not sure exactly which experiments the reviewer is proposing.

      Regarding pid phenotypes, pid is completely sterile in our conditions, while the suppression by NPY1 OE is very clear and the lines are fertile.

      (2) The manuscript appears hastily assembled, with key methodological and conceptual details either missing or inconsistent. Although issues with figure formatting and clarity (e.g., lack of scale bars and inconsistent panel layout) may alone warrant revision, the content remains the central concern and must take precedence over presentation.

      We did not include scale bars in our figures because the phenotype of interest is presence/absence of flowers. Readers should compare the mutants with the rescued plants and the WT plants.

      (3) Given that fertile progeny are obtained from pid-TD pin1/PIN1 and pid NPY OE lines, it would be important to analyze whether mutations and associated phenotypes are heritable. This is especially relevant since CRISPR lines can be mosaic. Comprehensive genotyping and inheritance studies are required.

      We only use stable, heritable, Cas9-free mutants in our studies.  We genotype our mutants in every generation.  More details have been added to the Materials and Methods section. We provide the genetic materials we use to the scientific community when requested to enable verification and extension of our results. 

      (4) The Materials and Methods section lacks essential information on how the lines were generated, genotyped, propagated, and scored. There is also generally no mention of how reproducible the observations were. These genetic experiments need to be described in detail, including the number of lines analyzed and consistency across replicates.

      More details have been added to the Materials and Methods section.

      The criticism is not fully accurate. For example, we stated in the main text: “We genotyped T2 progenies from two pid-c1 heterozygous T1 plants (#68 and # 83) for the presence of pid-c1 and for pid-c1 zygosity. We used mCherry signal, which was included in the NPY1 OE construct, as a proxy to determine the presence and absence of the NPY1 transgene. For each line, we identified T2 plants without the NPY1 transgene and without the pid-c1 mutation (called WT-68 and WT-83, respectively). We also isolated T2 plants that contained the NPY1 overexpression construct, but did not have the pid-c1 mutation (called NPY1 OE #68 in WT, and NPY1 OE #83 in WT). Finally, we identified T2 plants that were pid-c1 homozygous and that had the NPY1 transgene (called NPY1 OE #68 in pid-c1 and NPY1 OE #83 in pid-c1). These genetic materials enabled us to compare the same NPY1 OE transgenic event in different genetic backgrounds.”

      The genetic materials used are freely available to the scientific community.  We would like to point out that we used several pin1 and pid alleles to make sure that the phenotypes are caused by the genes of interest.

      (5) The nature of the pid alleles used in the study is not described. This is essential for interpretation.

      The mutants were described in a previous paper (M. Mudgett, Z. Shen, X. Dai, S.P. Briggs, & Y. Zhao, Suppression of pinoid mutant phenotypes by mutations in PIN-FORMED 1 and PIN1-GFP fusion, Proc. Natl. Acad. Sci. U.S.A. 120 (48) e2312918120, https://doi.org/10.1073/pnas.2312918120 (2023)). We have added the relevant information to Materials and Methods.

      (6) The authors measure PIN1 phosphorylation in response to NPY overexpression and conclude that the newly identified phosphorylation sites are inhibitory because they do not overlap with known activating sites. This conclusion is speculative without functional validation. Functional assays are available and must be included to substantiate this claim.

      We concluded that the phosphorylation of PINs in NPY1 OE is inhibitory on the basis of the following: 1) pid is suppressed in pin1 heterozygous backgrounds and by PIN1-GFP<sub>HDR</sub>, demonstrating that partial loss of function of PIN1 or a decrease in PIN1 gene dosage, which decreases PIN1 protein expression, caused the suppression of pid. 2) pid is completely suppressed by NPY1 OE, which caused an increase of PIN phosphorylation, suggesting that phosphorylation of PINs in NPY1 OE lines is inhibitory. It is true that we do not have biochemical data to support the conclusion. We would like to point out that the phosphorylation sites in PINs identified in this work do overlap with previously identified sites.

      PIN activity assays are conducted in heterologous systems that do not include NPY proteins. Since NPY is important for PIN activities, we believe that these assays may provide misleading results. Moreover, PIN1 is likely part of a large protein complex.  Without knowing the composition of the complex, functional assays in heterologous systems will not be interpretable.

      (7) Figure 5 implies that NPY1 acts downstream of PID, but there is no biochemical evidence supporting this hierarchy. Additional experiments are needed to demonstrate the epistatic or regulatory relationship.

      We show that overexpression of NPY1 completely suppressed the pid phenotype, and this epistatic relationship indicates that NPY1 functions downstream of PID. Moreover, we report that PID is required for NPY1 accumulation, indicating that PID is upstream of NPY1.

      (8) The authors should align their genetic observations with cell biological data on PIN1, PIN2, and PID localization and distribution.

      We are hesitant to use traditional PIN1-GFP and PIN2-GFP lines, as they are not stable in our hands. Localization of PID is still not clear. We have generated PID-GFP<sub>HDR</sub> lines, but we could not detect any fluorescent signals (unpublished results). In addition, maize PINOID (BIF2) localizes to the nucleus, cytoplasm and cell periphery (Skirpan, A., Wu, X. and McSteen, P. (2008), Genetic and physical interaction suggest that BARREN STALK1 is a target of BARREN INFLORESCENCE2 in maize inflorescence development. The Plant Journal, 55: 787-797. https://doi.org/10.1111/j.1365-313X.2008.03546.x).

      We would rather wait for the proper genetic materials before devoting our effort to this.

      Reviewer #2 (Public review):

      Summary:

      The study is well-conducted, revealing that NPY1, with previously less-characterized molecular functions, can suppress pid mutant phenotypes with a phosphorylation-based mechanism. Overexpression of NPY1 (NPY1-OE) results in PIN phosphorylation at unique sites and bypasses the requirement for PID for this event. Conversely, a C-terminal deleted form of NPY1 (NPY1-dC) fails to rescue pid despite promoting a certain phospho-profile in PIN proteins.

      Strengths:

      (1) The careful genetic analyses of pid suppression by NPY1-OE and the inability of NPY1dC to do the same.

      (2) Phospho-proteomics approaches reveal that NPY1-OE induces phosphorylation of PINs at non-canonical sites, independent of PID.

      Thank you for accurately summarizing the main findings.

      Weaknesses:

      (1) The native role of NPY1 is not tested by phospho-proteomics in loss-of-function npy1 mutants. Such analysis would be crucial to demonstrate that NPY1 is required for the observed phosphorylation events.

      This is an excellent point, and we agree with the reviewer that analyzing loss-of-function npy mutants is important. The challenge is that we need to knock out NPY1, NPY3, and NPY5 to phenocopy pid. We will also need to find a way to suppress the npy triple mutants, which are sterile, so that we can make meaningful comparisons.

      (2) The functional consequences of the newly identified phosphorylation sites in PINs remain speculative. Site-directed mutagenesis (phospho-defective and phospho-mimetic) would help clarify their physiological roles.

      We agree with the reviewer on this point as well. However, this is not trivial, as we have uncovered so many phosphorylation sites.

      (3) The kinase responsible for NPY1-mediated phosphorylation remains unidentified. Since NPY1 is a non-kinase protein, a model involving recruitment of partner kinases (e.g., PIN-phosphorylating kinases other than PID) should be considered or discussed.

      We will add a sentence mentioning D6PK and other kinases to the Discussion in the revised version. We hope that the kinases will emerge from future forward genetic screens.

      Reviewer #3 (Public review):

      Summary:

      This manuscript from Mudgett et al. explores the relative roles of PID and NPY1 in auxin-dependent floral initiation in Arabidopsis. Micro vectorial auxin flows directed by PIN1 are essential to flower initiation, and loss of PIN1 or two of its regulators, PID and NPY1 (in a yucca-deficient background) phenocopies the pinformed phenotype. This group has previously shown that PID-PIN1 interactions and function are dosage-dependent. The authors pick up this thread by demonstrating that a heterozygote containing a CRISPR deletion of one copy of PIN1 can restore quasi-wild type floral initiation to pid.

      The authors then show that overexpression of NPY1 is sufficient to more or less restore wild-type floral initiation to the pid mutant. The authors claim that this result demonstrates that NPY1 functions downstream of PID, as this ectopic abundance of NPY1 resulted in phosphorylation of PIN1 at sites that differ from sites of action of PID. The authors pursue evidence that PID action via NPY1 is analogous to the mode of action by which phot1/2 act on NPH3 in seedling phototropism. Such a model is supported by the evidence presented herein that the C terminus of NPY1, which has abundant Ser/Thr content, is phosphorylated, and that the deletion of this domain prevents overexpression compensation of the pinformed phenotype.

      While the results presented support evidence in the literature that PID acts on NPY1 to regulate PIN1 function, it is also possible that NPY1 overexpression results in limited expansion of phosphorylation targets observed with other AGC kinases. And if the phot model is any indication, there may be other PID targets that modulate PIN1-dependent floral initiation.

      However, overexpression of the NPY1 C-terminal deletion construct resulted in phosphorylation of both PIN1 and PIN2 and agravitropic root growth similar to what is observed in pin2 mutants. This suggests that direct PID phosphorylation of PINs and action via NPY1 can be distinguished by phosphorylation sites and by growth phenotypes.

      Strengths:

      A very important effort that places NPY1 downstream of PID in floral initiation.

      We thank the reviewer for the comments.

      Weaknesses:

      As PID has been shown to act on sites that regulate PIN protein polarity as well as PIN protein function, it would be useful if the authors consider how their results would fit/not fit with a model where combinatorial function of NPY1 and PID regulate PIN1 in a manner similar to the way that PID appears to function combinatorially with D6PK on PIN3.

      We agree with the reviewer that we do not have a complete picture of how NPY, PID, and PIN work together to control flower initiation. Some aspects of our results are difficult to reconcile with the model of PIN1 and PID acting in tandem, i.e., by PID directly phosphorylating and activating PIN1. Indeed, our results suggest that PIN1 and PID have opposite effects on organogenesis. For example, heterozygous pin1 (or PIN1-GFP<sub>HDR</sub>, which is presumably less active than wild-type PIN1) suppresses the pid phenotype. Moreover, pid and pin1 have opposite effects on cotyledon number and true leaf number. Mutations in PID lead to more cotyledons and more true leaves than WT, whereas pin1 mutants make fewer cotyledons and fewer true leaves than WT (Bennett SRM, Alvarez J, Bossinger G, Smyth DR (1995) Morphogenesis in pinoid mutants of Arabidopsis thaliana. The Plant Journal 8: 505-520). We have elaborated on this point in the last paragraph of the Discussion.

      The genetic materials we have generated may allow us to uncover additional components in the pathway from forward genetic screens, which may eventually lead to a clear picture.

  2. Aug 2025
    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Liu et al have tried to dissect the neural and molecular mechanisms that C. elegans use to avoid digestion of harmful bacterial food. Liu et al show that C. elegans use the ON-OFF state of AWC olfactory neurons to regulate the digestion of harmful gram-positive bacteria S. saprophyticus (SS). The authors show that when C. elegans are fed on SS food, AWC neurons switch to OFF fate which prevents digestion of S. saprophyticus and this helps C. elegans avoid these harmful bacteria. Using genetic and transcriptional analysis as well as making use of previously published findings, Liu et al implicate the p38 MAPK pathway (in particular, NSY-1, the C. elegans homolog of MAPKKK ASK1) and insulin signaling in this process.

      Strengths:

      The authors have used multiple approaches to test the hypothesis that they present in this manuscript.

      Weaknesses:

      Overall, I am not convinced that the authors have provided sufficient evidence to support the various components of their hypothesis. While they present data that loosely align with their hypothesis, they fail to consider alternative explanations and do not use rigorous approaches to strengthen their overall hypothesis. The selective picking of genes from the RNA sequencing data and forcing the data to fit the proposed hypothesis based on previously published findings, without exploring other approaches, indicates a lack of thoroughness and rigor. These critical shortcomings significantly diminish enthusiasm for the manuscript in its totality. In my opinion, this is the biggest weakness in this manuscript.

      We appreciate all of the reviewer’s suggestions, which helped us improve this paper. We have now addressed the reviewer’s comments in the section “Reviewer #1 (Recommendations for the authors)”.

      Reviewer #2 (Public review):

      Summary:

      Using C. elegans as a model, the authors present an interesting story demonstrating a new regulatory connection between olfactory neurons and the digestive system.

      Mechanistically, they identified key factors (NSY-1, STR-130, etc.) in neurons, as well as critical 'signaling factors' (INS-23, DAF-2) that bridge different cells/tissues to execute the digestive shutdown induced by poor-quality food (Staphylococcus saprophyticus, SS).

      Strengths:

      The conclusions of this manuscript are mostly well supported by the experimental results shown.

      Weaknesses:

      Several issues could be addressed and clarified to strengthen their conclusions.

      (1) The word "olfactory" should be carefully used and checked in this manuscript. Although AWCs are classic olfactory neurons in C. elegans, no data in this manuscript supports the idea that olfactory signals from SS drive the responses in the digestive system. To validate that it is truly olfaction, the authors may want to check the responses of worms (e.g. AWC, digestive shutdown, INS-23 expression) to odors from SS.

      We appreciate the reviewer’s careful attention to terminology. We agree that the term "olfactory" requires direct experimental validation. In this paper, however, we used "olfactory" only to describe the AWC neurons specifically. Following the reviewer’s suggestion, we have now deleted the word “olfactory”.

      (2) In line 113, what does "once the digestive system is activated" mean? The authors need to provide a clearer statement about 'digestive activation' and 'digestive shutdown'.

      Previously, we observed that activating larval digestion with heat-killed E. coli or E. coli cell wall peptidoglycan (PGN) enabled the digestion of SS as food (Hao et al., 2024). Additionally, when animals reached the L2 stage by feeding on a normal OP50 diet, they could utilize SS as a food source to support growth (Figure 1—figure supplement 1D). These findings suggest that once digestion is activated (via E. coli components or L2-stage maturation), worms gain the capacity to process SS as a viable food source, abolishing SS-induced growth impairment (Hao et al., 2024) (Figure 1—figure supplement 1D).

      (3) No control data on OP50. This would affect the conclusions generated from Figures 2A, 2B, 2D, 3B, 3C, 3G, 4D-G, 5D-E, 6B-D.

      We appreciate  this point. The central goal of the experiments listed (Figures 2A,B,D; 3B,C,G; 4D-G; 5D-E; 6B-D) was not to compare growth or behavior between SS and OP50 under standard conditions, but rather to understand the genetic basis of the C. elegans response specifically to SS, as identified through our nsy-1 mutant screen.

      Our data in Figure 1 clearly establishes the fundamental difference in growth and feeding behavior when larvae encounter SS compared to OP50 (Figures 1A,B). Having established SS as an unfavorable food source that triggers a specific protective response (digestive shutdown), the subsequent experiments focus on deciphering how this response is mediated.

      Therefore, within these specific experimental contexts under SS feeding: The primary comparison is between wild-type (N2) and nsy-1 mutant animals. All assays (growth, behavior, survival) are performed under the same SS feeding conditions for both genotypes.

      This design allows us to directly assess the functional role of NSY-1 in mediating the SS-specific response pathway we are investigating. Including an OP50 control for every figure would not address this core genetic question and could introduce confounding variables given the established difference in how C. elegans treats these two food sources. The critical internal control for these specific experiments is the performance of the wild-type under SS versus the mutant under SS.

      (4) Do the authors know which factors are released from AWC neurons to drive the digestive shutdown?

      Enrichment analysis revealed that genes related to extracellular functions, such as insulin-related genes, are induced in nsy-1 mutant animals (Figure 5—figure supplement 1A, Supplementary file 4). Further analysis of insulin-related genes from the RNA-seq data showed that ins-23 is predominantly induced in nsy-1 mutant animals (Figure 5—figure supplement 1B), suggesting its potential role in promoting SS digestion. We found that knockdown of ins-23 in nsy-1 mutants inhibited SS digestion (Figure 5D). Given that INS-23 is expressed in AWC neurons (Figure 5—figure supplement 3A, CeNGEN), this suggests increased production and likely enhanced release of INS-23 from AWC neurons in the nsy-1 mutant background, which promotes SS digestion.

      The insulin/insulin-like growth factor signaling (IIS) pathway, particularly through the DAF-2 receptor, integrates nutritional signals to regulate various behavioral and physiological responses related to food (Kodama et al., 2006; Ryu et al., 2018). It has been shown that INS-23 acts as an antagonist for the DAF-2 receptor to promote larval diapause (Matsunaga et al., 2018). To test whether ins-23 induction in nsy-1 mutants promotes SS digestion through its receptor, DAF-2, we constructed an nsy-1; daf-2 double mutant. We found that the SS digestion ability of the nsy-1 mutant was inhibited by the daf-2 mutation. This suggests that the nsy-1 mutation induces the insulin peptide ins-23, which promotes SS digestion through its potential receptor, DAF-2.

      The data support a model where AWC neurons regulate digestion via the release of INS-23. Loss of nsy-1 function increases INS-23 release from AWC, activating DAF-2 signaling and promoting digestion. Conversely, in wild-type animals, reduced INS-23 release from AWC contributes to digestive shutdown in response to SS food.

      Reviewer #3 (Public review):

      Summary:

      The study explores a molecular mechanism by which C. elegans detects low-quality food through neuron-digestive crosstalk, offering new insights into food quality control systems. Liu and colleagues demonstrated that NSY-1, expressed in AWC neurons, is a key regulator for sensing Staphylococcus saprophyticus (SS), inducing avoidance behavior and shutting down the digestive system via intestinal BCF-1. They further revealed that INS-23, an insulin peptide, interacts with the DAF-2 receptor in the gut to modulate SS digestion. The study uncovers a food quality control system connecting neural and intestinal responses, enabling C. elegans to adapt to environmental challenges.

      Strengths:

      The study employs a genetic screening approach to identify nsy-1 as a critical regulator in detecting food quality and initiating adaptive responses in C. elegans. The use of RNA-seq analysis is particularly noteworthy, as it reveals distinct regulatory pathways involved in food sensing (Figure 4) and digestion of Staphylococcus saprophyticus (Figure 5). The strategic application of both positive and negative data mining enhances the depth of analysis. Importantly, the discovery that C. elegans halts digestion in response to harmful food and employs avoidance behavior highlights a physiological adaptation mechanism.

      Weaknesses:

      Major points:

      (1) While NSY-1 positively regulates str-130 expression in AWC neurons and is critical for SS avoidance and survival, the authors should examine whether similar phenotypes are observed in str-130 mutants.

      In this study, we mainly focused on how worms sense an adverse food source (SS) and shut down digestion (with lack of growth as the readout of digestive shutdown). We found that nsy-1 in AWC neurons plays a key role in the response to SS food: when nsy-1 is mutated, the animals fail to detect SS as adverse food and digest it, and therefore grow on SS. From RNA-seq, we found that nsy-1 positively regulates several sensory perception-related genes (sra-32, str-87, str-112, str-130, str-160, str-230) (Figure 4—figure supplement 1A, Supplementary file 2). After screening these genes, we found that knockdown of str-130 in wild-type animals promoted SS digestion, thereby supporting animal growth (Figure 4D), and the proportion of animals with two AWC<sup>OFF</sup> neurons decreased (Figure 4E). Secondly, we found that overexpression of str-130 in nsy-1 mutant animals inhibited SS digestion, thereby slowing animal growth (Figure 4F), and the proportion of animals with two AWC<sup>OFF</sup> neurons increased (Figure 4G). These results demonstrate that NSY-1 promotes the AWC<sup>OFF</sup> state by inducing str-130 expression, which in turn inhibits SS digestion in C. elegans.

      (2) NSY-1 promotes the AWC-OFF state through str-130, inhibiting SS digestion. The authors should investigate whether STR-130 in AWC neurons regulates bcf-1 expression levels in the intestine.

      We agree with the reviewer's suggestion regarding the potential role of STR-130 in AWC neurons regulating intestinal bcf-1 expression. To address this, we generated transgenic worms with AWC-specific knockdown of str-130, achieved by rescuing sid-1 cDNA expression under the ceh-36 promoter (AWC-specific) in sid-1(qt9);BCF-1::GFP background worms.

      We observed that AWC neuron-specific RNAi of str-130 elevated intestinal BCF-1::GFP expression (Figure 6—figure supplement 1B). This demonstrates that STR-130 functions cell-non-autonomously in AWC neurons to repress BCF-1 expression in the intestine.

      (3) The current results rely on str-2 expression levels to indicate the AWC state. Ablating AWC neurons and testing the effects on digestion would provide stronger evidence for their role in digestive regulation.

      To confirm the importance of the AWC state in SS digestion, we performed AWC-specific neuron ablation experiments using a previously validated transgenic strain that expresses cleaved caspase under the AWC-specific promoter ceh-36 (ceh-36p::caspase). Critically, worms with ablated AWC neurons completely failed to digest SS food (Figure 3—figure supplement 4), phenocopying the non-digesting state of wild-type worms on SS when AWC-OFF signaling is impaired. This result directly confirms that functional AWC neurons are essential for initiating SS digestion, aligning with our model where the AWC-OFF state (induced by SS) inhibits digestion while the AWC-ON state promotes it.

      Furthermore, our previous study showed that AWC ablation activates the intestinal mitochondrial unfolded protein response and inhibits food digestion, mechanistically linking neuronal integrity to gut stress responses and digestive inhibition.

      Together, these functional ablation studies provide compelling physiological evidence that AWC neurons act as central regulators of food-state sensing and gut function.

      (4) The claim that NSY-1 inhibits INS-23 and that INS-23 interacts with DAF-2 to regulate bcf-1 expression (Line 339-340) requires further validation. Neuron-specific disruption of INS-23 and gut-specific rescue of DAF-2 should be tested.

      We agree with the reviewer that the proposed NSY-1 ⊣ INS-23 → DAF-2 → BCF-1 signaling axis requires tissue-specific validation. To address this, we conducted compartment-specific functional dissection of INS-23 and DAF-2:

      AWC neuronal role of INS-23:

      To test whether INS-23 acts in AWC neurons to regulate intestinal BCF-1, we generated AWC-specific knockdown strains by rescuing sid-1 cDNA expression under the ceh-36 promoter in a sid-1(qt9);BCF-1::GFP background. We found that AWC-restricted ins-23 knockdown significantly reduced intestinal BCF-1::GFP expression (Figure 6—figure supplement 1A). This confirms that INS-23 functions cell-non-autonomously within AWC sensory neurons to activate intestinal BCF-1, consistent with NSY-1’s upstream inhibition of INS-23 in this neuronal subtype.

      Intestinal role of DAF-2 as INS-23 receptor:

      To investigate whether DAF-2 acts as the gut-localized receptor for neuronal INS-23 signaling, we performed tissue-specific rescue experiments in the nsy-1(ag3);daf-2(e1370) double mutant. When DAF-2 was re-introduced specifically in the intestine (using the ges-1 promoter), we observed a significant suppression of SS digestion (Figure 5—figure supplement 3B), but not a rescue of the digestive defect. This indicates that INS-23 induction in nsy-1 mutants promotes digestion independently of intestinal DAF-2 function.

      (5) Figure Reference Errors: Lines 296-297 mention Figure 6E, which does not exist in the main text. This appears to refer to Figure 5E, which has not been described.

      We corrected this.

      Reviewer #1 (Recommendations for the authors):

      I would like the authors to address the following comments in a resubmission.

      (1) The hallmark of the activated p38 MAPK pathway is the phosphorylation of the most downstream kinase of this cascade, p38 (PMK-1/PMK-2 in C. elegans). Previous work from the Bargmann lab showed that the most downstream kinase of this pathway, PMK-1/PMK-2, is not required for AWC asymmetry. I wonder whether that is the case also for the model that Liu et al have presented in this manuscript. Since p38/PMK-1 undergoes activation (phosphorylation) in response to pathogenic bacteria like P. aeruginosa, it is worth testing whether PMK-1 plays a role downstream of NSY-1 in the model that Liu et al present in this manuscript. It would be worth testing whether there is increased phosphorylation of p38 when C. elegans are fed SS and whether that phosphorylation regulates downstream components that Liu et al have identified in this manuscript.

      We thank the reviewer for raising this important point regarding PMK-1/p38 MAPK signaling. As established in our prior work (Reference 1), SS exposure triggers phosphorylation of PMK-1 (P-PMK-1) in C. elegans, and pmk-1 mutants exhibit enhanced growth on SS (Figure-1, Figure-2). This confirms that PMK-1-mediated innate immune signaling actively regulates SS responsiveness and digestion.

      To address whether PMK-1 functions downstream of NSY-1 within our proposed model, we performed critical epistasis analyses. While we observed that nsy-1 mutation elevates ins-23 (indicating NSY-1 suppression of ins-23), knockdown of pmk-1 did not alter ins-23 expression levels (Figure 5-figure supplement 3C). This demonstrates that PMK-1 does not operate through the ins-23 pathway to regulate SS digestion. Thus, although both pathways respond to SS, the PMK-1-mediated innate immune response and the NSY-1/INS-23 axis constitute distinct regulatory mechanisms governing digestive adaptation.

      Reference 1: Geng, S., Li, Q., Zhou, X., Zheng, J., Liu, H., Zeng, J., Yang, R., Fu, H., Hao, F., Feng, Q., & Qi, B. (2022). Gut commensal E. coli outer membrane proteins activate the host food digestive system through neural-immune communication. Cell host & microbe, 30(10), 1401–1416.e8. https://doi.org/10.1016/j.chom.2022.08.004

      (2) Since p38 MAPK pathway has a well-established role in host defense in the C. elegans intestine, it is important to show that NSY-1 does not function in the intestine in the model that Liu et al present. I would like the authors to reintroduce nsy-1 in C. elegans intestine in nsy-1 mutant animals and then test whether it has any effect on worm length on SS food (similar to what is done in Figure 3 for AWC-specific nsy-1).

      Beyond its established role in AWC neurons, we detected NSY-1 expression in the intestine (Figure 3-figure supplement 2A). To assess intestinal NSY-1 function, we performed tissue-specific rescue experiments in nsy-1 mutants using the intestinal-specific vha-1 promoter. Intestinal expression of NSY-1 significantly suppressed the enhanced SS digestion phenotype in nsy-1 mutants (Figure 3-figure supplement 2B), demonstrating functional involvement of gut-localized NSY-1 in regulating digestive responses. We propose that intestinal NSY-1 mediates this effect through innate immune signaling, consistent with its known pathway components. As previously established (Reference 1), the canonical PMK-1/p38 MAPK pathway functions downstream of NSY-1, with both sek-1 and pmk-1 knockdown enhancing SS digestion through immune modulation. This indicates that intestinal NSY-1 may suppress digestion through PMK-1-mediated immune responses. Since neuronal NSY-1's role in digestive control was previously undefined, we prioritized mechanistic analysis of its neuronal function in digestion regulation.

      Notably, this immune-mediated mechanism operates independently of NSY-1's neuronal regulation pathway. In AWC neurons, NSY-1 controls digestion exclusively through the neuropeptide signaling axis (INS-23/DAF-2/BCF-1) without engaging innate immune components.

      Reference 1: Geng, S., Li, Q., Zhou, X., Zheng, J., Liu, H., Zeng, J., Yang, R., Fu, H., Hao, F., Feng, Q., & Qi, B. (2022). Gut commensal E. coli outer membrane proteins activate the host food digestive system through neural-immune communication. Cell host & microbe, 30(10), 1401–1416.e8. https://doi.org/10.1016/j.chom.2022.08.004

      (3) At multiple places, wild-type (WT) controls have been labeled as N2. It is better to label all controls as WT (and not as N2).

      Corrected.

      (4) In Figure 2B, the aversion response should be scored at multiple time points, like Figure 1C, rather than at just one timepoint.

      We thank the reviewer for suggesting multi-timepoint analysis of aversion behavior. In accordance with this recommendation, we have now quantified SS avoidance at multiple time points. As shown in the revised Figure 2B, nsy-1 mutants exhibited significantly impaired avoidance responses at both 4h and 6h but not at 8h, confirming that NSY-1 is essential for sustained aversion to SS food in the early response. These data demonstrate the critical role of NSY-1 in food discrimination during initial sensory responses.

      (5) Does the re-introduction of nsy-1 in AWC neurons in nsy-1 mutant background help animals avoid SS in dwelling and food-choice assays? Along the same lines, does the CRISPR-generated AWC-specific mutant of NSY-1 fail to avoid SS in dwelling and food-choice assays similar to the whole-animal mutant? These behavioral data are missing in Figure 3.

      We thank the reviewer for prompting behavioral validation of AWC-specific nsy-1 functions. To determine whether NSY-1 in AWC neurons mediates SS sensory perception, we performed dwelling (avoidance) and food-choice assays using AWC-specific nsy-1 knockout and AWC-rescued strains (nsy-1(ag3); Podr-1::nsy-1). In dwelling assays, AWC-specific nsy-1 KO mutants exhibited significantly impaired SS avoidance at 6h (Figure 3-figure supplement 3A), while AWC-rescued strains showed restored avoidance capacity at 2-6h (Figure 3-figure supplement 3B). Food-choice assays further revealed that AWC nsy-1 KO mutants preferentially migrated toward SS (Figure 3-figure supplement 3C), whereas AWC-rescued animals showed no preference between SS and HK-E. coli (Figure 3-figure supplement 3D). These data conclusively demonstrate that NSY-1 acts in AWC neurons to mediate SS recognition and aversion behaviors.

      (6) In Figure 3E and F, the number of animals that were used for scoring AWC str-2p::GFP expression should be specified.

      We added the number of animals to the figure.

      (7)  RNA seq analysis identified multiple GPCRs (including STR-130) that are upregulated in an NSY-1-dependent manner when animals are fed with SS bacteria. However, the authors decided to only characterize STR-130 because of previously published findings. It is important to rule out the role of other GPCRs since all are upregulated on SS food as shown in Figure S4 B. I would like the authors to knock down other GPCRs in the same manner as they did for STR-130 and demonstrate that only str-130 knockdown behaves similarly to the nsy-1 mutant (if that is the case) using the assay presented in Figure 4 D.

      We appreciate the reviewer’s suggestion to comprehensively evaluate NSY-1-regulated GPCRs. In response, we extended our functional analysis to all six GPCRs (str-130, str-230, str-87, str-112, str-160, and sra-32) identified as NSY-1-dependent and SS-induced in RNA-seq (Figure 4—figure supplement 1).

      Using RNAi knockdown and the SS growth assay, we observed that RNAi of str-130, str-230, str-87, or str-112 significantly enhanced SS growth (Figure 4—figure supplement 2A), with str-130 RNAi exhibiting the most robust phenotype—phenocopying nsy-1 mutants. Crucially, none of these GPCR knockdowns further enhanced growth in nsy-1(ag3) mutants (Figure 4—figure supplement 2B), confirming their position downstream of NSY-1. These data establish str-130 as the dominant effector of NSY-1-mediated SS response regulation, while suggesting minor contributions from other GPCRs (str-230, str-87, str-112).

      (8) In Figure 4E and G, the number of animals that were used for scoring GFP expression should be specified.

      We added the number of animals to the figure.

      (9) When comparing Figure 3E and Figure 4E, it appears that the loss of str-130 RNAi does not phenocopy nsy-1 mutant. This raises the question of whether the inefficiency of RNAi targeting str-130 is the cause, or if STR-130 is not the only GPCR regulated by NSY-1 on SS food. I would like the authors to address this discrepancy. If RNAi inefficiency is indeed the cause, using an RNAi-sensitive background, such as an eri-1 mutant, could help strengthen the data presented in Figure 4E. Conversely, if RNAi inefficiency is not responsible for the discrepancy, I suggest that the authors investigate the roles of other GPCRs that were identified by RNA sequencing.

      We appreciate the reviewer’s observation regarding the phenotypic difference between nsy-1 mutants and str-130 (RNAi) animals on SS food (Fig. 3E vs Fig. 4E).

      While both genetic perturbations significantly enhance SS growth and increase the proportion of animals exhibiting AWC<sup>ON</sup> states compared to wild type (indicating enhanced digestion), the specific AWC<sup>ON</sup> neuron configurations differ: nsy-1 mutants predominantly show 2 AWC<sup>ON</sup> animals, whereas str-130(RNAi) animals primarily exhibit the 1 AWC<sup>ON</sup>/1 AWC<sup>OFF</sup> configuration (Fig. 3E vs Fig. 4E).

      This difference likely arises because STR-130 is the key GPCR mediating NSY-1's inhibitory effect on SS digestion, but it is not the sole GPCR involved, as evidenced by our RNAi screen identifying several additional NSY-1-regulated GPCRs (str-230, str-87, str-112) whose depletion also enhanced SS growth (Fig. 4A-D).

      The robust SS growth enhancement and AWC<sup>ON</sup> state increase caused by str-130(RNAi) (phenocopying the nsy-1 mutant’s functional outcome of enhanced digestion) (Figure 4D, 4E) indicate effective RNAi knockdown for this specific assay. Therefore, the distinct neural configurations reflect the partial redundancy among GPCRs downstream of NSY-1, rather than an inherent inefficiency of the str-130 RNAi.

      The nsy-1 mutant phenotype represents the complete loss of all inhibitory GPCR signaling coordinated by NSY-1, while str-130(RNAi) represents the loss of its major component. Investigating the roles of other identified GPCRs (str-230, str-87, str-112) in modulating AWC<sup>ON</sup> neuron states is an important direction for future research.

      (10) In Figure 4 F and 4 G, the authors show that the overexpression of STR-130 rescues the nsy-1 mutant phenotype suggesting that NSY-1 might function through STR-130 to control digestion on SS food. These data place STR-130 downstream of NSY-1. To further strengthen these epistasis data, authors should knock down str-130 in nsy-1 mutant animals and show that the combined loss of both genes produces the same effect as the loss of either gene alone.

      We thank the reviewer for the insightful suggestion to further define the genetic relationship between nsy-1 and str-130. To strengthen our epistasis analysis, we performed RNAi knockdown of str-130 in the nsy-1(ag3) mutant background and assessed development on SS food. Consistent with STR-130 acting downstream of NSY-1, the loss of str-130 via RNAi did not further enhance the developmental capacity (i.e., growth phenotype) of nsy-1(ag3) mutant animals on SS. This lack of enhancement indicates that str-130 and nsy-1 function within the same genetic pathway, with str-130 acting epistatically downstream of nsy-1 (Figure 4—figure supplement 3). This finding reinforces the model proposed from our overexpression data (Fig. 4F-G), namely that NSY-1 primarily exerts its inhibitory effect on SS digestion by inducing expression of the GPCR STR-130.

      (11) In Figure 5C, please mention "ins-23 transcript levels" on the top of the graph so that it is clear what these data represent.

      We appreciate the reviewer’s suggestion.

      (12) Since all ins genes were upregulated in nsy-1 mutants (though ins-23 was indeed the most highly upregulated gene) on SS food from RNA seq analysis (Figure S5 B), it is important to first phenotypically characterize all of them using "worm length assay". If this analysis shows that ins-23 has the most robust phenotype, it would make more sense to just focus on ins-23.

      We agree with the reviewer that initial phenotypic characterization of candidate genes identified through transcriptomic analysis is valuable. Our RNA-seq data revealed that several insulin-like peptide genes, including ins-22, ins-23, ins-24, and ins-27, were significantly upregulated in the nsy-1 mutant on SS food (Figure 5—figure supplement 1B). We prioritized these insulin-like peptide genes for functional validation because previous studies have shown that they act as neuropeptides capable of mediating non-cell-autonomous signaling (Shao et al., 2016).

      To determine if any were functionally responsible for the enhanced SS growth observed in nsy-1 mutants, we performed functional phenotypic screening using the SS growth assay (worm length assay). We individually knocked down each of these candidates (ins-22, ins-23, ins-24, ins-27) in the nsy-1(ag3) mutant background. Among these, only RNAi targeting ins-23 significantly attenuated (i.e., suppressed) the enhanced development of the nsy-1(ag3) mutant on SS (Figure 5—figure supplement 2). This targeted functional screening revealed that ins-23 has the most robust and specific role in mediating the enhanced digestion phenotype downstream of NSY-1 loss, providing the critical justification for our subsequent focus on this particular insulin-like peptide.

      Ref:

      Shao, L. W., Niu, R., & Liu, Y. (2016). Neuropeptide signals cell non-autonomous mitochondrial unfolded protein response. Cell research, 26(11), 1182–1196. https://doi.org/10.1038/cr.2016.118

      Reviewer #2 (Recommendations for the authors):

      There are several minor errors and typos in the manuscript

      (1) A number of typos in the figures, like "length".

      Corrected.

      (2) The 'axis labels' are inconsistent from panel to panel, like "relative body length" and "relative worm length".

      Corrected.

      (3) The fonts are inconsistent from panel to panel.

      Corrected.

      (4) There is no Ex unique number for transgenic lines.

      Corrected.

      Reviewer #3 (Recommendations for the authors):

      Minor points:

      (1)  Figure 3B, 3C, 3G, 4D, 4F, 5D, 5E, and 6C: Replace "lenth" with "length" (consistent with Figure 2A).

      Corrected.

      (2) Figure 4D: Correct "ctontrol" to "control."

      Corrected.

      (3) Figure 4G: Update the co-injection marker to Podr-1::GFP instead of Pstr-2::GFP.

      Corrected.

      (4) Figure 5C: This figure is missing from the Results section.

      Corrected.

      (5) Figure 6A: Label the graph with Pbcf-1::bcf-1::GFP, as in Figure 6D.

      Corrected.

      (6) Italicization: Lines 588 and 603-italicize nsy-1.

      Corrected.

      (7) Supplementary Figure S2A: Correct "Screeng" to "Screening."

      Corrected.

      (8) Spelling/Proofreading: Ensure consistent spelling and grammar, such as correcting "mutan" to "mutant" in Figure 4A.

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) I suggest that the authors choose a different term in their title, abstract and manuscript to describe the phenotypes associated with ufd-1 and npl-4 knockdown other than an "inflammation-like response." Inflammation is a pathological term with four cardinal signs: redness (rubor), swelling (tumor), warmth (calor) and pain (dolor). These are not symptoms known to occur in C. elegans. The authors could consider using "tolerance" instead, as this term may better describe their findings.

      We have changed “inflammation-like response” to “aberrant immune response” throughout the manuscript.

      (2) It would help the reader to better understand the novelty of the findings in this study if the authors include a paragraph in their introduction to put their results in context of the published literature that has examined the relationship between immune activation and nematode health and survival. In particular, I suggest that the authors discuss doi:10.7554/eLife.74206 (2022), a study that characterized a similar observation to what the authors are reporting. This study found that low cholesterol reduces pathogen tolerance and host survival during pathogen infection. Cholesterol scarcity increases p38 PMK-1 phosphorylation, priming immune effector induction in a manner that reduces pathogen accumulation in the intestine during a subsequent infection. I also suggest that the authors highlight in this introductory paragraph that the toxic effects of inappropriate immune activation in C. elegans have been widely catalogued. For example: doi.org/10.1371/journal.ppat.1011120 (2023); doi:10.1186/s12915-016-0320-z (2016); doi:10.1126/science.1203411 (2011); doi:10.1534/g3.115.025650 (2016).

      In this context, the authors could consider re-wording their novelty claim in the abstract and introduction to take into account this previous body of work.

      We have added a paragraph to the Discussion section to place our findings in the context of previous research. The revised manuscript now includes the following text (page 11, lines 336–344): “Previous studies have shown that hyperactivation of immune pathways can negatively affect organismal development. For example, sustained activation of the p38 MAPK pathway impairs development in C. elegans (Cheesman et al., 2016; Kim et al., 2016), and excessive activation of the IPR also leads to developmental defects (Lažetić et al., 2023). Similar to our current study, recent work has demonstrated that heightened immune responses can reduce gut pathogen load while paradoxically decreasing host survival during infection (Ghosh and Singh, 2024; Peterson et al., 2022). However, our study uniquely shows that while such heightened immune responses are detrimental to immunocompetent animals, they can be beneficial in the context of immunodeficiency.”

      (3) The authors rely on the use of RNAi of ufd-1 and npl-4 to study their effect on P. aeruginosa colonization and pathogen resistance throughout the manuscript. To address the possibility of off-target effects of the RNAi, the authors should consider both (i) showing with qRT-PCR that these genes are indeed targeted during RNAi, and (ii) confirming their phenotypes with an orthogonal technique, preferably by studying ufd-1 and npl-4 loss-of-function mutants [both in the wild-type and sek-1(km4) backgrounds]. If mutation of these genes is lethal, the authors could use Auxin Inducible Degron (AID) technology to induce the degradation of these proteins in post-developmental animals.

      We attempted several protocols of CRISPR in our laboratory to generate ufd-1 loss-of-function mutants; however, these efforts were unsuccessful. While this does not rule out the possibility of generating ufd-1 mutants, the failure is likely due to technical limitations on our part rather than an inherent inability to disrupt the gene. Nevertheless, to confirm the specificity of our RNAi-based approach, we quantified ufd-1 and npl-4 mRNA levels following RNAi treatment and found that each gene was specifically and effectively downregulated by its respective RNAi. 

      Importantly, ufd-1 and npl-4 RNA sequences do not share significant homology, yet knockdown of either gene results in nearly identical phenotypes, including reduced survival on P. aeruginosa, diminished intestinal colonization, and shortened lifespan. These consistent outcomes strongly support the conclusion that the phenotypes are attributable to the disruption of the functional UFD-1-NPL-4 complex. We have added these results in the revised manuscript (pages 4-5, lines 114-125): “To confirm the specificity of the RNAi knockdowns and rule out potential off-target effects, we examined transcript levels of ufd-1 and npl-4 following RNAi treatment. RNAi against ufd-1 significantly reduced ufd-1 mRNA levels without reducing npl-4 expression, while npl-4 RNAi specifically downregulated npl-4 transcripts with no impact on ufd-1 mRNA levels (Figure 1—figure supplement 1A and B). Additionally, alignment of ufd-1 and npl-4 mRNA sequences against the C. elegans transcriptome revealed no significant similarity to other genes, supporting the specificity of the RNAi constructs. Moreover, the ufd-1 and npl-4 RNA sequences do not share significant sequence similarity. Therefore, the highly similar phenotypes observed in ufd-1 and npl-4 knockdown animals, including shortened lifespan, reduced survival on P. aeruginosa, and decreased intestinal colonization with P. aeruginosa, strongly suggest that these outcomes result from the disruption of the functional UFD-1-NPL-4 complex.”

      (4) I am confused about the authors' explanation regarding their observation that inhibition of the UFD-1/NPL-4 complex extends the lifespan of sek-1(km4) animals, but not pmk-1(km25) animals, as SEK-1 is the MAPKK that functions immediately upstream of the p38 MAPK PMK-1 to promote pathogen resistance.

      I am also confused why their RNA-seq experiment revealed a signature of intracellular pathogen response genes and not PMK-1 targets, which the authors propose is accounting for toxic immune activation. Activation of which immune response leads to toxicity?

      We consistently observe that sek-1(km4) mutants are more sensitive to P. aeruginosa infection than pmk-1(km25) mutants, a finding also reported in previous studies (for example, PMID: 33658510). Given that SEK-1 functions upstream of PMK-1 in the MAPK signaling cascade, it is plausible that SEK-1 also regulates additional MAP kinases, such as PMK-2 (PMID: 25671546), which could contribute to the enhanced susceptibility observed in sek-1 mutants.

      Our results show that inhibition of the UFD-1-NPL-4 complex improves survival specifically in severely immunocompromised animals, such as sek-1(km4) mutants, but not in pmk-1(km25) mutants. To further validate this, we generated the double mutant dbl-1(nk3);pmk-1(km25), which exhibits reduced survival on P. aeruginosa compared to either single mutant.

      Notably, inhibition of the UFD-1-NPL-4 complex also enhances survival in the dbl-1(nk3);pmk-1(km25) background, reinforcing the observation that this response is specific to severely compromised immune states.

      We would also like to clarify that the observed phenotypes are independent of the SEK-1/PMK-1 pathway, as shown in Figure 3A-3C, Figure 3—figure supplement 1, and Figure 4A-4C. The IPR seems to play a role in the observed phenotypes, as inhibition of some of the protease and pals genes (IPR genes) leads to increased P. aeruginosa colonization in ufd-1 knockdown animals (Figure 6—figure supplement 1). The other immune response pathway that contributes to the observed phenotypes is ELT-2, as explained in Figure 6. Finally, we have noted in the revised manuscript that as-yet unidentified pathways are also likely to contribute to the phenotypes triggered by disruption of the UFD-1-NPL-4 complex.

      (5) The authors did not test alternative explanations for why UFD-1/NPL-4 complex inhibition compromises survival during pathogen infection, other than exuberant immune activation. For example, it is possible that inhibition of this proteasome complex shortens lifespan by compromising the general health/normal physiology of nematodes. Immune responses could be activated as a secondary consequence of this stress, and not be a direct cause of early mortality. Does the sek-1(km4) mutation suppress the shortened lifespan of ufd-1 and npl-4 knockdown animals? This experiment should also be done with loss-of-function mutants, as noted in point 3.

      We have already included this data in Figure 4D, where we observed that ufd-1 and npl-4 knockdown reduce the lifespan of sek-1(km4) animals. It is possible that immune activation is a secondary consequence of cellular stress induced by inhibition of the UFD-1-NPL-4 complex. However, our data strongly suggest that the observed phenotypes, including reduced gut pathogen load and decreased survival on the pathogen, are due to the aberrant immune response activated by the inhibition of the UFD-1-NPL-4 complex. Evidence from sek-1(km4) mutants particularly underscores the role of this dysregulated immune activation. While this aberrant immune response is detrimental to wild-type animals under pathogenic conditions, it appears to be beneficial in severely immunocompromised backgrounds. Specifically, in sek-1(km4) mutants, inhibition of the UFD-1-NPL-4 complex enhances survival during P. aeruginosa infection (Figure 4A). However, under non-infectious conditions, where sek-1(km4) mutants exhibit a normal lifespan, the same immune activation becomes harmful (Figure 4D). Together, these findings demonstrate that the aberrant immune response induced by UFD-1–NPL-4 inhibition is context-dependent: it is advantageous only for immunocompromised animals under infection, but deleterious to healthy animals under infection and to both healthy and immunocompromised animals under non-infectious conditions.

      (6) The conclusion of Figure 6 hinges on experiments that use double RNAi to knock down two genes at the same time (Fig. 6D and 6G), an approach that is inherently fraught in C. elegans biology owing to the likelihood that the efficiency of RNAi-mediated gene knockdown is compromised and may account for the observed phenotypes. The proper control for double RNAi is not empty vector + ufd-1(RNAi), but rather gfp(RNAi) + ufd-1(RNAi), as the introduction of a second hairpin RNA is what may compromise knockdown efficiency. In this context, it is important to confirm that knockdown of both genes occurs as expected (with qRT-PCR) and to confirm this phenotype using available elt-2 loss-of-function mutants.

      We thank the reviewer for this helpful suggestion. We have repeated all double RNAi experiments using gfp RNAi as a control instead of the empty vector (Figure 6 and Figure 6—figure supplement 1). Additionally, we assessed the efficiency of gene knockdown in the double RNAi conditions (Figure 6—figure supplement 2) and found that RNAi efficacy was not compromised by the double RNAi treatment.

      (7) A supplementary table with the source data for at least three replications (mean lifespan, n, statistical comparison) for each pathogenesis assay should be included in this manuscript.

      The source data is provided for all the data presented in the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to uncover what role, if any, the UFD1/NPL4 complex might play in the innate immune responses of the nematode C. elegans. The authors find that loss of the complex renders animals more sensitive to both pathogenic and non-pathogenic bacteria. However, there appears to be a complex interplay with known innate immune pathways since the loss of UFD1/NPL4 actually results in increased survival of animals lacking the canonical innate immune pathways.

      We thank the reviewer for providing an excellent summary of our work.

      Strengths:

      The authors perform robust genetic analysis to exclude and include possible mechanisms by which the UFD1/NPL4 pathway acts in the innate immune response.

      We thank the reviewer for highlighting the strengths of our work.

      Weaknesses:

      The argument that the loss of the UFD1/NPL4 complex triggers a response that mimics that of an intracellular pathogen has not been thoroughly investigated. Additionally, the finding of a role of the GATA transcription factor, ELT-2, in this response is suggestive, but experiments showing sufficiency in the context of loss of the UFD1/NPL4 complex need to be explored.

      We have investigated the role of IPR genes in the phenotypes observed upon ufd-1 knockdown (Figure 6—figure supplement 1), and our results suggest that the IPR may contribute, at least in part, to the phenotypic outcomes of ufd-1 RNAi. In the Discussion section (pages 11–12, lines 345–356), we have included a detailed discussion on the possible mechanisms underlying IPR activation upon inhibition of the UFD-1–NPL-4 complex. We agree that the interaction between the UFD-1–NPL-4 complex and the IPR is intriguing and warrants further investigation. However, we believe that an in-depth exploration of this interaction lies beyond the scope of the current study.

      We have incorporated new data on ELT-2 overexpression in the revised manuscript. Overexpression of ELT-2 partially phenocopies the effects of ufd-1 knockdown, supporting the idea that other pathways likely contribute to the full spectrum of phenotypes observed upon UFD-1-NPL-4 complex inhibition. The revised manuscript reads (page 10, lines 311-319): “To determine whether ELT-2 activation alone is sufficient to recapitulate the phenotypes observed upon UFD-1-NPL-4 complex inhibition, we analyzed animals overexpressing ELT-2. Similar to ufd-1 knockdown, ELT-2 overexpression led to a significant reduction in the colonization of the gut by P. aeruginosa (Figure 6—figure supplement 3A and 3B). However, overexpression of ELT-2 did not alter the survival of worms on P. aeruginosa (Figure 6—figure supplement 3C). Taken together, these findings suggest that the phenotypes triggered by disruption of the UFD-1-NPL-4 complex are partially mediated by ELT-2. However, additional pathways, yet to be identified, likely cooperate with ELT-2 to regulate both pathogen resistance and host survival.”

      Reviewer #1 (Recommendations For The Authors):

      The authors could consider avoiding the use of descriptors (e.g., "drastic") when presenting their data.

      We have removed the descriptors.

      Reviewer #2 (Recommendations For The Authors):

      What happens with overexpression of ELT-2?

      Overexpression of ELT-2 partially recapitulates the phenotypes of ufd-1 knockdowns, indicating that additional pathways are likely involved in controlling the phenotypes observed upon inhibition of the UFD-1-NPL-4 complex. The revised manuscript reads (page 10, lines 311-319): “To determine whether ELT-2 activation alone is sufficient to recapitulate the phenotypes observed upon UFD-1-NPL-4 complex inhibition, we analyzed animals overexpressing ELT-2. Similar to ufd-1 knockdown, ELT-2 overexpression led to a significant reduction in the colonization of the gut by P. aeruginosa (Figure 6—figure supplement 3A and 3B). However, overexpression of ELT-2 did not alter the survival of worms on P. aeruginosa (Figure 6—figure supplement 3C). Taken together, these findings suggest that the phenotypes triggered by disruption of the UFD-1-NPL-4 complex are partially mediated by ELT-2. However, additional pathways, yet to be identified, likely cooperate with ELT-2 to regulate both pathogen resistance and host survival.”

      The data with xbp-1 loss of function are very different from those of pek-1 and atf-6. Does loss of ufd-1/npl-4 suppress the increased pathogen survival of xbp-1s overexpressing animals?

      We have examined worms overexpressing XBP-1s and found that overexpression of XBP-1s does not rescue the phenotypes caused by ufd-1 knockdown. The revised manuscript reads (page 6, lines 167-174): “To further examine the role of XBP-1 in this context, we assessed the effect of ufd-1 knockdown in animals neuronally overexpressing the constitutively active spliced form of XBP-1 (XBP-1s), which has been previously associated with enhanced longevity (Taylor and Dillin, 2013). Knockdown of ufd-1 resulted in the reduced survival of XBP-1s-overexpressing animals on P. aeruginosa, despite a concurrent decrease in bacterial colonization of the gut (Figure 2—figure supplement 1A-C). This indicated that the XBP-1 pathway was not required for the reduced P. aeruginosa colonization of ufd-1 knockdown animals.” 

      Lastly, while the pathogen burden is reduced in ufd1/npl4 loss and pumping rates are marginally affected, have you checked defecation rates? Could they be increased?

      We thank the reviewer for this valuable suggestion. We measured defecation rates following ufd-1 and npl-4 knockdown and, unexpectedly, found that inhibition of ufd-1/npl-4 leads to a reduction in defecation frequency. These findings clearly indicate that altered defecation cannot explain the observed decrease in gut colonization. The revised manuscript reads (page 5, lines 138-148): “The clearance of intestinal contents through the defecation motor program (DMP) is known to influence gut colonization by P. aeruginosa in C. elegans (Das et al., 2023). It is therefore conceivable that knockdown of the UFD-1-NPL-4 complex might increase defecation frequency, thereby promoting the physical expulsion of bacteria and resulting in reduced gut colonization. To test this possibility, we measured DMP rates in animals subjected to ufd-1 and npl-4 RNAi. Contrary to this hypothesis, both ufd-1 and npl-4 knockdown animals exhibited a significant reduction in defecation frequency compared to control RNAi-treated animals (Figure 1—figure supplement 2C). This reduction in DMP rate persisted even after 12 hours of exposure to P. aeruginosa (Figure 1—figure supplement 2D). Thus, the change in the DMP rate in ufd-1 and npl-4 knockdown animals is unlikely to be the reason for the reduced gut colonization by P. aeruginosa.”

      In summary, we would like to thank the reviewers again for providing constructive and thoughtful feedback. We believe we have fully addressed all the concerns of the reviewers by carrying out several new experiments and modifying the text. The manuscript has undergone substantial revision and has thereby improved significantly. We do hope that the evidence in support of the conclusions is found to be complete in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a compelling study identifying RBMX2 as a novel host factor upregulated during Mycobacterium bovis infection.

      The study demonstrates that RBMX2 plays a role in:

      (1) Facilitating M. bovis adhesion, invasion, and survival in epithelial cells.

      (2) Disrupting tight junctions and promoting EMT.

      (3) Contributing to inflammatory responses and possibly predisposing infected tissue to lung cancer development.

      By using a combination of CRISPR-Cas9 library screening, multi-omics, coculture models, and bioinformatics, the authors establish a detailed mechanistic link between M. bovis infection and cancer-related EMT through the p65/MMP-9 signaling axis. Identification of RBMX2 as a bridge between TB infection and EMT is novel.

      Strengths:

      This topic and data are both novel and significant, expanding the understanding of transcriptomic diversity beyond RBM2 in M. bovis responsive functions.

      Weaknesses:

      (1) The abstract and introduction sometimes suggest RBMX2 has protective anti-TB functions, yet results show it facilitates pathogen adhesion and survival. The authors need to rephrase claims to avoid contradiction.

      We sincerely appreciate the reviewer's valuable feedback regarding the need to clarify RBMX2's role throughout the manuscript. We have carefully revised the text to ensure consistent messaging about RBMX2's function in promoting M. bovis infection. Below we detail the specific modifications made:

      (1) Introduction Revisions:

      Changed "The objective of this study was to elucidate the correlation between host genes and the susceptibility of M.bovis infection" to "The objective of this study was to identify host factors that promote susceptibility to M.bovis infection"

      Revised "RBMX2 polyclonal and monoclonal cell lines exhibited favorable phenotypes" to "RBMX2 knockout cell lines showed reduced bacterial survival"

      Replaced "The immune regulatory mechanism of RBMX2" with "The role of RBMX2 in facilitating M.bovis immune evasion"

      (2) Results Revisions:

      Modified "RBMX2 fails to affect cell morphology and the ability to proliferate and promotes M.bovis infection" to "RBMX2 does not alter cell viability but significantly enhances M.bovis infection"

      Strengthened conclusion in Figure 4: "RBMX2 actively disrupts tight junctions to facilitate bacterial invasion"

      (3) Discussion Revisions:

      Revised screening description: "We screened host factors affecting M.bovis susceptibility and identified RBMX2 as a key promoter of infection"

      Strengthened concluding statement: "In summary, RBMX2 drives TB pathogenesis by compromising epithelial barriers and inducing EMT"

      These targeted revisions ensure that all sections consistently present RBMX2 as promoting infection, that the language aligns with our experimental findings, and that potential protective interpretations have been eliminated. We believe these modifications have successfully addressed the reviewer's concern while maintaining the manuscript's original structure and scientific content. We appreciate the opportunity to improve our manuscript and thank the reviewer for this constructive suggestion.

      (2) While p65/MMP-9 is convincingly implicated, the role of MAPK/p38 and JNK is less clearly resolved.

      We sincerely appreciate the reviewer's insightful comment regarding the roles of MAPK/p38 and JNK in our study. Our experimental data clearly demonstrated that RBMX2 knockout significantly reduced phosphorylation levels of p65, p38, and JNK (Fig. 5A), indicating potential involvement of all three pathways in RBMX2-mediated regulation.

      Through systematic functional validation, we obtained several important findings:

      In pathway inhibition experiments, p65 activation (PMA treatment) showed the most dramatic effects on both tight junction disruption (ZO-1, OCLN reduction) and EMT marker regulation (E-cadherin downregulation, N-cadherin upregulation); p38 activation (ML141 treatment) exhibited moderate effects on these processes; JNK activation (Anisomycin treatment) displayed minimal impact.

      Most conclusively, siRNA-mediated silencing of p65 alone was sufficient to:

      Restore epithelial barrier function

      Reverse EMT marker expression

      Reduce bacterial adhesion and invasion

      These results establish a clear hierarchy in pathway importance: p65 serves as the primary mediator of RBMX2's effects, while p38 plays a secondary role and JNK appears non-essential under our experimental conditions. We have now clarified this relationship in the revised Discussion section to strengthen this conclusion.

      This refined understanding of pathway hierarchy provides important mechanistic insights while maintaining consistency with all our experimental data. We thank the reviewer for this valuable suggestion that helped improve our manuscript.

      (3) Metabolomics results are interesting but not integrated deeply into the main EMT narrative.

      Thank you for this constructive suggestion. In this article, we detected the metabolome of RBMX2 knockout and wild-type cells after Mycobacterium bovis infection, which mainly served as supporting evidence for our EMT model. However, we did not conduct an in-depth discussion of these findings. We have now added a detailed discussion of this section to further support our EMT model.

      ADD: Meanwhile, metabolic pathways enriched after RBMX2 deletion, such as nucleotide metabolism, nucleotide sugar synthesis, and pentose interconversion, primarily support cell proliferation and migration during EMT by providing energy precursors, regulating glycosylation modifications, and maintaining redox balance; cofactor synthesis and amino sugar metabolism participate in EMT regulation through influencing metabolic remodeling and extracellular matrix interactions; chemokine and cGMP-PKG signaling pathways may further mediate inflammatory responses and cytoskeletal rearrangements, collectively promoting the EMT process.

      (4) A key finding and starting point of this study is the upregulation of RBMX2 upon M. bovis infection. However, the authors have only assessed RBMX2 expression at the mRNA level following infection with M. bovis and BCG. To strengthen this conclusion, it is essential to validate RBMX2 expression at the protein level through techniques such as Western blotting or immunofluorescence. This would significantly enhance the credibility and impact of the study's foundational observation.

      Thank you for your comment. We have supplemented the experiments in this part and found that Mycobacterium bovis infection can significantly enhance the expression level of RBMX2 protein.

      (5) The manuscript would benefit from a more in-depth discussion of the relationship between tuberculosis (TB) and lung cancer. While the study provides experimental evidence suggesting a link via EMT induction, integrating current literature on the epidemiological and mechanistic connections between chronic TB infection and lung tumorigenesis would provide important context and reinforce the translational relevance of the findings.

      We sincerely appreciate the valuable comments from the reviewer. We fully agree with your suggestion to further explore the relationship between tuberculosis (TB) and lung cancer. In the revised manuscript, we will add a new paragraph in the Discussion section to systematically integrate the current literature on the epidemiological and mechanistic links between chronic tuberculosis infection and lung cancer development, including the potential bridging roles of chronic inflammation, tissue damage repair, immune microenvironment remodeling, and the epithelial-mesenchymal transition (EMT) pathway. This addition will help more comprehensively interpret the clinical implications of the observed EMT activation in the context of our study, thereby enhancing the biological plausibility and clinical translational value of our findings.

      ADD: There is growing epidemiological evidence suggesting that chronic TB infection represents a potential risk factor for the development of lung cancer. Studies have shown that individuals with a history of TB exhibit a significantly increased risk of lung cancer, particularly in areas of the lung with pre-existing fibrotic scars, indicating that chronic inflammation, tissue repair, and immune microenvironment remodeling may collectively contribute to malignant transformation 74. Moreover, EMT not only endows epithelial cells with mesenchymal features that enhance migratory and invasive capacity but is also associated with the acquisition of cancer stem cell-like properties and therapeutic resistance 75. Therefore, EMT may serve as a crucial molecular link connecting chronic TB infection with the malignant transformation of lung epithelial cells, warranting further investigation in the intersection of infection and tumorigenesis.

      Reviewer #2 (Public review):

      Summary:

      I am not familiar with cancer biology, so my review mainly focuses on the infection part of the manuscript. Wang et al. identified an RNA-binding protein, RBMX2, that links Mycobacterium bovis infection to the epithelial-mesenchymal transition and lung cancer progression. Upon mycobacterial infection, the expression of RBMX2 was moderately increased in multiple bovine and human cell lines, as well as bovine lung and liver tissues. Using global approaches, including RNA-seq and proteomics, the authors identified differential gene expression caused by the RBMX2 knockout during M. bovis infection. Knockout of RBMX2 led to significant upregulation of tight-junction-related genes such as CLDN-5, OCLN, and ZO-1, whereas M. bovis infection affects the integrity of epithelial cell tight junctions and inflammatory responses. This study establishes that RBMX2 is an important host factor that modulates the infection process of M. bovis.

      Strengths:

      (1) This study tested multiple types of bovine and human cells, including macrophages, epithelial cells, and clinical tissues at multiple timepoints, and firmly confirmed the induced expression of RBMX2 upon M. bovis infection.

      (2) The authors have generated the monoclonal RBMX2 knockout cell lines and comprehensively characterized the RBMX2-dependent gene expression changes using a combination of global omics approaches. The study has validated the impact of RBMX2 knockout on the tight-junction pathway and on the M. bovis infection, establishing RBMX2 as a crucial host factor.

      Weaknesses:

      (1) The RBMX2 was only moderately induced (less than 2-fold) upon M. bovis infection, arguing its contribution may be small. Its value as a therapeutic target is not justified. How RBMX2 was activated by M. bovis infection was unclear.

      Thank you for your valuable and constructive comments. In this study, we primarily utilized the CRISPR whole-genome screening approach to identify key factors involved in bovine tuberculosis infection. Through four rounds of screening using a whole-genome knockout cell line of bovine lung epithelial cells infected with Mycobacterium bovis, we identified RBMX2 as a critical factor.

      Although the transcriptional level change of RBMX2 was less than two-fold, following the suggestion of Reviewer 1, we examined its expression at the protein level, where the change was more pronounced, and we have added these results to the manuscript.

      Regarding the mechanism by which RBMX2 is activated upon M. bovis infection, we previously screened for interacting proteins using a Mycobacterium tuberculosis secreted and membrane protein library, but unfortunately, we did not identify any direct interacting proteins from M. tuberculosis (https://doi.org/10.1093/nar/gkx1173).

      (2) Although multiple time points have been included in the study, most analyses lack temporal resolution. It is difficult to appreciate the impact/consequence of M. bovis infection on the analyzed pathways and processes.

      We appreciate the valuable comments from the reviewers. Although our study included multiple time points post-infection, in our experimental design we focused on different biological processes and phenotypes at distinct time points:

      During the early phase (e.g., 2 hours post-infection), we focused on barrier phenotypes; during the intermediate phase (e.g., 24 hours post-infection), we concentrated more on pathway activation and EMT phenotypes;

      and during the later phase (e.g., 48–72 hours post-infection), we focused more on cell death phenotypes, which were validated in another Frontiers in Immunology article (https://doi.org/10.3389/fimmu.2024.1431207).

      We also examined the impact of varying infection durations on RBMX2 knockout EBL cell lines via GO analysis. At 0 hpi, genes were primarily related to the pathways of cell junctions, extracellular regions, and cell junction organization. At 24 hpi, genes were mainly associated with pathways of the basement membrane, cell adhesion, integrin binding, and cell migration. By 48 hpi, genes were annotated into epithelial cell differentiation and were negatively regulated during epithelial cell proliferation. This indicated that RBMX2 can regulate cellular connectivity throughout the stages of M. bovis infection.

      For KEGG analysis, genes linked to the MAPK signaling pathway, chemical carcinogen-DNA adducts, and chemical carcinogen-receptor activation were observed at 0 hpi. At 24 hpi, significant enrichment was found in the ECM-receptor interaction, PI3K-Akt signaling pathway, and focal adhesion. Upon enrichment analysis at 48 hpi, significant enrichment was noted in the TGF-beta signaling pathway, transcriptional misregulation in cancer, microRNAs in cancer, small cell lung cancer, and p53 signaling pathway.

      Reviewer #3 (Public review):

      Summary:

      This study investigates the role of the host protein RBMX2 in regulating the response to Mycobacterium bovis infection and its connection to epithelial-mesenchymal transition (EMT), a key pathway in cancer progression. Using bovine and human cell models, the authors have wisely shown that RBMX2 expression is upregulated following M. bovis infection and promotes bacterial adhesion, invasion, and survival by disrupting epithelial tight junctions via the p65/MMP-9 signaling pathway. They also demonstrate that RBMX2 facilitates EMT and is overexpressed in human lung cancers, suggesting a potential link between chronic infection and tumor progression. The study highlights RBMX2 as a novel host factor that could serve as a therapeutic target for both TB pathogenesis and infection-related cancer risk.

      Strengths:

      The major strengths lie in its multi-omics integration (transcriptomics, proteomics, metabolomics) to map RBMX2's impact on host pathways, combined with rigorous functional assays (knockout/knockdown, adhesion/invasion, barrier tests) that establish causality through the p65/MMP-9 axis. Validation across bovine and human cell models and in clinical tissue samples enhances translational relevance. Finally, identifying RBMX2 as a novel regulator linking mycobacterial infection to EMT and cancer progression opens exciting therapeutic avenues.

      Weaknesses:

      Although it's a solid study, there are a few weaknesses noted below.

      (1) In the transcriptomics analysis, the authors performed (GO/KEGG) to explore biological functions. Did they perform the search locally or globally? If the search was performed with a global reference, then I would recommend doing a local search. That would give more relevant results. What is the logic behind highlighting some of the enriched pathways (in red), and how are they relevant to the current study?

      We appreciate the reviewer's thoughtful questions regarding our transcriptomic analysis. In this study, we employed a localized enrichment approach focusing specifically on gene expression profiles from our bovine lung epithelial cell system. This cell-type-specific analysis provides more biologically relevant results than global database searches alone.

      Regarding the highlighted pathways, these represent:

      Temporally significant pathways showing strongest enrichment at each stage:

      (1) 0h: Cell junction organization (immediate barrier response)

      (2) 24h: ECM-receptor interaction (early EMT initiation)

      (3) 48h: TGF-β signaling (chronic remodeling)

      Mechanistically linked to our core findings about RBMX2's role in:

      (1) Epithelial barrier disruption

      (2) Mesenchymal transition

      (3) Chronic infection outcomes

      We selected these particular pathways because they:

      (1) Showed the most statistically significant changes (FDR <0.001)

      (2) Formed a coherent biological narrative across infection stages

      (3) Were independently validated in our functional assays

      This targeted approach allows us to focus on the most infection-relevant pathways while maintaining statistical rigor.
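
      The difference between a "local" and a "global" reference set raised above can be illustrated with a one-sided hypergeometric test, the calculation that commonly underlies GO/KEGG over-representation analysis. The sketch below is only an illustration under stated assumptions: the function name and all gene counts are hypothetical, and it is not the authors' actual pipeline or data.

```python
from math import comb


def hypergeom_enrichment_p(background_size, pathway_in_background,
                           deg_count, overlap):
    """Return P(X >= overlap) for a one-sided hypergeometric test.

    background_size       -- genes in the reference (background) set
    pathway_in_background -- background genes annotated to the pathway
    deg_count             -- differentially expressed genes tested
    overlap               -- DEGs that are annotated to the pathway
    """
    max_overlap = min(pathway_in_background, deg_count)
    total = comb(background_size, deg_count)
    tail = sum(
        comb(pathway_in_background, k)
        * comb(background_size - pathway_in_background, deg_count - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: the same 300 DEGs with 12 pathway hits, scored
# against a genome-wide background and against a smaller background of
# genes detectably expressed in the cell line of interest.
p_global = hypergeom_enrichment_p(20000, 200, 300, 12)
p_local = hypergeom_enrichment_p(12000, 180, 300, 12)
print(f"global background: p = {p_global:.2e}")
print(f"local background:  p = {p_local:.2e}")
```

      With these made-up counts, the expressed-gene background gives a more conservative p-value, because pathway genes make up a larger fraction of the reference set.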

      (2) While the authors show that RBMX2 expression correlates with EMT-related gene expression and barrier dysfunction, the evidence for direct association remains limited in this study. How does RBMX2 activate p65? Does it bind directly to p65 or modulate any upstream kinases? Could ChIP-seq or CLIP-seq provide further evidence for direct RNA or DNA targets of RBMX2 that drive EMT or NF-κB signaling?

      We sincerely appreciate the reviewer's in-depth questions regarding the mechanisms by which RBMX2 activates p65 and its association with EMT. Although the molecular mechanism remains to be fully elucidated, our study has provided experimental evidence supporting a direct regulatory relationship between RBMX2 and the p65 subunit of the NF-κB pathway. Specifically, we investigated whether the transcription factor p65 could directly bind to the promoter region of RBMX2 using ChIP experiments. The results demonstrated that p65 can physically bind to the RBMX2 promoter region.

      Furthermore, dual-luciferase reporter assays were conducted, showing that p65 significantly enhances the transcriptional activity of the RBMX2 promoter, indicating a direct regulatory effect of p65 on RBMX2 expression.

      These findings support our hypothesis that RBMX2 activates the NF-κB signaling pathway through direct interaction with the p65 protein, thereby participating in the regulation of EMT progression and barrier function.

      In our subsequent work, we will also employ experiments such as CLIP to further investigate the specific mechanisms through which RBMX2 exerts its regulatory functions.

      ADD and Revise in Results:

      To thoroughly verify the regulatory mechanism between RBMX2 and p65, we initiated our investigation by conducting an in-depth analysis of the RBMX2 promoter region to identify potential interactions with the transcription factor p65. Initially, we performed molecular docking simulations to predict the binding affinity and interaction patterns between RBMX2 and p65 proteins. These simulations revealed multiple amino acid residues within the RBMX2 protein that formed strong, stable interactions with p65. The docking analysis yielded a high docking score of 1978.643 (Fig. 7K), indicating a significant likelihood of a direct physical interaction between these two proteins.

      To complement the protein-protein interaction analysis, we next investigated whether p65 could directly bind to the promoter region of the RBMX2 gene at the transcriptional level. Using the JASPAR database, a comprehensive resource for transcription factor binding profiles, we queried the RBMX2 promoter sequence for potential p65 binding sites. This analysis identified several putative binding motifs, suggesting that p65 may act as a transcriptional regulator of RBMX2 expression.

      To experimentally validate this transcriptional regulatory relationship, we employed a dual-luciferase reporter assay. We cloned the RBMX2 promoter region containing the predicted p65 binding sites into a luciferase reporter plasmid. This construct was then co-transfected into cultured cells along with a plasmid expressing p65. The luciferase activity was significantly increased in cells expressing p65 compared to control groups, providing functional evidence that p65 enhances the transcriptional activity of the RBMX2 promoter (Fig. 7I).

      Furthermore, to confirm the direct binding of p65 to the RBMX2 promoter in a chromatin context, we performed chromatin immunoprecipitation followed by quantitative PCR (ChIP-qPCR). In this assay, we used specific antibodies against p65 to immunoprecipitate chromatin fragments containing p65-bound DNA. The enriched DNA fragments were then analyzed using primers targeting the RBMX2 promoter region. Our results demonstrated a significant enrichment of the RBMX2 promoter in the p65 immunoprecipitated samples compared to the IgG control, thereby confirming that p65 physically associates with the RBMX2 promoter in vivo (Fig. 7J). Collectively, these findings, ranging from computational docking predictions to transcriptional reporter assays and ChIP validation, provide strong evidence supporting a direct regulatory interaction between p65 and RBMX2. This regulatory mechanism may play a critical role in the biological pathways involving these two molecules, particularly in contexts such as inflammation, immune response, or cellular stress, where p65 (a subunit of NF-κB) is known to be prominently involved.
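
      As a point of reference for the ChIP-qPCR comparison against the IgG control, enrichment is commonly expressed with the percent-input method. The sketch below assumes that method, a 1% input sample, and ideal primer efficiency; the function names and Ct values are hypothetical and are not taken from the manuscript.

```python
from math import log2


def percent_input(ct_ip, ct_input, input_fraction=0.01):
    """Percent-input for one ChIP sample, assuming 100% primer efficiency.

    ct_input is first adjusted to what it would be for 100% of the
    chromatin (a 1% input is 100-fold diluted, i.e. log2(100) cycles).
    """
    adjusted_input_ct = ct_input - log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


def fold_over_igg(ct_ip, ct_igg, ct_input, input_fraction=0.01):
    """Ratio of percent-input for the specific antibody over IgG."""
    specific = percent_input(ct_ip, ct_input, input_fraction)
    background = percent_input(ct_igg, ct_input, input_fraction)
    return specific / background


# Hypothetical Ct values for a promoter amplicon.
ct_input, ct_p65, ct_igg = 25.0, 24.0, 28.0
print(f"p65 IP: {percent_input(ct_p65, ct_input):.3f}% of input")
print(f"IgG:    {percent_input(ct_igg, ct_input):.3f}% of input")
print(f"fold enrichment over IgG: {fold_over_igg(ct_p65, ct_igg, ct_input):.1f}")
```

      Under these assumptions the fold enrichment reduces to 2 raised to the Ct difference between the IgG and specific pull-downs, so the input sample affects the percent-input values but not the ratio.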

      (3) The manuscript suggests that RBMX2 enhances adhesion/invasion of several bacterial species (e.g., E. coli, Salmonella), not just M. bovis. This raises questions about the specificity of RBMX2's role in Mycobacterium-specific pathogenesis. Is RBMX2 a general epithelial barrier regulator or does it exhibit preferential effects in mycobacterial infection contexts? How does this generality affect its potential as a TB-specific therapeutic target?

      Thank you for your valuable comments. When we initially designed this experiment, we were interested in whether the RBMX2 knockout cell line could confer effective resistance not only against Mycobacterium bovis but also against Gram-negative and Gram-positive bacteria. Surprisingly, we indeed observed resistance to the invasion of these pathogens, albeit weaker compared to that against Mycobacterium bovis.

      Nevertheless, we believe these findings merit publication in eLife. Moreover, RBMX2 knockout does not affect the phenotype of epithelial barrier disruption under normal conditions; its significant regulatory effect on barrier function is only evident upon infection with Mycobacterium bovis.

      Importantly, during our genome-wide knockout library screening, RBMX2 was not identified in the screening models for Salmonella or Escherichia coli, but was consistently detected across multiple rounds of screening in the Mycobacterium bovis model.

      (4) The quality of the figures is very poor. High-resolution images should be provided.

      Thank you for your feedback; we have provided higher-resolution images.

      (5) The methods are not very descriptive, particularly the omics section.

      Thank you for your comments; we have revised the description of the sequencing section.

      (6) The manuscript is too dense, with extensive multi-omics data (transcriptomics, proteomics, metabolomics) but relatively little mechanistic integration. The authors should have focused on the key mechanistic pathways in the figures. Improving the narratives in the Results and Discussion section could help readers follow the logic of the experimental design and conclusions.

      Thank you for your valuable comments. We have streamlined the figures and revised the description of the results section accordingly.

      Reviewer #2 (Recommendations for the authors):

      (1) The first part of the results and the major conclusions largely overlap with the previous paper by the same authors (Frontiers in Immunology, https://doi.org/10.3389/fimmu.2024.1431207). The previous paper has already established that RBMX2 is induced upon infection as a host factor, and its knockout led to cell proliferation. Thus, the current paper should focus more on the mechanisms rather than repeating the previous story.

      We appreciate the reviewer's careful reading and constructive feedback. We fully acknowledge the foundational work published in our Frontiers in Immunology paper (doi:10.3389/fimmu.2024.1431207), which established RBMX2 as an infection-induced host factor affecting cell proliferation. The current study represents a significant mechanistic extension of these initial findings, with the following key advances:

      (1) Novel Mechanistic Insights (Current Study Focus):

      Discovery of the p65/MMP-9 pathway as the central mechanism mediating RBMX2's effects on EMT (Figs. 4-6)

      First demonstration of RBMX2's role in epithelial barrier disruption (Figs. 2-3)

      Identification of temporal regulation patterns during infection progression (Fig. 7)

      (2) Expanded Biological Scope:

      Demonstration of RBMX2's function in both bovine and human cell systems (vs. previous bovine-only data)

      Clinical correlation with TB lesions

      Therapeutic potential assessment through pathway inhibition

      (3) Technical Advancements:

      CRISPR-based mechanistic validation (vs. previous siRNA approach)

      Multi-omics integration (transcriptomics + metabolomics)

      Advanced live-cell imaging

      We have now:

      Removed redundant proliferation data from Results

      Sharpened the Introduction to highlight mechanistic questions

      Added explicit discussion comparing both studies

      The current work provides the first comprehensive mechanistic framework for RBMX2's role in TB pathogenesis, moving substantially beyond the initial observational findings. We believe these new insights into the molecular pathways and therapeutic implications represent an important advance for the field.

      (2) Line 107-110: The CRISPR screening results are not provided. Has it been published, or is it an unpublished dataset? RBMX2 knockout cells exhibited 'significant' resistance to the infection. How significant? Data?

      Thank you for your valuable comments. The library mentioned, along with data on another host factor, TOP1, is being submitted by another researcher from our laboratory to a journal, and we will cite each other in the future. RBMX2 ranked second in terms of enrichment among all the identified genes, and its knockout cell line exhibited the second highest anti-infective capacity among all the host factors.

      (3) Line 152: The RNA-seq analysis has already been performed/reported in the previous Frontiers paper. Therein, 173 genes were found to be differentially expressed. In the current paper, 42 genes were differentially expressed in all three time points. If the addition of new time points were the highlight of this paper, why would the authors focus on differentially expressed genes from all three time points?

      Thank you for your valuable comments.

      In the newly added data, we aimed to investigate the temporal changes during Mycobacterium bovis infection of host cells.

      Previous study (Frontiers): Single 24h timepoint → 173 DEGs

      Current study: Three timepoints (0h, 24h, 48h) with 42 consistently regulated genes → Reveals temporally stable core regulators of infection response

      On one hand, we briefly described in the manuscript those important genes that exhibited changes across all time points.

      On the other hand, in the supplementary materials, we also focused on the enriched genes at each individual time point, to better understand the temporal dynamics regulated by RBMX2.

      (4) Line 153: The '0 h' time point is in fact 2 h post-infection. Why did the authors skip the real 0h time point? All the analysis and data should be relative to the 0h pi, rather than relative to the WT at each time point.

      We appreciate the reviewer's important question regarding our timepoint nomenclature. The experimental timeline was designed as follows:

      (1) Infection Protocol:

      -2h to 0h: Bacterial co-culture (MOI 20:1)

      0h: Gentamicin (100 μg/ml) added to kill extracellular bacteria

      0h+: Monitored intracellular survival

      (2) Rationale for "0h" Designation:

      This marks the onset of the intracellular infection phase, when:

      Extracellular bacteria are eliminated (validated by plating)

      Host cell responses to intracellular pathogens begin

      All subsequent measurements reflect genuine infection (not attachment)

      (3) Technical Validation:

      Confirmed complete extracellular killing by:

      Culture supernatant plating (0 CFU after gentamicin)

      Microscopy (no surface-associated bacteria)

      (4) Comparative Analysis:

      All data are presented as:

      Fold-change relative to uninfected controls at each timepoint

      We have now:

      Clarified the timeline in Methods

      Specified "0h = post-gentamicin" in all figure legends

      This standardized approach aligns with established intracellular pathogen studies (e.g., Cell Microbiol. 2018;20:e12840). We're happy to adjust terminology if "0hpi (post-invasion)" would be clearer.

      (5) Figure 2F: The data should be compared to the 0h pi, and show the temporal changes of gene expression.

      Thank you for your suggestion. We have added additional information to this section. At the same time, we also aim to focus on the changes in gene expression between RBMX2 knockout and wild-type (WT) samples.

      We have now:

      Added temporal expression profiles relative to 0hpi baseline (SFig.4C).

      Clarified the dual normalization approach in Methods

      Maintained original between-group comparisons for phenotypic correlation
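
      The two normalizations described above (each genotype relative to its own 0 hpi baseline, and knockout relative to wild type at each time point) can be sketched as follows. The function names and expression values are hypothetical placeholders chosen for illustration, not the authors' data or analysis code.

```python
def relative_to_baseline(series, baseline="0hpi"):
    """Express every time point as a fold change over the baseline."""
    base = series[baseline]
    return {timepoint: value / base for timepoint, value in series.items()}


def knockout_vs_wildtype(knockout, wildtype):
    """Between-group fold change (KO / WT) at each shared time point."""
    return {timepoint: knockout[timepoint] / wildtype[timepoint]
            for timepoint in wildtype}


# Hypothetical normalized expression of a tight-junction gene.
wt = {"0hpi": 2.0, "24hpi": 1.0, "48hpi": 0.5}
ko = {"0hpi": 2.0, "24hpi": 2.0, "48hpi": 2.0}

wt_temporal = relative_to_baseline(wt)    # change over time within WT
ko_temporal = relative_to_baseline(ko)    # change over time within KO
ko_over_wt = knockout_vs_wildtype(ko, wt)  # KO vs WT at each time point

for timepoint in wt:
    print(timepoint, wt_temporal[timepoint], ko_temporal[timepoint],
          ko_over_wt[timepoint])
```

      In this toy example the temporal view shows expression falling in wild-type cells but staying flat in knockout cells, while the between-group view shows the knockout-to-wild-type ratio rising over time; the two views answer different questions and are reported side by side.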

      (6) Line 207. Not all the proteins were down-regulated post-infection.

      Thank you for your comment. The overall level of tight junction-related proteins is downregulated, although it may not show a significant change at a specific time point.

      We have revised our description, changing the keyword from "All" to "Most."

      (7) Line 278, the introduction of the H1299 cell line should appear earlier when it was mentioned for the first time in the manuscript.

      Thank you for your comment. We have provided a description in the Abstract and Result 1.

      ADD:

      Abstract: Meanwhile, we also validated the EMT process in human lung epithelial cancer cells H1299.

      Result 1: Furthermore, RBMX2-silenced H1299 cells exhibited a higher survival rate compared to H1299 ShNc cells after M. bovis infection (Fig. 1H).

      (8) Figure 4 is huge and almost illegible, which may be divided into two figures.

      Thank you for your valuable comments. We have streamlined the figures and revised the description of the results section accordingly.

      Reviewer #3 (Recommendations for the authors):

      I encountered frequent grammatical and syntactic issues. Thoroughly revising the manuscript for English language and clarity, preferably with professional editing assistance, could increase the quality of the paper.

      Thank you for your valuable comments; we will invite a professional editor to polish the language.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      In the manuscript, Aldridge and colleagues investigate the role of IL-27 in regulating hematopoiesis during T. gondii infection. Using loss-of-function approaches, reporter mice, and the generation of serial chimeric mice, they elegantly demonstrate that IL-27 induction plays a critical role in modulating bone marrow myelopoiesis and monocyte generation to the infection site. The study is well-designed, with clear experimental approaches that effectively address the mechanisms by which IL-27 regulates bone marrow myelopoiesis and prevents HSC exhaustion.

      Reviewer #2 (Public review):

      Summary:

      Aldridge et al. aim to demonstrate the role of IL27 in limiting emergency myelopoiesis in response to Toxoplasma gondii infection by acting directly at the level of early haematopoietic progenitors.

      They used different mouse genetic models, such as HSC lineage tracing, IL27 and IL27R-deficient mice, to show that:

      (1) HSCs actively participate in emergency myelopoiesis during Toxoplasma gondii infection.

      (2) The absence of IL27 and IL27R increases monocyte progenitors and monocytes, mainly inflammatory monocytes CCR2hi.

      (3) At steady state, loss of IL27 impairs HSC fitness as competitive transplantation shows long-term engraftment deficiency of IL27 BM cells. This impairment is exacerbated after infection.

      (4) IL27 is produced by various BM and other tissue cells at steady state, and its expression increases with infection, mainly by increasing the number of monocytes producing it.

      Although it is indisputable that IL27 has a role in emergency myelopoiesis by limiting the number of proinflammatory monocytes in response to infection, the authors' claim that it acts only on HSCs and not on more committed progenitors (CMP, GMP, MP) is not supported by the quality of the data presented here, as described below in the weakness section. In addition, this study highlights a role for IL27 during infection, but does not focus on trained immunity, which is the focus of the targeted eLife issue.

      We thank the reviewer for these comments. We did try (and perhaps failed) to highlight that all cells within the HSPC category, which includes HSCs and MPPs, have the potential to contribute. The lack of IRGM1-RFP reporter expression in CMPs (Supp Fig5C) suggests that only HSCs and MPPs are progenitors that respond to IL-27 within the bone marrow, and thus that IL-27 signaling on these contributes to the effects observed on monopoiesis and peripheral monocyte populations. We have emphasized this in the revised manuscript, particularly in the introduction (line 82) and discussion (lines 469-472). While this manuscript does not focus solely on trained immunity, the impacts of infection regulating HSC differentiation and having a long-term impact on this compartment are a central theme of trained immunity. For example, Figure 6 and the supporting supplemental figures almost exclusively focus on the differentiation potential that is programed into LTHSCs by infection and the role of IL-27 in regulating this programing. Additionally, Figure 7 shows the long-term consequences of such training. The introduction and discussion have been modified to emphasize these connections to trained immunity.

      Weakness

      (1) In Figure 4, MFI quantification is required. This figure also shows the expression level (FACS and RNA) in progenitors (GMP and CMP, GP, MP), which is quite similar to that of HSC at this level, so it is really surprising that CMP does not respond at all to IL27 (S5C).

      As requested, we have included the MFIs, calculated as a fold change over control FMOs, in the revised manuscript. While HSPCs and CMPs show relatively similar RNA expression of Il27ra (Supp. Fig. 5 A), the level of surface IL-27R expression by CMPs is lower than that of HSPCs (Fig. 4C, revised). Additional downstream progenitors (including GMPs) show highly reduced RNA expression and a correspondingly low expression of the receptor protein. This is now more apparent with the quantified MFIs (Fig 4-5).
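
      The "fold change over control FMOs" calculation mentioned above can be written out in a few lines. This is a minimal sketch with hypothetical MFI values and a hypothetical function name; it does not reproduce the authors' measurements or gating.

```python
def mfi_fold_over_fmo(stained_mfi, fmo_mfi):
    """Return the stained-sample MFI as a fold change over its FMO control."""
    if fmo_mfi <= 0:
        raise ValueError("FMO MFI must be positive to compute a fold change")
    return stained_mfi / fmo_mfi


# Hypothetical (stained MFI, FMO MFI) pairs per progenitor population.
measurements = {
    "HSPC": (1200.0, 300.0),
    "CMP": (600.0, 300.0),
    "GMP": (330.0, 300.0),
}

fold_changes = {
    population: mfi_fold_over_fmo(stained, fmo)
    for population, (stained, fmo) in measurements.items()
}

for population, fold in fold_changes.items():
    print(f"{population}: {fold:.2f}-fold over FMO")
```

      A value close to 1 means the signal is indistinguishable from the FMO background, which is how low receptor expression on downstream progenitors would appear in this representation.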

      (2) Total BM was used to test the direct effect of IL27 on HSC. There could be an indirect effect from other more mature BM cells, even if they show lower receptor expression than HSC. This should be done on a different sorted population to prove the direct effect of IL27 on HSC. The authors need to look more closely at some stat-dependent genes or stat itself in different sorted cell populations, not just irgm1. It is also known that Stat is associated with increased HSC proliferation in response to IFN, which is the opposite of what is observed here.

      We thank the reviewer for this question. We have found that the methanol fixation required to detect pSTAT disrupted the ability to stain for HSPCs by flow cytometry. Thus, we used the IRGM1 reporter, which we have found to be a sensitive and high-fidelity reporter of STAT1 activity while preserving epitope markers of HSPCs.

      We agree that the use of bulk bone marrow in the in vitro stimulations could allow for the activation of non-HSPC cell types that are IL-27R+. This is now emphasized in the text. However, there are advantages to this bulk approach: it allows simultaneous analysis of all HSPC populations and downstream progenitors in the same cultures, and makes it possible to assess how the small numbers of IL-27R-expressing lymphocytes present in these cultures respond (data that are now included, Supp. Fig. 5C). These cultures also allow a direct comparison of our IL-27R expression analysis with responsiveness to IL-27. Only a selection of the populations analyzed are shown in these data; however, all populations in Figure 4A were also analyzed in Supp. Fig. 5C. These data sets directly correlate receptor expression with sensitivity to IL-27. If this effect were indirect (i.e., through the ability of IL-27 to induce IFN-γ), then we would expect more robust expression of the IRGM1 reporter across other cell populations. However, while IFN-γ stimulates broad expression of IRGM1, the effects of IL-27 are restricted to HSPCs and mature lymphocytes (Supp. Fig. 5C). In other words, the cells that express the highest levels of the IL-27R are most responsive to IL-27.

      While we do not directly measure HSPC proliferation in these cultures, we agree with the reviewer that the decreased proportion of proliferating HSPCs seen in the absence of IL-27 during infection (Fig. 7A) is a complex data set. The reviewer is also correct that interferons can promote HSC proliferation; however, they can also promote cell stress, DNA damage, and even cell death of HSCs during chronic exposure (reviewed extensively in Demerdash, Y., et al. Exp Hematol. 2021. PMID: 33571568). Thus IFNs, much like IL-27, appear to regulate HSPCs with contextual importance, inducing their proliferation but also death. The activation of STAT1 and STAT3 by IL-27 may be at the core of some of these effects observed in our data, and we point out that IL-10, another activator of STAT1+3, has been shown to limit HSC responses to inflammation (lines 58-62), but we have also presented other possibilities in the discussion.

      (3) The decrease in HSC fitness in IL27R KO at steady state could be an indirect effect of the increase in proinflammatory monocytes contributing to high levels of inflammatory cytokines in the BM and thus chronic HSC activation that is enhanced in response to infection. What is the pro-inflammatory cytokine profile of the BM of IL27- or IL27R-deficient mice and of mixed chimera mice?

      We thank the reviewer for this insightful comment. This was part of our stated rationale in generating the mixed WT:IL-27R-/- BM chimeras presented in Figure 2. In this mixed setting, there remained differences between the ability of the IL-27R sufficient and deficient stem cells to generate inflammatory macrophages. These results suggest that differences in the inflammatory environment do not account for the differences observed. This conclusion is further supported by the observation that the infection-induced levels of IFN-γ in the bone marrow are equivalent in the presence or absence of IL-27 (now included in the revised manuscript, Supp. Fig. 1F).

      (4) Furthermore, the FACS profile of KI67/BrdU of Figure 7 is doubtful, as it is shown in different literature that KSL are not predominantly quiescent as shown here, but about 50% are KI67-. This is also inconsistent with the increase of HSC observed in Figure 1. Quantification of total BrdU+ HSC and other progenitors is also important to quantify all cells that have proliferated during infection. As the repopulation of IL27-deficient BM is also lower in the absence of infection, the proliferation of HSC in IL27R KO mice in the absence of infection is also important.

      The comment indicates that the reviewer is concerned that our staining for Ki67 is on the low end of reported literature (~10-50% of LSKs, depending on age of the mice and stimulation (Thapa R, et al. Stem Cell Res Ther. 2023. PMID: 37280691; Nies KPH, et al. Cytometry A. 2018. PMID: 30176186)). Our stains were performed on cells from infected mice, which does alter the classic markers used to identify HSPCs. For this reason, we are stringent with our gating strategy and may be excluding more HSPCs than are included in other reports. We have included our FMO control in the revised manuscript to indicate our gating approach (Supp. Fig. 9A). While the population of Ki67+ HSPCs is low, these results were consistent between our experiments and provide data sets that are interpretable.

      (5) The immunofluorescence in Figure 3 shows a high level of background and it is difficult to see the GFP and tomato positive cells. In this sense, the number of HSCs quantified as Procr+ (more than 8000 on a single BM section) is inconsistent with the total number of HSCs that a BM can contain (i.e., around 6000 per BM as quantified in Figure 1).

      We agree with the reviewer and have found that there is a high level of background in these stains. We have thresholded these images, as described in our methods, to minimize this. Additionally, the increased numbers of Procr+ cells in the imaging vs our flow data is expected, and has been reported by others (Steinert, EM, et al. Cell. 2015. PMID: 25957682).

      (6) The addition of arrows to the figure will help to visualise positive cells. It is also not clear why the author normalised the GFP+ cells to the tomato+ cells in Figure 3D.

      We thank the reviewer for this comment and have added the suggested arrows. We have also included a more detailed explanation for our normalization strategy.

      (7) Furthermore, even if monocytes represent a high proportion of IL27-producing cells, they are only 50% of the cells at 5dpi, as shown in Figure 3 and S4. Without other monocyte markers, line 307 is incorrect.

      We thank the reviewer for this clarification and have adjusted the text accordingly.

      (8) How do the authors explain that in Figure 1, 5-10% of labelled precursors and monocytes can give 100% of monocytes? This would mean that only labelled HSC can differentiate into PEC monocytes.

      We thank the reviewer for their interest in this result. Monocytes and macrophages are some

      Reviewer #1 (Recommendations for the authors):

      I have two minor comments that could enhance the conceptual framework of this study:

      (1) The authors indirectly show that IL-27R expression on HSPCs is necessary for regulating HSC proliferation and preventing exhaustion. However, given that they have access to IL-27RFlox mice, they could cross these with Fgd5Cre mice to specifically delete IL-27R on long-term HSCs. This would provide direct evidence for the role of IL-27 signaling in LTHSCs during infection.

      We appreciate this comment and did attempt this experiment with several HSPC-specific Cres, including the Procr-cre (used elsewhere in the manuscript) and the MDS1-cre-ERT2 (Jackson Laboratory Strain #:032863). Unfortunately, validation revealed that deletion of the IL-27R with these HSC-specific Cre lines was inefficient, and so experiments are ongoing to enhance efficiency of the deletion and test alternative Cre lines (such as the Fgd5-cre).

      (2) Since memory T and B cells often home to the bone marrow, it would be interesting to consider the potential cross-talk between these cells, HSPCs, and IL-27 signaling during secondary T. gondii infection. A brief discussion of this possibility would strengthen the study's broader implications.

      We thank the reviewer for this opportunity. We have previously investigated the interplay between immune cells in the bone marrow (Glatman Zaretsky A, et al. Cell Rep. 2017. PMID: 28228257) and now include these possibilities in the discussion (line 465-470).

      Reviewer #2 (Recommendations for the authors):

      Minor points:

      (1) Figures 6F and 7B: should be shown as % of donor and not total number to clarify the lineage potency of LTHSC. The fact that the results of transplantation are separated into different figures makes it not easy to follow. To see if the increase in monocyte production by IL27 KO BM is specific, the percent of donor-derived cells for other populations, such as lymphoid, but also in MP, and inflammatory monocytes, is necessary to confirm Figure 2.

      Perhaps there has been a misunderstanding? In these plots, we are not analyzing mixed chimeras but single transfer chimeras into lethally irradiated hosts. Thus, the % of donor reaches ~80-90%. However, to measure the actual output of the HSPCs, the cell number was necessary to compare amongst groups. Additional description is provided in the figure legends and in the text of the manuscript (lines 391-392, 434-436, 651-653, and 680-682).

      (2) The heavy UMAP description is unnecessary.

      As requested, we have reduced this description of how the UMAPs were derived.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors were attempting to describe whether trained innate immunity would modulate antibody-dependent cellular phagocytosis (ADCP) and/or efferocytosis.

      Strengths:

      The use of primary murine macrophages, and not a cell line, is considered a strength. The trained immunity-mediated changes to phagocytosis affected both melanoma and breast cancer cells. The broad effect is consistent with trained immunity.

      Weaknesses:

      The most significant weakness, also noted by the authors in the discussion, is the lack of in vivo data. Without these data, it is not possible to put the in vitro data in context. It is unknown if the described effects on efferocytosis will be relevant to the in vivo progression of cancer.

      We thank the reviewer for these comments. To examine the role of trained immunity on the modulation of macrophage efferocytosis in vivo, we performed immunostaining analysis in sections from B16F10 tumour samples.

      Importantly, we found that macrophage efferocytosis of apoptotic tumour cells was significantly decreased in the tumour tissue that was excised from mice treated with β-glucan 7 days prior to tumour inoculation (supplementary Figure 3). These data are consistent with our findings using co-culture assays further strengthening the impact of our key findings in this report.

      Reviewer #2 (Public review):

      Summary:

      The authors follow up their preclinical work on beta-glucan-induced trained immunity in murine tumor models that they published in Cell in 2020. In particular, they focus on the role of trained immunity and efferocytosis of cancer cells.

      Strengths:

      While properly conducted, the work is underwhelming and fully depends on in vitro observations performed with co-cultures of bone marrow derived macrophages from beta-glucan-treated mice and tumor cell lines. From these in vitro studies, the authors conclude that trained immunity induction has no effect on antibody-dependent cellular phagocytosis, while it decreases efferocytosis.

      Weaknesses:

      It would be important to study these phenomena in tumor mouse models in vivo. The authors clearly have the expertise, as they have shown in previous studies. This is especially important because the in vitro observation appears to conflict with the in vivo anti-tumor effect found in mice prophylactically treated with beta-glucan. Clearly, trained immunity is associated with diverse cellular responses and mechanisms, some of which may promote tumor growth, as the current manuscript suggests, but in the absence of in vivo studies, it is merely a mechanistic exercise of which the relevance is difficult to determine.

      We thank the reviewer for raising this important comment. We have followed the reviewer’s suggestion and examined the role of trained immunity on the modulation of macrophage efferocytosis in vivo. As mentioned in our response to Reviewer 1, we demonstrate that efferocytosis of apoptotic melanoma cells in situ was attenuated in tumour samples from ‘trained’ mice as compared to those from control-treated mice.

      Efferocytosis displays a pro-tumour and immunosuppressive role; therefore, both our in vitro co-culture (Figure 1) and in vivo (supplementary Figure 3) findings are consistent with our previously published in vivo data supporting the tumour-suppressive role of prophylactic treatment with β-glucan (Kalafati, Kourtzelis et al, PMID: 33125892).

      Reviewer #3 (Public review):

      Summary:

      Chatzis et al showed that β-glucan trained macrophages have decreased phagocytic activity of apoptotic tumor cells and that is accompanied by lower levels of secreted IL-1β using a mouse model.

      Strengths:

      This finding has a potential impact on designing new cancer immunotherapeutic approaches by targeting macrophage efferocytosis.

      Weaknesses:

      Whether this finding could be applied to other scenarios is underdetermined.

      (1)  Does the decrease of efferocytosis also occur in human monocytes/macrophages after training?

      (2)  Both β-glucan and BCG are well-trained innate immunity agents, the authors showed that β-glucan decreased efferocytosis via IL-1β, so it is interesting to know whether BCG has a similar effect.

      We thank the reviewer for these comments. Our data suggest that induction of trained immunity with β-glucan contributes to decreased macrophage efferocytosis of tumour cells based on co-culture and in vivo approaches in a mouse setting.  

      We agree with the reviewer that utilisation of a human setting would be important to provide additional validation of our findings.

      Induction of trained immunity entails epigenetic and metabolic reprogramming of hematopoietic stem and progenitor cells (HSPCs). As such, the elucidation of mechanisms that modulate trained immunity in human cells would require the establishment of a macrophage differentiation model based on the use of HSPCs rather than the stimulation of monocytes or macrophages with β-glucan.

      Additionally, the investigation of the impact of BCG in trained immunity-dependent phagocytosis would require the assessment of all different types of phagocytic cargos (apoptotic melanoma and breast cancer cells, apoptotic neutrophils, microbial bioparticles) as we did in the case of the β-glucan.  The capacity of different molecules to induce trained immunity in the efferocytosis setting requires further investigation that would be beyond the scope of this study. Therefore, we plan to address these very interesting points in a future study.

      Additional text was added in the Discussion section to clarify the reviewer's points. In addition, we provide a more specific title that reflects better the specificity of our findings.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      Summary:

      This fundamental work employed multidisciplinary approaches and conducted rigorous experiments to study how a specific subset of neurons in the dorsal striatum (i.e., "patchy" striatal neurons) modulates locomotion speed depending on the valence of the naturalistic context. 

      Strengths: 

      The scientific findings are novel and original and significantly advance our understanding of how the striatal circuit regulates spontaneous movement in various contexts.

      We appreciate the reviewer’s positive evaluation.

      Weaknesses: 

      This is extensive research involving various circuit manipulation approaches. Some of these circuit manipulations are not physiological. A balanced discussion of the technical strengths and limitations of the present work would be helpful and beneficial to the field. Minor issues in data presentation were also noted. 

      We have incorporated the recommended discussion of technical limitations and addressed the physiological plausibility of our manipulations on Page 33 of the revised Discussion section. Specifically, we wrote: 

      “Judicious interpretation of the present data must consider the technical limitations of the various methods and circuit-level manipulations applied. Patchy neurons are distributed unevenly across the extensive structure of the striatum, and their targeted manipulation is constrained by viral spread in the dorsal striatum. Somatic calcium imaging using single-photon microscopy captures activity from only a subset of patchy neurons within a narrow focal plane beneath each implanted GRIN lens. Similarly, limitations in light diffusion from optical fibers may reduce the effective population of targeted fibers in both photometry and optogenetic experiments. For example, the more modest locomotor slowing observed with optogenetic activation of striatonigral fibers in the SNr compared to the stronger effects seen with Gq-DREADD activation across the dorsal striatum could reflect limited fiber optic coverage in the SNr. Alternatively, it may suggest that non-striatonigral mechanisms also contribute to generalized slowing. Our photometry data do not support a role for striatopallidal projections from patchy neurons in movement suppression. The potential contribution of intrastriatal mechanisms, discussed earlier, remains to be empirically tested. Although the behavioral assays used were naturalistic, many of the circuit-level interventions were not. Broad ablation or widespread activation of patchy neurons and their efferent projections represent non-physiological manipulations. Nonetheless, these perturbation results are interpreted alongside more naturalistic observations, such as in vivo imaging of patchy neuron somata and axon terminals, to form a coherent understanding of their functional role.”

      Reviewer #2 (Public review):

      Hawes et al. investigated the role of striatal neurons in the patch compartment of the dorsal striatum. Using Sepw1-Cre line, the authors combined a modified version of the light/dark transition box test that allows them to examine locomotor activity in different environmental valence with a variety of approaches, including cell-type-specific ablation, miniscope calcium imaging, fiber photometry, and opto-/chemogenetics. First, they found ablation of patchy striatal neurons resulted in an increase in movement vigor when mice stayed in a safe area or when they moved back from more anxiogenic to safe environments. The following miniscope imaging experiment revealed that a larger fraction of striatal patchy neurons was negatively correlated with movement speed, particularly in an anxiogenic area. Next, the authors investigated differential activity patterns of patchy neurons' axon terminals, focusing on those in GPe, GPi, and SNr, showing that the patchy axons in SNr reflect movement speed/vigor. Chemogenetic and optogenetic activation of these patchy striatal neurons suppressed the locomotor vigor, thus demonstrating their causal role in the modulation of locomotor vigor when exposed to valence differentials. Unlike the activation of striatal patches, such a suppressive effect on locomotion was absent when optogenetically activating matrix neurons by using the Calb1-Cre line, indicating distinctive roles in the control of locomotor vigor by striatal patch and matrix neurons. Together, they have concluded that nigrostriatal neurons within striatal patches negatively regulate movement vigor, dependent on behavioral contexts where motivational valence differs.

      We are grateful for the reviewer’s thorough summary of our main findings.

      In my view, this study will add to the important literature by demonstrating how patch (striosomal) neurons in the striatum control movement vigor. This study has applied multiple approaches to investigate their functionality in locomotor behavior, and the obtained data largely support their conclusions. Nevertheless, I have some suggestions for improvements in the manuscript and figures regarding their data interpretation, accuracy, and efficacy of data presentation.

      We appreciate the reviewer’s overall positive assessment and have made substantial improvements to the revised manuscript in response to reviewers’ constructive suggestions.

      (1) The authors found that the activation of the striatonigral pathway in the patch compartment suppresses locomotor speed, which contradicts the canonical roles of the direct pathway. It would be great if the authors could provide mechanistic explanations in the Discussion section. One possibility is that striatal D1R patch neurons directly inhibit dopaminergic cells that regulate movement vigor (Nadel et al., Sci. Rep., 2021; Okunomiya et al., J Neurosci., 2025). Providing plausible explanations will help readers infer possible physiological processes and give them ideas for future follow-up studies.

      We have added the recommended data interpretation and future perspectives on Page 30 of the revised Discussion section. Specifically, we wrote:

      “Potential mechanisms by which striatal patchy neurons reduce locomotion involve the suppression of dopamine availability within the striatum. Dopamine, primarily supplied by neurons in the SNc and VTA, broadly facilitates locomotion (Gerfen and Surmeier 2011, Dudman and Krakauer 2016). Recent studies have shown that direct activation of patchy neurons leads to a reduction in striatal dopamine levels, accompanied by decreased walking speed (Nadel, Pawelko et al. 2021, Dong, Wang et al. 2025, Okunomiya, Watanabe et al. 2025). Patchy neuron projections terminate in structures known as “dendron bouquets”, which enwrap SNc dendrites within the SNr and can pause tonic dopamine neuron firing (Crittenden, Tillberg et al. 2016, Evans, Twedell et al. 2020). The present work highlights a role for patchy striatonigral inputs within the SN in decelerating movement, potentially through GABAergic dendron bouquets that limit dopamine release back to the striatum (Dong, Wang et al. 2025). Additionally, intrastriatal collaterals of patch spiny projection neurons (SPNs) have been shown to suppress dopamine release and associated synaptic plasticity via dynorphin-mediated activation of kappa opioid receptors on dopamine terminals (Hawes, Salinas et al. 2017). This intrastriatal mechanism may further contribute to the reduction in striatal dopamine levels and the observed decrease in locomotor speed, representing a compelling avenue for future investigation.”

      (2) On page 14, Line 301, the authors stated that "Cre-dependent mCherry signals were colocalized with the patch marker (MOR1) in the dorsal striatum (Fig. 1B)". But I could not find any mCherry on that panel, so please modify it.

      We have included representative images of mCherry and MOR1 staining in Supplementary Fig. S1 of the revised manuscript.

      (3) From data shown in Figure 1, I've got the impression that mice ablated with striatal patch neurons were generally hyperactive, but this is probably not the case, as two separate experiments using LLbox and DDbox showed no difference in locomotor vigor between control and ablated mice. For the sake of better interpretation, it may be good to add a statement in Lines 365-366 that these experiments suggest the absence of hyperactive locomotion in general by ablating these specific neurons.

      As suggested by the reviewer, we have added the following statement on Page 17 of the revised manuscript: “These data also indicate that PA elevates valence-specific speed without inducing general hyperactivity”.

      (4) In Line 536, where Figure 5A was cited, the author mentioned that they used inhibitory DREADDs (AAV-DIO-hM4Di-mCherry), but I could not find associated data on Figure 5. Please cite Figure S3, accordingly.

      We have added the citation for what is now Fig. S4 on Page 25 of the revised manuscript.

      (5) Personally, the Figure panel labels of "Hi" and "ii" were confusing at first glance. It would be better to have alternatives.

      As suggested by the reviewer, we have now labeled each figure panel with a distinct single alphabetical letter.

      (6) There is a typo on Figure 4A: tdTomata → tdTomato

      We have made the correction on the figure.

      Reviewer #3 (Public review):

      Hawes et al. combined behavioral, optical imaging, and activity manipulation techniques to investigate the role of striatal patch SPNs in locomotion regulation. Using Sepw1-Cre transgenic mice, they found that patch SPNs encode locomotion deceleration in a light-dark box procedure through optical imaging techniques. Moreover, genetic ablation of patch SPNs increased locomotion speed, while chemogenetic activation of these neurons decreased it. The authors concluded that a subtype of patch striatonigral neurons modulates locomotion speed based on external environmental cues. Below are some major concerns:

      The study concludes that patch striatonigral neurons regulate locomotion speed. However, unless I missed something, very little evidence is presented to support the idea that it is specifically striatonigral neurons, rather than striatopallidal neurons, that mediate these effects. In fact, the optogenetic experiments shown in Fig. 6 suggest otherwise. What about the behavioral effects of optogenetic stimulation of striatonigral versus striatopallidal neuron somas in Sepw1-Cre mice?

      Our photometry data implicate striatonigral neurons in locomotor slowing, as evidenced by a negative cross-correlation with acceleration and a negative lag, indicating that their activity reliably precedes—and may therefore contribute to—deceleration. In contrast, photometry results from striatopallidal neurons showed no clear correlation with speed or acceleration.
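      The reading of a negative lag described above can be illustrated with a minimal sketch on synthetic data. This is not the authors' analysis code: the signal names, the 5-sample lead, the noise level, and the sign convention (lag k pairs acceleration at time t with neural activity at time t + k, so a negative k means the neural signal comes first) are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(neural, accel, max_lag):
    """Return {lag: r}, pairing accel[t] with neural[t + lag].

    Assumed convention: a negative lag means the neural sample is
    earlier than the acceleration sample it is compared with.
    """
    n = len(accel)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            x, y = neural[: n + lag], accel[-lag:]
        else:
            x, y = neural[lag:], accel[: n - lag]
        result[lag] = pearson(x, y)
    return result


# Synthetic example: acceleration is the negated neural signal,
# delayed by `lead` samples, plus a little noise (hypothetical values).
random.seed(0)
n, lead = 500, 5
neural = [random.gauss(0, 1) for _ in range(n)]
accel = [
    (-neural[t - lead] if t >= lead else 0.0) + 0.1 * random.gauss(0, 1)
    for t in range(n)
]

corr = lagged_correlation(neural, accel, max_lag=20)
peak_lag = min(corr, key=corr.get)  # lag with the most negative correlation
print(peak_lag, round(corr[peak_lag], 3))
```

      Under this convention, the most negative correlation falls at a lag of -lead, which is the pattern one would expect if the neural signal precedes deceleration. A lead in time is compatible with a causal contribution but does not establish one on its own.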

      Figure 6 demonstrates that optogenetic manipulation within the SNr of Sepw1-Cre<sup>+</sup> striatonigral axons recapitulated context-dependent locomotor changes seen with Gq-DREADD activation of both striatonigral and striatopallidal Sepw1-Cre<sup>+</sup> cells in the dorsal striatum but failed to produce the broader locomotor speed change observed when targeting all Sepw1-Cre<sup>+</sup> cells in the dorsal striatum using either ablation or Gq-DREADD activation. The more subtle speed-restrictive phenotype resulting from ChR activation in the SNr could, as the reviewer suggests, implicate striatopallidal neurons in broad locomotor speed regulation. However, our photometry data indicate that this scenario is unlikely, as activity of striatopallidal Sepw1-Cre<sup>+</sup> fibers is not correlated with locomotor speed. Another plausible explanation is that the optogenetic approach may have affected fewer striatonigral fibers, potentially due to the limited spatial spread of light from the optical fiber within the SNr. Broad locomotor speed change in LDbox might require the recruitment of a larger number of striatonigral fibers than we were able to manipulate with optogenetics. We have added discussion of these technical limitations to the revised manuscript. Additionally, we now discuss the possibility that intrastriatal collaterals may contribute to reduced local dopamine levels by releasing dynorphin, which acts on kappa opioid receptors located on dopamine fibers (Hawes, Salinas et al. 2017), thereby suppressing dopamine release.

      The reviewer also suggests an interesting experiment involving optogenetic stimulation of striatonigral versus striatopallidal somata in Sepw1-Cre mice. While we agree that this approach would yield valuable insights, we have thus far been unable to achieve reliable results using retroviral vectors. Moreover, selectively targeting striatopallidal terminals optogenetically remains technically challenging, as striatonigral fibers also traverse the pallidum, and the broad anatomical distribution of the pallidum complicates precise targeting. This proposed work will need to be pursued in a future study, either with improved retrograde viral tools or the development of additional mouse lines that offer more selective access to these neuronal populations as we documented recently (Dong, Wang et al. 2025).

      In the abstract, the authors state that patch SPNs control speed without affecting valence. This claim seems to lack sufficient data to support it. Additionally, speed, velocity, and acceleration are very distinct qualities. It is necessary to clarify precisely what patch neurons encode and control in the current study.

      We believe the reviewer’s interpretation pertains to a statement in the Introduction rather than the Abstract: “Our findings reveal that patchy SPNs control the speed at which mice navigate the valence differential between high- and low-anxiety zones, without affecting valence perception itself.” Throughout our study, mice consistently preferred the dark zone in the Light/Dark box, indicating intact perception of the valence differential between illuminated areas. While our manipulations altered locomotor speed, they did not affect time spent in the dark zone, supporting the conclusion that valence perception remained unaltered. We appreciate the reviewer’s insight and agree it is an intriguing possibility that locomotor responses could, over time, influence internal states such as anxiety. We addressed this in the Discussion, noting that while dark preference was robust to our manipulations, future studies are warranted to explore the relationship between anxious locomotor vigor and anxiety itself. We report changes in scalar measures of animal speed across Light/Dark box conditions and under various experimental manipulations. Separately, we show that activity in both patchy neuron somata and striatonigral fibers is negatively correlated with acceleration—indicating a positive correlation with deceleration. Notably, the direction of the cross-correlational lag between striatonigral fiber activity and acceleration suggests that this activity precedes and may causally contribute to mouse deceleration, thereby influencing reductions in speed. To clarify this, we revised a sentence in the Results section:

      “Moreover, patchy neuron efferent activity at the SNr may causally contribute to deceleration, as indicated by the negative cross-correlational lag, thereby reducing animal speed.” We also updated the Discussion to read: “Together, these data specifically implicate patchy striatonigral neurons in slowing locomotion by acting within the SNr to drive deceleration.”

      One of the major results relies on chemogenetic manipulation (Figure 5). It would be helpful to demonstrate through slice electrophysiology that hM3Dq and hM4Di indeed cause changes in the activity of dorsal striatal SPNs, as intended by the DREADD system. This would support both the positive (Gq) and negative (Gi) findings, where no effects on behavior were observed.

      We were unable to perform this experiment; however, hM3Dq has previously been shown to be effective in striatal neurons (Alcacer, Andreoli et al. 2017). The lack of effect observed in Gi-DREADD mice serves as an unintended but valuable control, helping to rule out off-target effects of the DREADD agonist JHU37160 and thereby reinforcing the specificity of hM3Dq-mediated activation in our study. We have now included an important caveat regarding the Gi-DREADD results, acknowledging the possibility that they may not have worked effectively in our target cells:

      “Potential explanations for the negative results in Gi-DREADD mice include inherently low basal activity among patchy neurons or insufficient expression of GIRK channels in striatal neurons, which may limit the effectiveness of Gi coupling in suppressing neuronal activity (Shan, Fang et al. 2022).”

      Finally, could the behavioral effects observed in the current study, resulting from various manipulations of patch SPNs, be due to alterations in nigrostriatal dopamine release within the dorsal striatum?

      We agree that this is an important potential implication of our work, especially given that we and others have shown that patchy striatonigral neurons provide strong inhibitory input to dopaminergic neurons involved in locomotor control (Nadel, Pawelko et al. 2021, Lazaridis, Crittenden et al. 2024, Dong, Wang et al. 2025, Okunomiya, Watanabe et al. 2025). Accordingly, we have expanded the discussion section to include potential mechanistic explanations that support and contextualize our main findings.

      Reviewer #1 (Recommendations for the authors):

      Here are some minor issues for the authors' reference:

      (1) This work supports the motor-suppressing effect of patchy SPNs, and >80% of them are direct pathway SPNs. This conclusion is not expected from the traditional basal ganglia direct/indirect pathway model. Most experiments were performed using nonphysiological approaches to suppress (i.e., ablation) or activate (i.e., continuous chemo-optogenetic stimulation). It remains uncertain if the reported observations are relevant to the normal biological function of patchy SPNs under physiological conditions. Particularly, under what circumstances an imbalanced patch/matrix activity may be induced, as proposed in the sections related to the data presented in Figure 6. A thorough discussion and clarification remain needed. Or it should be discussed as a limitation of the present work.

      We have added discussion and clarification of physiological limitations in response to reviewer feedback. Additionally, we revised the opening sentence of an original paragraph in the discussion section to emphasize that it interprets our findings in the context of more physiological studies reporting natural shifts in patchy SPN activity due to cognitive conflict, stress, or training. The revised opening sentence now reads: “Together with previous studies of naturally occurring shifts in patchy neuron activation, these data illustrate ethologically relevant roles for a subgroup of genetically defined patchy neurons in behavior.”

      (2) Lines 499-500: How striatonigral cells encode speed and deceleration deserves a thorough discussion and clarification. These striatonigral cells can target both SNr GABAergic neurons and dendrites of the dopaminergic neurons. A discussion of microcircuits formed by the patchy SPN axons with SNr GABAergic and SNc DAergic neurons should be presented.

      We have added this point at lines 499–500, including a reference to a relevant review of microcircuitry. Additionally, we expanded the discussion section to address microcircuit mechanisms that may underlie our main findings.

      (3) Line 70: "BNST" should be spelled out at the first time it is mentioned.

      This has been done.

      (4) Line 133: only GCaMP6 was listed in the method, but GCaMP8 was also used (Figure 4). Clarification or details are needed.

      Thank you for your careful attention to detail. We have corrected the typographical errors in the Methods section. Specifically, in the Stereotaxic Injections section, we corrected “GCaMP83” to “GCaMP8s.” In the Fiber Implant section, we removed the incorrect reference to “GCaMP6s” and clarified that GCaMP8s was used for photometry, and hChR2 was used for optogenetics.

      (5) Line 183: Can the authors describe more precisely what "a moment" means in terms of seconds or minutes?

      This has been done.

      (6) Line 288: typo: missing / in ΔF

      Thank you, this has been fixed.

      (7) Line 301-302: the statement of "mCherry and MOR1 colocalization" does not match the images in Figure 1B.

      This has been corrected by providing a new Supplementary Figure S1.

      (8) Related to the statement between Lines 303-304: Figure 1C data may reflect changes in MOR1 protein or cell loss. Quantification of NeuN+ neurons within the MOR1 area would strengthen the conclusion of 60% of patchy cell loss in Figure 1C.

      Since the efficacy of AAV-FLEX-taCasp3 in cell ablation has been well established in our previous publications and those of others (Yang, Chiang et al. 2013, Wu, Kung et al. 2019), we do not believe the observed loss of MOR1 staining in Fig. 1C merely reflects reduced MOR1 expression. Moreover, a general neuronal marker such as NeuN may not reliably detect the specific loss of patchy neurons in our ablation model, given the technical limitations of conventional cell-counting methods like MBF’s StereoInvestigator, which typically exhibit a variability margin of 15–20%.

      (9) Lines 313-314: "Similarly, PA mice demonstrated greater stay-time in the dark zone (Figure 1E)." Revision is needed to better reflect what is shown in Figure 1E and avoid misunderstandings.

      Thank you, this has been addressed.

      (10) The color code in Figure 2Gi seems inconsistent with the others? Clarifications are needed

      Color coding in Figure 2Gi differs from that in 2Eii out of necessity. For example, the "Light" cells depicted in light blue in 2Eii are represented by both light gray and light red dots in 2Gi. Importantly, Figure 2G does not encode specific speed relationships; instead, any association with speed is indicated by a red hue.

      (11) Lines 538-539: the statement of "Over half of the patch was covered" was not supported by Figure 5C. Clarification is needed.

      Thank you. For clarity, we updated the x-axis labels in Figures 1C and 5C from “% area covered” to “% DS area covered,” and defined “DS” as “dorsal striatal” in the corresponding figure legends. Additionally, we revised the sentence in question to read: “As with ablation, histological examination indicated that a substantial fraction of dorsal patch territories, identified through MOR1 staining, were impacted (Fig. 5C).”

      (12) Figure 3: statistical significance in Figure 3 should be labeled in various panels.

      We believe the reviewer's concern pertains to the scatter plot in panel F—specifically, whether the data points are significantly different from zero. In panel 3F, the 95% confidence interval clearly overlaps with zero, indicating that the results are not statistically significant.

      (13) Figures 6D-E: the absence of a difference in speed between control mice and ChR2 mice under continuous optical stimulation was not expected. It differed from the Gq-DREADD study in Figure 5E-F. Clarifications are needed.

      For mice undergoing constant ChR2 activation of Sepw1-Cre+ SNr efferents, overall locomotor speed does not differ from controls. However, the BIL (bright-to-illuminated) effect on zone transitions is disrupted: activating Sepw1-Cre+ fibers in the SNr blunts the typical increase in speed observed when mice flee from the light zone toward the dark zone. This impaired BIL-related speed increase upon exiting the light was similarly observed in the Gq-DREADD cohort. The reviewer is correct that this optogenetic manipulation within the SNr did not produce the more generalized speed reductions seen with broader Gq-DREADD activation of all Sepw1-Cre+ cells in the dorsal striatum. A likely explanation is the difference in targeting: ChR2 specifically activates SNr-bound terminals, whereas Gq-DREADD broadly activates entire Sepw1-Cre+ cells. Notably, many of the generalized speed profile changes observed with chemogenetic activation are opposite to those resulting from broad ablation of Sepw1-Cre+ cells. The more subtle speed-restrictive phenotype observed with ChR2 activation targeted to the SNr may suggest that fewer striatonigral fibers were affected by this technique, possibly due to the limited spread of light from the fiber optic. Broad locomotor speed changes in the LDbox might require the recruitment of a larger number of striatonigral fibers than we were able to manipulate with an optogenetic approach. Alternatively, it could indicate that non-striatonigral Sepw1-Cre+ projections, such as striatopallidal or intrastriatal pathways, play a role in more generalized slowing. If striatopallidal fibers contributed to locomotor slowing, we would expect to see non-zero cross-correlations between neural activity and speed or acceleration, along with negative lag indicating that neural activity precedes the behavioral change. However, our fiber photometry data do not support such a role for Sepw1-Cre+ striatopallidal fibers. We have also referenced the possibility that intrastriatal collaterals could suppress striatal dopamine levels, potentially explaining the stronger slowing phenotype observed when the entire striatal population is affected, as opposed to selectively targeting striatonigral terminals. These technical considerations and interpretive nuances have been incorporated and clarified in the revised discussion section.
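
      The lag convention in the argument above can be made concrete with a minimal sketch. It assumes two equally sampled traces and correlates neural[t + lag] with speed[t], so that a peak at negative lag means the neural signal precedes the behavioral change. The function names, the synthetic traces, and the Pearson-based estimator are illustrative assumptions and not the analysis pipeline used in the manuscript.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def lagged_xcorr(neural, speed, max_lag):
    """Correlate neural[t + lag] with speed[t] for lag in [-max_lag, max_lag].

    Convention (an assumption of this sketch): a peak at NEGATIVE lag means
    the neural trace leads the speed trace.
    """
    if len(neural) != len(speed):
        raise ValueError("traces must have the same length")
    n = len(neural)
    if max_lag >= n - 1:
        raise ValueError("max_lag must leave at least two overlapping samples")
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            # neural sample taken |lag| steps before the paired speed sample
            a, b = neural[: n + lag], speed[-lag:]
        else:
            a, b = neural[lag:], speed[: n - lag]
        result[lag] = pearson(a, b)
    return result


# Synthetic example: speed is a copy of the neural trace delayed by 5 samples.
rng = random.Random(0)
neural = [rng.gauss(0.0, 1.0) for _ in range(500)]
speed = [0.0] * 5 + neural[:-5]

xcorr = lagged_xcorr(neural, speed, max_lag=10)
best_lag = max(xcorr, key=xcorr.get)
print(best_lag)  # -5 here: the neural trace leads speed by 5 samples
```

      A flat cross-correlogram, with no clear peak at negative lag, would be the pattern consistent with the photometry result described above.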

      (14) Lines 632: "compliment": a typo?

      Yes, it should be “complement”.

      (15) Figure 4 legend: descriptions of panels A and B were swapped

      Thank you. This has been corrected.

      (16) Friedman (2020) was listed twice in the bibliography (Lines 920-929).

      Thank you. This has been corrected.

      Reviewer #3 (Recommendations for the authors):

      It will be helpful to label and add figure legends below each figure.

      Thank you for the suggestion.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript. We noted some instances where only p values are reported.

      Readers would also benefit from coding individual data points by sex and noting N/sex

      We have included detailed statistical information in the revised manuscript. Both male and female mice were used in all experiments in approximately equal numbers. Since no sex-related differences were observed, we did not report the number of animals by sex.

      References

      Alcacer, C., L. Andreoli, I. Sebastianutto, J. Jakobsson, T. Fieblinger and M. A. Cenci (2017). "Chemogenetic stimulation of striatal projection neurons modulates responses to Parkinson's disease therapy." J Clin Invest 127(2): 720-734.

      Crittenden, J. R., P. W. Tillberg, M. H. Riad, Y. Shima, C. R. Gerfen, J. Curry, D. E. Housman, S. B. Nelson, E. S. Boyden and A. M. Graybiel (2016). "Striosome-dendron bouquets highlight a unique striatonigral circuit targeting dopamine-containing neurons." Proc Natl Acad Sci U S A 113(40): 11318-11323.

      Dong, J., L. Wang, B. T. Sullivan, L. Sun, V. M. Martinez Smith, L. Chang, J. Ding, W. Le, C. R. Gerfen and H. Cai (2025). "Molecularly distinct striatonigral neuron subtypes differentially regulate locomotion." Nat Commun 16(1): 2710.

      Dudman, J. T. and J. W. Krakauer (2016). "The basal ganglia: from motor commands to the control of vigor." Curr Opin Neurobiol 37: 158-166.

      Evans, R. C., E. L. Twedell, M. Zhu, J. Ascencio, R. Zhang and Z. M. Khaliq (2020). "Functional Dissection of Basal Ganglia Inhibitory Inputs onto Substantia Nigra Dopaminergic Neurons." Cell Rep 32(11): 108156.

      Gerfen, C. R. and D. J. Surmeier (2011). "Modulation of striatal projection systems by dopamine." Annual review of neuroscience 34: 441-466.

      Hawes, S. L., A. G. Salinas, D. M. Lovinger and K. T. Blackwell (2017). "Long-term plasticity of corticostriatal synapses is modulated by pathway-specific co-release of opioids through kappa-opioid receptors." J Physiol 595(16): 5637-5652.

      Lazaridis, I., J. R. Crittenden, G. Ahn, K. Hirokane, T. Yoshida, A. Mahar, V. Skara, K. Meletis, K. Parvataneni, J. T. Ting, E. Hueske, A. Matsushima and A. M. Graybiel (2024). "Striosomes Target Nigral Dopamine-Containing Neurons via Direct-D1 and Indirect-D2 Pathways Paralleling Classic Direct-Indirect Basal Ganglia Systems." bioRxiv.

      Nadel, J. A., S. S. Pawelko, J. R. Scott, R. McLaughlin, M. Fox, M. Ghanem, R. van der Merwe, N. G. Hollon, E. S. Ramsson and C. D. Howard (2021). "Optogenetic stimulation of striatal patches modifies habit formation and inhibits dopamine release." Sci Rep 11(1): 19847.

      Okunomiya, T., D. Watanabe, H. Banno, T. Kondo, K. Imamura, R. Takahashi and H. Inoue (2025). "Striosome Circuitry Stimulation Inhibits Striatal Dopamine Release and Locomotion." J Neurosci 45(4).

      Shan, Q., Q. Fang and Y. Tian (2022). "Evidence that GIRK Channels Mediate the DREADD-hM4Di Receptor Activation-Induced Reduction in Membrane Excitability of Striatal Medium Spiny Neurons." ACS Chem Neurosci 13(14): 2084-2091.

      Wu, J., J. Kung, J. Dong, L. Chang, C. Xie, A. Habib, S. Hawes, N. Yang, V. Chen, Z. Liu, R. Evans, B. Liang, L. Sun, J. Ding, J. Yu, S. Saez-Atienzar, B. Tang, Z. Khaliq, D. T. Lin, W. Le and H. Cai (2019). "Distinct Connectivity and Functionality of Aldehyde Dehydrogenase 1a1-Positive Nigrostriatal Dopaminergic Neurons in Motor Learning." Cell Rep 28(5): 1167-1181 e1167.


    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors have investigated the role of GAT3 in the visual system. First, they have developed a CRISPR/Cas9-based approach to locally knock out this transporter in the visual cortex. They then demonstrated electrophysiologically that this manipulation increases inhibitory synaptic input into layer 2/3 pyramidal cells. They further examined the functional consequences by imaging neuronal activity in the visual cortex in vivo. They found that the absence of GAT3 leads to reduced spontaneous neuronal activity and attenuated neuronal responses and reliability to visual stimuli, but without an effect on orientation selectivity. Further analysis of this data suggests that Gat3 removal leads to less coordinated activity between individual neurons and in population activity patterns, thereby impairing information encoding. Overall, this is an elegant and technically advanced study that demonstrates a new and important role of GAT3 in controlling the processing of visual information.

      We are grateful to the reviewer for their positive appraisal of our work, including our technical advances and our demonstration of how cortical astrocytes play a role in visual information processing by neurons via GAT3-mediated regulation of activity.

      Strengths:

      (1)  Development of a new approach for a local knockout (GAT3).

      (2)  Important and novel insights into visual system function and its dependence on GAT3.

      (3)  Plausible cellular mechanism.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

      We thank the reviewer for highlighting the strengths of our study, including the development of a novel local knockout strategy for GAT3, the discovery of important functional consequences for visual system processing, and the identification of a plausible underlying cellular mechanism.

      Reviewer #2 (Public review):

      Summary:

      Park et al. have made a tool for spatiotemporally restricted knockout of the astrocytic GABA transporter GAT3, leveraging CRISPR/Cas9 and viral transduction in adult mice, and evaluated the effects of GAT3 on neural encoding of visual stimulation.

      Strengths:

      This concise manuscript leverages state-of-the-art gene CRISPR/Cas9 technology for knocking out astrocytic genes. This has only to a small degree been performed previously in astrocytes, and it represents an important development in the field. Moreover, the authors utilize in vivo two-photon imaging of neural responses to visual stimuli as a readout of neural activity, in addition to validating their data with ex vivo electrophysiology. Lastly, they use advanced statistical modeling to analyze the impact of GAT3 knockout. Overall, the study comes across as rigorous and convincing.

      We appreciate the reviewer’s endorsement of our experimental rigor and methodological innovation. We agree that combining in vivo and ex vivo measurements with rigorous analytical methods strengthens the overall conclusions of the study and demonstrates the important role of astrocytic GAT3 in cortical visual processing.

      Weaknesses:

      Adding the following experiments would potentially have strengthened the conclusions and helped with interpreting the findings:

      (1) Neural activity is quite profoundly influenced by GAT3 knockout. Corroborating these relatively large changes to neural activity with in vivo electrophysiology of some sort as an additional readout would have strengthened the conclusions.

      We agree that further investigation of neuronal activity at higher temporal resolution would provide valuable complementary data, particularly given the profound effects we observed using a pan-neuronal calcium indicator. Detailed in vivo electrophysiology—such as large-scale Neuropixel recordings—would allow assessment of single-neuron spiking dynamics and potentially cell-type specific responses following GAT3 deletion. While such an investigation is beyond the scope of the current study, we concur that it would be an important follow-up direction to further dissect the effects of GAT3 knockout on neuron activity profiles at both single-cell and population levels.

      (2) Given the quite large effects on neural coding in visual cortex assessed by jRGECO imaging, it would have been interesting if the mouse groups could have been subjected to behavioral testing, assessing the visual system.

      We appreciate the reviewer’s suggestion to explore potential behavioral consequences of GAT3 deletion. Based on our observed alterations in visual cortical activity, we agree that GAT3 knockout could impact visual discrimination-based behaviors. Astrocytes in the visual cortex are highly tuned to sensory and motor events and are generally known to shape behavioral outputs (Slezak et al., 2019; Kofuji & Araque, 2021). Our study suggests that regulation of inhibitory signaling via GAT3 transporters is a possible mechanism by which astrocytes influence visually guided behaviors. Although behavioral assessments fall beyond the scope of the current work, we agree with the reviewer’s suggestion and will pursue future experiments employing paradigms such as go/no-go visual detection or two-alternative forced choice to determine whether astrocytic GAT3 modulates visually guided behaviors and perceptual decision-making.

      Reviewer #1 (Recommendations for the authors):

      It could be more clearly stated from the very beginning that a method was developed and used which, by itself, apparently has no cell type selectivity. It is highly plausible that the effects are mostly due to the absence of astrocytic GAT3, as discussed by the authors, but the distinction of what has been done and what is interpretation based on the literature is occasionally a bit blurry. This is also important because there are CRISPR/Cas9-based approaches that are astrocyte-specific (e.g., GEARBOCS).

      We thank the reviewer for this helpful suggestion. As noted, our current approach does not confer cell-type specificity on its own. Although our interpretation, supported by expression patterns and prior literature, attributes the observed effects primarily to astrocytic GAT3 loss, we agree that this distinction should be explicitly stated. We have revised the Introduction section (lines 83-87) to clarify that while MRCUTS allows for local gene knockout, it is not inherently cell-type specific unless combined with cell-type-restricted Cre drivers, as is possible in future applications.

      A change of ambient GABA following GAT3 deletion is central to the proposed cellular mechanism. Demonstrating this directly would strengthen the manuscript (e.g., changed tonic GABAergic current in the absence of GAT3, and insensitivity to SNAP-5114).

      While we recognize that directly quantifying ambient GABA levels would further strengthen our study, substantial evidence supports the role of GABA transporters in coordinately regulating both phasic and tonic inhibition and cellular excitability (Kinney, 2005; Keros & Hablitz, 2005; Semyanov et al. 2003).

      Moreover, tonic GABA currents have been shown to strongly correlate with phasic inhibitory bursts (Glykys & Mody, 2007; Farrant & Nusser, 2005; Ataka & Gu, 2006), suggesting shared underlying regulatory mechanisms. Furthermore, as the reviewer correctly points out, alternative mechanisms such as non-vesicular GABA release or disinhibition via interneuron suppression cannot be excluded (also discussed in Kinney 2005). Given these considerations, we prioritized sIPSC measurements as a more integrative and reliable proxy for altered GABAergic signaling in L2/3 pyramidal neurons. We have revised the Discussion section (lines 329-333) to explain our choice of approach for further clarification.

      We also agree it would be of interest to test whether GAT3 KO neurons exhibit insensitivity to SNAP-5114, both ex vivo and in vivo. However, based on our SNAP-5114 application experiments in vivo, which revealed only subtle effects on single-neuron properties (Figure S2A-F), we anticipate that interpreting a lack of effect in the KO condition would be challenging and potentially inconclusive.  

      References

      Ataka, T. & Gu, J. G. Relationship between tonic inhibitory currents and phasic inhibitory activity in the spinal cord lamina II region of adult mice. Mol. Pain. (2006).  

      Bright, D. & Smart, T. Methods for recording and measuring tonic GABAA receptor-mediated inhibition. Front. Neural Circuits. 7, (2013).

      Farrant, M. & Nusser, Z. Variations on an inhibitory theme: phasic and tonic activation of GABAA receptors. Nat. Rev. Neurosci. 6, 215–229 (2005).  

      Glykys, J. & Mody, I. Activation of GABAA Receptors: Views from Outside the Synaptic Cleft. Neuron. 56, 763-770 (2007).

      Keros, S. & Hablitz, J. J. Subtype-Specific GABA Transporter Antagonists Synergistically Modulate Phasic and Tonic GABAA Conductances in Rat Neocortex. J. Neurophysiol. 94, 2073–2085 (2005).

      Kinney, G. A. GAT-3 Transporters Regulate Inhibition in the Neocortex. J. Neurophysiol. 94, 4533–4537 (2005).

      Kofuji, P. & Araque, A. Astrocytes and Behavior. Annu. Rev. Neurosci. 44, 49–67 (2021).

      Semyanov, A., Walker, M. & Kullmann, D. GABA uptake regulates cortical excitability via cell type–specific tonic inhibition. Nat. Neurosci. 6, 484–490 (2003).

      Slezak, M., Kandler, S., Van Veldhoven, P. P., Van den Haute, C., Bonin, V. & Holt, M.G. Distinct Mechanisms for Visual and Motor-Related Astrocyte Responses in Mouse Visual Cortex. Curr. Biol. 18, 3120-3127 (2019).

    1. Author response:

      Reviewer 1 (Public review):

      The manuscript by Yin and colleagues addresses a long-standing question in the field of cortical morphogenesis, regarding factors that determine differential cortical folding across species and individuals with cortical malformations. The authors present work based on a computational model of cortical folding evaluated alongside a physical model that makes use of gel swelling to investigate the role of a two-layer model for cortical morphogenesis. The study assesses these models against empirically derived cortical surfaces based on MRI data from ferret, macaque monkey, and human brains.

      The manuscript is clearly written and presented, and the experimental work (physical gel modeling as well as numerical simulations) and analyses (subsequent morphometric evaluations) are conducted at the highest methodological standards. It constitutes an exemplary use of interdisciplinary approaches for addressing the question of cortical morphogenesis by bringing together well-tuned computational modeling with physical gel models. In addition, the comparative approaches used in this paper establish a foundation for broad-ranging future lines of work that investigate the impact of perturbations or abnormalities during cortical development.

      The cross-species approach taken in this study is a major strength of the work. However, correspondence across the two methodologies did not appear to be equally consistent in predicting brain folding across all three species. The results presented in Figures 4 (and Figures S3 and S4) show broad correspondence in shape index and major sulci landmarks across all three species. Nevertheless, the results presented for the human brain lack the same degree of clear correspondence for the gel model results as observed in the macaque and ferret. While this study clearly establishes a strong foundation for comparative cortical anatomy across species and the impact of perturbations on individual morphogenesis, further work that fine-tunes physical modeling of complex morphologies, such as that of the human cortex, may help to further understand the factors that determine cortical functionalization and pathologies.
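
      For readers unfamiliar with the shape index mentioned above, the following is a minimal sketch of one common definition (Koenderink and van Doorn), which maps the two principal curvatures at a surface point to a value in [-1, 1]. Whether the manuscript uses exactly this formula and sign convention is an assumption; the function name and the handling of planar points are illustrative choices.

```python
import math


def shape_index(k_a, k_b):
    """Shape index from two principal curvatures, in the range [-1, 1].

    Uses SI = (2 / pi) * atan2(k1 + k2, k1 - k2) with k1 >= k2.
    Sign convention assumed here: +1 is a dome (both curvatures positive),
    +0.5 a ridge, 0 a saddle, -0.5 a rut, and -1 a cup.
    Planar points (both curvatures zero) have no defined shape index,
    so NaN is returned for them.
    """
    k1, k2 = max(k_a, k_b), min(k_a, k_b)
    if k1 == 0.0 and k2 == 0.0:
        return float("nan")
    # atan2 keeps umbilic points (k1 == k2) well defined without dividing by zero
    return (2.0 / math.pi) * math.atan2(k1 + k2, k1 - k2)


# A few canonical local shapes under the convention above.
for label, (ka, kb) in {
    "dome": (1.0, 1.0),
    "ridge": (1.0, 0.0),
    "saddle": (1.0, -1.0),
    "rut": (0.0, -1.0),
    "cup": (-1.0, -1.0),
}.items():
    print(label, round(shape_index(ka, kb), 3))
```

      Because the index depends only on the ratio of the curvatures, it describes local shape independently of overall brain size, which is one reason it is convenient for comparisons across species.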

      We thank the reviewer for the positive assessment and helpful comments. Yes, the physical gel model of the human brain has a lower similarity index with the real brain. There are several reasons.

      First, the highly convoluted human cortex has a few major folds (primary sulci) and a very large number of minor folds associated with secondary or tertiary sulci (on scales of order comparable to the cortical thickness), relative to the ferret and macaque cerebral cortex. In our gel model, the exact shapes, positions, and orientations of these minor folds are stochastic, which makes it hard to have a very high similarity index of the gel models when compared with the brain of a single individual.

      Second, in real human brains, these minor folds evolve dynamically with age and show differences among individuals. In experiments with the gel brain, multiscale folds form and eventually disappear as the swelling progresses through the thickness. Our physical model results are snapshots during this dynamical process, which makes it hard to have a concrete one-to-one correspondence between the instantaneous shapes of the swelling gel and the growing human brain.

      Third, the growth of the brain cortex is inhomogeneous in space and varying with time, whereas, in the gel model, swelling is relatively homogeneous.

      We agree that further systematic work, based on our proposed methods, with more fine-tuned gel geometries and properties, might provide a deeper understanding of the relations between brain geometry, and growth-induced folds and their functionalization and pathologies. Further analysis of cortical pathologies using computational and physical gel models can be found in our companion paper (Choi et al., 2025), also submitted to eLife:

      G. P. T. Choi, C. Liu, S. Yin, G. Séjourné, R. S. Smith, C. A. Walsh, L. Mahadevan, Biophysical basis for brain folding and misfolding patterns in ferrets and humans. Preprint, bioRxiv 2025.03.05.641682.

      Reviewer 2 (Public review):

      This manuscript explores the mechanisms underlying cerebral cortical folding using a combination of physical modelling, computational simulations, and geometric morphometrics. The authors extend their prior work on human brain development (Tallinen et al., 2014; 2016) to a comparative framework involving three mammalian species: ferrets (Carnivora), macaques (Old World monkeys), and humans (Hominoidea). By integrating swelling gel experiments with mathematical differential growth models, they simulate sulcification instability and recapitulate key features of brain folding across species. The authors make commendable use of publicly available datasets to construct 3D models of fetal and neonatal brain surfaces: fetal macaque (ref. [26]), newborn ferret (ref. [11]), and fetal human (ref. [22]).

      Using a combination of physical models and numerical simulations, the authors compare the resulting folding morphologies to real brain surfaces using morphometric analysis. Their results show qualitative and quantitative concordance with observed cortical folding patterns, supporting the view that differential tangential growth of the cortex relative to the subcortical substrate is sufficient to account for much of the diversity in cortical folding. This is a very important point in our field, and can be used in the teaching of medical students.

      Brain folding remains a topic of ongoing debate. While some regard it as a critical specialization linked to higher cognitive function, others consider it an epiphenomenon of expansion and constrained geometry. This divergence was evident in discussions during the Strüngmann Forum on cortical development (Silver et al., 2019). Though folding abnormalities are reliable indicators of disrupted neurodevelopmental processes (e.g., neurogenesis, migration), their relationship to functional architecture remains unclear. Recent evidence suggests that the absolute number of neurons varies significantly with position (sulcus versus gyrus), with potential implications for local processing capacity (e.g., https://doi.org/10.1002/cne.25626). The field is thus in need of comparative, mechanistic studies like the present one.

      This paper offers an elegant and timely contribution by combining gel-based morphogenesis, numerical modelling, and morphometric analysis to examine cortical folding across species. The experimental design - constructing two-layer PDMS models from 3D MRI data and immersing them in organic solvents to induce differential swelling - is well-established in prior literature. The authors further complement this with a continuum mechanics model simulating folding as a result of differential growth, as well as a comparative analysis of surface morphologies derived from in vivo, in vitro, and in silico brains.

      We thank the reviewer for the very positive comments.

      I offer a few suggestions here for clarification and further exploration:

      Major Comments

      (1)   Choice of Developmental Stages and Initial Conditions

      The authors should provide a clearer justification for the specific developmental stages chosen (e.g., G85 for macaque, GW23 for human). How sensitive are the resulting folding patterns to the initial surface geometry of the gel models? Given that folding is a nonlinear process, early geometric perturbations may propagate into divergent morphologies. Exploring this sensitivity, either through simulations or reference to prior work, would enhance the robustness of the findings.

      The initial geometry is one of the important factors that decides the final folding pattern. The smooth brain in the early developmental stage shows a broad consistency across individuals, and we expect the main folds to form similarly across species and individuals.

      Generally, we choose the initial geometry when the brain cortex is still relatively smooth. For the human, this corresponds approximately to GW23, as the major folds, such as the Rolandic fissure (central sulcus), arise during this developmental stage. For the macaque brain, we chose developmental stage G85, primarily because of the availability of the dataset corresponding to this time, which is also the least folded stage in that dataset.

      We expect that large-scale folding patterns are strongly sensitive to the initial geometry but fine-scale features are not. Since our goal is to explain the large-scale features, we expect sensitivity to the initial shape.

      Enclosed are some results from other researchers that are consistent with this idea. Below are some images of simulations from Wang et al. obtained by perturbing the geometry of a sphere to an ellipsoid. We see that the growth-induced folds mostly maintain their width (wavelength), but change their orientations.

      Reference:

      Wang, X., Lefèvre, J., Bohi, A., Harrach, M.A., Dinomais, M. and Rousseau, F., 2021. The influence of biophysical parameters in a biomechanical model of cortical folding patterns. Scientific Reports, 11(1), p.7686.

      Related results from the same group show that, under slight perturbations of brain geometry, these folds also tend to change their orientations but not their width/wavelength (Bohi et al., 2019).

      Reference:

      Bohi, A., Wang, X., Harrach, M., Dinomais, M., Rousseau, F. and Lefèvre, J., 2019, July. Global perturbation of initial geometry in a biomechanical model of cortical morphogenesis. In 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 442-445). IEEE.

      Finally, a systematic discussion of the role of perturbations on the initial geometries and physical properties can be seen in our work on understanding a different system, gut morphogenesis (Gill et al., 2024).

      We have added the discussion about geometric sensitivity in the section Methods-Numerical Simulations:

      “Small perturbations on initial geometry would affect minor folds, but the main features of major folds, such as orientations, width, and depth, are expected to be conserved across individuals [49, 50]. For simplicity, we do not perturb the fetal brain geometry obtained from datasets.”

      (2) Parameter Space and Breakdown Points

      The numerical model assumes homogeneous growth profiles and simplifies several aspects of cortical mechanics. Parameters such as cortical thickness, modulus ratios, and growth ratios are described in Table II. It would be informative to discuss the range of parameter values for which the model remains valid, and under what conditions the physical and computational models diverge. This would help delineate the boundaries of the current modelling framework and indicate directions for refinement.

      Exploring the valid parameter space is a key problem. We have tested a series of growth parameters and will state them explicitly in our revision. In the current version, we chose the ones that yield a relatively high similarity index to the animal brains. More generally, folding patterns are largely regulated by geometry as well as physical parameters, such as cortical thickness, modulus ratios, growth ratios, and inhomogeneity. In our previous work on a different system, gut morphogenesis, where similar folding patterns are seen, we have explored these features (Gill et al., 2024).

      Reference:

      Gill, H.K., Yin, S., Nerurkar, N.L., Lawlor, J.C., Lee, C., Huycke, T.R., Mahadevan, L. and Tabin, C.J., 2024. Hox gene activity directs physical forces to differentially shape chick small and large intestinal epithelia. Developmental Cell, 59(21), pp.2834-2849.

      (3) Neglected Regional Features: The Occipital Pole of the Macaque

      One conspicuous omission is the lack of attention to the occipital pole of the macaque, which is known to remain smooth even at later gestational stages and has an unusually high neuronal density (2.5× higher than adjacent cortex). This feature is not reproduced in the gel or numerical models, nor is it discussed. Acknowledging this discrepancy, and speculating on possible developmental or mechanical explanations, would add depth to the comparative analysis. The authors may wish to include this as a limitation or a target for future work.

      Yes, we have added that the omission of the Occipital Pole of the macaque is one of our paper’s limitations. Our main aim in this paper is to explore the formation of large-scale folds, so the smooth region is neglected. But future work could include this to make the model more complete.

      The main text has been modified in Methods, 3D model reconstruction, pre-processing:

      “To focus on fold formation, we neglected some smooth regions such as the Occipital Pole of the macaque.”

      (4) Spatio-Temporal Growth Rates and Available Human Data

      The authors note that accurate, species-specific spatio-temporal growth data are lacking, limiting the ability to model inhomogeneous cortical expansion. While this may be true for ferret and macaque, there are high-quality datasets available for human fetal development, now extended through ultrasound imaging (e.g., https://doi.org/10.1038/s41586-023-06630-3). Incorporating or at least referencing such data could improve the fidelity of the human model and expand the applicability of the approach to clinical or pathological scenarios.

      We thank the reviewer for pointing out the very useful datasets that exist for the exploration of inhomogeneous growth driven folding patterns. We have referred to this paper to provide suggestions for further work in exploring the role of growth inhomogeneities.

      We have referred to this high-quality dataset in our main text, Discussion:

      “...the effect of inhomogeneous growth needs to be further investigated by incorporating regional growth of the gray and white matter not only in human brains [29, 31] based on public datasets [45], but also in other species.”

      A few works have tried to incorporate inhomogeneous growth in simulating human brain folding by separating the central sulcus area into several lobes (e.g., lobe parcellation method, Wang, PhD Thesis, 2021). Since our goal in this paper is to explain the large-scale features of folding in a minimal setting, we have kept our model simple and show that it is still capable of capturing the main features of folding in a range of mammalian brains.

      Reference:

      Xiaoyu Wang. Modélisation et caractérisation du plissement cortical. Signal and Image Processing. École nationale supérieure Mines-Télécom Atlantique, 2021. English. ⟨NNT: 2021IMTA0248⟩.

      (5) Future Applications: The Inverse Problem and Fossil Brains

      The authors suggest that their morphometric framework could be extended to solve the inverse growth problem, that is, reconstructing fetal geometries from adult brains. This speculative but intriguing direction has implications for evolutionary neuroscience, particularly the interpretation of fossil endocasts. Although beyond the scope of this paper, I encourage the authors to elaborate briefly on how such a framework might be practically implemented and validated.

      For the inverse problem, we could use the following strategies:

      a. Perform systematic simulations using different geometries and physical parameters to obtain the variation in morphologies as a function of parameters.

      b. Use either supervised training or unsupervised training (physics-informed neural networks, PINNs) to learn these characteristic morphologies and classify their dependence on the parameters. The networks can then be trained to determine the possible range of geometrical and physical parameters that yield the buckled patterns seen in the systematic simulations.

      c. Reconstruct the 3D surface from fossil endocasts. Using the well-trained neural network, it should be possible to predict the initial shape of the smooth brain cortex, growth profile, and stiffness ratio of the gray and white matter.

      As an example in this direction, supervised neural networks have been used recently to solve the forward problem to predict the buckling pattern of a growing two-layer system (Chavoshnejad et al., 2023). The inverse problem can then be solved using machine-learning methods when the training datasets are the folded shape, which are then used to predict the initial geometry and physical properties.

      Reference:

      Chavoshnejad, P., Chen, L., Yu, X., Hou, J., Filla, N., Zhu, D., Liu, T., Li, G., Razavi, M.J. and Wang, X., 2023. An integrated finite element method and machine learning algorithm for brain morphology prediction. Cerebral Cortex, 33(15), pp.9354-9366.
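      The three-step strategy outlined above for the inverse problem can be illustrated with a deliberately simplified sketch. It is not the finite-element brain model or a neural network: as a stand-in forward model it assumes the classical wrinkling relations for a stiff thin film on a soft substrate (wavelength 2πh(r/3)^(1/3) and critical strain (1/4)(3/r)^(2/3), with h the film thickness and r the film-to-substrate modulus ratio), a regime that cortical tissue, with its stiffness ratio close to one, does not strictly satisfy. The function names `forward` and `invert`, the parameter ranges, and the log-linear regression are all assumptions made for illustration.

```python
import numpy as np

def forward(h, ratio):
    """Toy forward model (assumed): stiff film of thickness h on a soft substrate.

    ratio is the film-to-substrate modulus ratio. Returns the wrinkle
    wavelength and the critical compressive strain.
    """
    wavelength = 2.0 * np.pi * h * (ratio / 3.0) ** (1.0 / 3.0)
    critical_strain = 0.25 * (3.0 / ratio) ** (2.0 / 3.0)
    return wavelength, critical_strain

# (a) Systematic "simulations": sample parameters and record the observables.
rng = np.random.default_rng(0)
h_train = rng.uniform(0.5, 2.0, 200)       # hypothetical thickness range
ratio_train = rng.uniform(5.0, 50.0, 200)  # hypothetical stiffness-ratio range
lam_train, eps_train = forward(h_train, ratio_train)

# (b) Supervised learning of the inverse map. In log space the toy model is
# linear, so ordinary least squares is enough here; a real folding model
# would need a far more expressive learner.
X = np.column_stack([np.log(lam_train), np.log(eps_train), np.ones_like(lam_train)])
Y = np.column_stack([np.log(h_train), np.log(ratio_train)])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def invert(wavelength, critical_strain):
    """Predict (h, ratio) from the observed wavelength and critical strain."""
    x = np.array([np.log(wavelength), np.log(critical_strain), 1.0])
    log_h, log_ratio = x @ W
    return np.exp(log_h), np.exp(log_ratio)

# (c) Apply the learned inverse map to an "observed" morphology.
h_est, ratio_est = invert(*forward(1.2, 20.0))
print(h_est, ratio_est)
```

      The point of the sketch is only the workflow (parameter sweep, learned inverse map, prediction). For real endocasts the observables would be shape descriptors of the reconstructed surface, and the inverse map would generally be non-unique.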

      Conclusion

      This is a well-executed and creative study that integrates diverse methodologies to address a longstanding question in developmental neurobiology. While a few aspects (such as regional folding peculiarities, sensitivity to initial conditions, and available human data) could be further elaborated, they do not detract from the overall quality and novelty of the work. I enthusiastically support this paper and believe that it will be of broad interest to the neuroscience, biomechanics, and developmental biology communities.

      Note: The paper mentions a companion paper [reference 11] that explores the cellular and anatomical changes in the ferret cortex. I did not have access to this manuscript, but judging from the title, this paper might further strengthen the conclusions.

      The companion paper (Choi et al., 2025) has also been submitted to eLife and can be found on bioRxiv here:

      G. P. T. Choi, C. Liu, S. Yin, G. Séjourné, R. S. Smith, C. A. Walsh, L. Mahadevan, Biophysical basis for brain folding and misfolding patterns in ferrets and humans. bioRxiv 2025.03.05.641682.

    1. Author response:

      Reviewer 1:

      (1) Line 65 "(Figure 1A). Inactivation causes a change in the leg's rest position; however, in preliminary experiments, the body rotation did not have a large effect on the rest positions of the leg following inactivation. This result is consistent with the one already reported for stick insects and shows that passive forces within the leg are much larger than the gravitational force on a leg and dominate limb position [1]." This is the direct replication of the previous work by Hooper et al 2009 and therefore authors should ideally show the data for this condition (no weight attached).

      We did not present these data (the effect of inactivation on the rest position of an unweighted leg) because this effect was already reported in stick insects. However, we understand the reviewer’s point that it is important to present the data showing this replication. We will include these data in the revised version.

      (2) The authors use vglut-gal4, a very broad driver for inactivating motor neurons. The driver labels all glutamatergic neurons, including brain descending neurons and nerve cord interneurons, in addition to motor neurons. Additionally, the strength of inactivation might differ in different neurons (including motor neurons) depending on the expression levels of the opsins. As a result, in this condition, the authors might not be removing all active forces. This is a major caveat that authors do not address. They explore that they are not potentially silencing all inputs to muscles by using an additional octopaminergic driver, but this doesn't address the points mentioned above. At the very least, the authors should try using other motor neuron drivers, as well as other neuronal silencers. This driver is so broad that authors couldn't even use it for physiology experiments. Additionally, the authors could silence VGlut-labeled motor neurons and record muscle activity (potentially using GCaMP as has been done in several recent papers cited by the authors, Azevedo et al, 2020) as a much more direct readout.

      This critique concerns the use of vglut-gal4 (a broad driver) to inactivate motor neurons (MNs). The reviewer argues that the use of a broad driver might result in some effects that are not due to MN inactivation. Conversely, it is possible that not all MNs are inactivated. These critiques raise important points that we will address in the revision by 1) performing experiments with other MN drivers as suggested by the reviewer, and 2) performing experiments in flies that are inactivated by freezing. These measurements will provide other estimates of passive forces, allowing us to better triangulate the range of values for the passive forces. Moreover, it appears that one of the reviewer’s main concerns is that the passive forces are overestimated because of residual active forces. We will discuss this possibility in detail. It is important to note that, in the end, what we hope to accomplish is to provide a useful estimate of the passive forces. It is unlikely that the passive force will be a precise number like a physical constant, as the passive forces likely depend on recent history.

      (3) Figure 4 uses an extremely simplified OpenSim model that makes several assumptions that are known to be false. For example, the Thorax-Coxa joint is assumed to be a ball and socket joint, which it is not. Tibia-tarsus joint is completely ignored and likely makes a major contribution in supporting overall posture, given the importance of the leg "claw" for adhering to substrates. Moreover, there are a couple of recent open-source neuromechanical models that include all these details (NeuromechFly by Lobato-Rios et al, 2022, Nat. Methods, and the fly body model by Vaxenburg et al, 2025, Nature). Leveraging these models to rule in or rule out contributions at other joints that are ignored in the authors' OpenSim model would be very helpful to make their case.

      Our OpenSim model predates the newer mechanical models. In the revised manuscript, we will revisit the model in light of recent developments.

      (4) Figure 5 shows the experimental validation of Figure 4 simulations; however, it suffers from several caveats.

      a) The authors track a single point on the head of the fly to estimate the height of the fly. This has several issues. Firstly, it is not clear how accurate the tracking would be. Secondly, it is not clear how the fly actually "falls" on VGlut silencing; do all flies fall in a similar manner in every trial? Almost certainly, there will be some "pitch" and "roll" in the way the fly falls. These will affect the location of this single-tracked point that doesn't reflect the authors' expectations. Unless the authors track multiple points on the fly and show examples of tracked videos, it is hard to believe this dataset and, hence, any of the resulting interpretations.

      b) As described in the previous point, the "reason" the fly falls on silencing all glutamatergic neurons could be due to silencing all sorts of premotor/interneurons in addition to the silencing of motor neurons.

      c) (line 175) "The first finding is that there was a large variation in the initial height of the fly (Figure 5C), consistent with a recent study of flies walking on a treadmill[20]." The cited paper refers to how height varies during "walking". However, in the current study, the authors are only looking at "standing" (i.e. non-walking) flies. So it is not the correct reference. In my opinion, this could simply reflect poor estimation of the fly's height based on poor tracking or other factors like pitch and roll.

      d) "The rate at which the fly fell to the ground was much smaller in the experimental flies than it was in the simulated flies (Figure 5E). The median rate of falling was 1.3 mm/s compared to 37 mm/s for the simulated flies (Figure 5F). (Line 190) The most likely reason for the longer than expected time for the fly to fall is delays associated with motor neuron inactivation and muscle inactivation." I don't believe this reasoning. There are so many caveats (which I described in the above points) in the model and the experiment, that any of those could be responsible for this massive difference between experiment and modeling. Simply not getting rid of all active forces (inadequate silencing) could be one obvious reason. Other reasons could be that the model is using underestimates of passive forces, as alluded to in point 3.

      (4a) Although we agree that measuring different points on the body would allow us to estimate the moments, we disagree that the height of the fly cannot be evaluated from the measurement of a single point. The measurements were performed using the same techniques that we used to assess the fly’s height in a different study, where we estimated the resolution of our imaging system to be ~20 mm (Chun et al. 2021). We will include these details in the revised manuscript. The videos showing the falling experiments are not currently available or referenced in the manuscript; they will be made available.

      b) We will repeat the “falling” experiment with a more restrictive driver.

      c) We disagree with the reviewer on this point. The system has a resolution of ~20 mm and is sufficient to draw conclusions about the difference in the height of the fly. We will clarify this point in the revised manuscript.

      d) We do not follow the reviewer’s rationale here. The passive forces (along with any residual forces) are the same in the model as in the experiment. Moreover, there will be a delay between light onset, neuronal inactivation, and muscle inactivation. These processes are not instantaneous. In Figure 6, we estimate these delays and conclude that they are substantial. In the revised manuscript, we will discuss the other reasons for the delay suggested by the reviewer.

      (5) Final figure (Figure 6) focuses on understanding the time course of neuronal silencing. First of all, I'm not entirely sure how relevant this is for the story. It could be an interesting supplemental data. But it seems a bit tangential. Additionally, it also suffers from major caveats.

      a) The authors now use a new genetic driver for which they don't have any behavioral data in any previous figures. So we do not know if any of this data holds true for the previous experiments. The authors perform whole-cell recordings from random unidentified motor neurons labeled by E49-Gal4>GtACR1 to deduce a time constant for behavioral results obtained in the VGlut-Gal4>GtACR1 experiments.

      b) The DMD setup is useful for focal inactivation, however, the appropriate controls and data are not presented. Line 200 "A spot of light on the cell body produces as much of the hyperpolarization as stimulating the entire fly (mean of 11.3 mV vs 13.1 mV across 9 neurons). Conversely, excluding the cell body produces only a small effect on the MN (mean of 2.6 mV)." First of all, the control experiment for showing that DMD is indeed causing focal inactivation would be to gradually move the spot of light away from the labeled soma, i.e. to the neighboring "labelled" soma and show that there is indeed focal inactivation. Instead, the authors move it quite a long distance into unlabeled neuropil. Secondly, I still don't get why the authors are doing this experiment. Even if we believe the DMD is functioning perfectly, all this really tells us is that a random subset of motor neurons (maybe 5 or 6 cells, legend is missing this info) labeled by E49-Gal4 is strongly hyperpolarized by its own GtACR1 channel opening, rather than being impacted because of hyperpolarizations in other E49-Gal4 labeled neurons. This has no relevance to the interpretation of any of the VGlut-Gal4 behavioral data. VGlut-Gal4 is much broader and also labels all glutamatergic neurons, most of which are inhibitory interneurons whose silencing could lead to disinhibition of downstream networks.

      (5a) However, we can address the reviewer’s critique by recording from the Vglut line while using a MN line to target the recordings to MNs.

      b) Once we use the Vglut driver to perform these recordings, it will help assess how much of the MN inactivation is due to the GtACR expressed in the MN versus other neurons.

      Reviewer 2:

      While (as mentioned above) the study's conclusions are well-supported by the results and modeling, limitations arise because of the assumptions made. For instance, using a linear approximation may not hold at larger joint angles, and future studies would benefit from accounting for nonlinearities. Future studies could also delve into the source of passive forces, which is important for more deeply understanding the anatomical and physical basis of the results in this study. For instance, assessments of muscle or joint properties to correlate stiffness values with physical structure might be an area of future consideration.

      We agree with these comments but believe that these studies represent avenues for future work.

      Reviewer 3:

      (1) Passive torques are measured, but only some short speculative statements, largely based on previous work, are offered on their functional significance; some of these claims are not well supported by experimental evidence or theoretical arguments. Passive forces are judged as "large" compared to the weight force of the limb, but the arguably more relevant force is the force limb muscles can generate, which, even in equilibrium conditions, is already about two orders of magnitude larger. The conclusion that passive forces are dynamically irrelevant seems natural, but contrasts with the assertion that "passive forces [...] will have a strong influence on limb kinematics". As a result, the functional significance of passive joint torques in the fruit fly, if any, remains unclear, and this ambiguity represents a missed opportunity. We now know the magnitude of passive joint torques - do they matter and for what? Are they helpful, for example, to maintain robust neuronal control, or a mechanical constraint that negatively impacts performance, e.g., because they present a sink for muscle work?

      To us, measuring passive forces was the first step to understanding neural and biomechanical control of the limb. In general, we agree with these comments and would like to understand the role of passive forces in overall limb control. A complete discussion of the functional significance of passive forces in limb control is beyond the scope of this study. We would like to note that it is unlikely that the active forces are two orders of magnitude larger during unloaded movement of the limb. However, these issues will have to be settled in future work.

      (2) The work is framed with a scaling argument, but the assumptions that underpin the associated claims are not explicit and can thus not be evaluated. This is problematic because at least some arguments appear to contradict textbook scaling theory or everyday experience. For example, active forces are assumed to scale with limb volume, when every textbook would have them scale with area instead; and the asserted scaling of passive forces involves some hidden assumptions that demand more explicit discussion to alert the reader to associated limitations. Passive forces are said to be important only in small animals, but a quick self-experiment confirms that they are sufficient to stabilize human fingers or ankles against gravity, systems orders of magnitude larger than an insect limb, in seeming contradiction with the alleged dominance of scale. Throughout the manuscript, there are such and similar inaccuracies or ambiguities in the mechanical framing and interpretation, making it hard to fairly evaluate some claims, and rendering others likely incorrect.

      We interpret this comment as making two separate points. The first is that our statement that active forces depend on the third power of limb length (L<sup>3</sup>) is incorrect. We agree and apologize for this oversight. Specifically, on L6-7 we say, “both inertial forces and active forces scale with the mass of the limb which in turn scales with the volume of the limb and therefore depends on the third power of limb length (L<sup>3</sup>)”. Instead, this statement should read “inertial forces scale with the mass of the limb which in turn scales with the volume of the limb and therefore depends on the third power of limb length (L<sup>3</sup>)”. However, this oversight does not affect the scaling argument, as the scaling arguments in the rest of the manuscript involve only inertial forces and not active forces.

      The second point is about the scaling law that governs passive forces. In the current manuscript, we have assumed that the passive forces scale as L<sup>2</sup> based on previous work. The reviewer has pointed out that this assumption might be incorrect or at the very least needs a rationale. We agree with this assessment: passive forces that arise in the muscle are likely to scale as L<sup>2</sup> but passive forces that arise in the joint might not. In the revised manuscript, we will discuss this concern.
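      The scaling argument discussed in this response can be written out compactly. The following is a sketch under the assumption of geometric similarity (all limb dimensions proportional to the limb length L); the tissue density ρ and the exponent n are symbols introduced here for illustration and do not appear in the manuscript.

```latex
\begin{align*}
F_{\text{inertial}} &\propto m \propto \rho L^{3}, \\
F_{\text{passive}}  &\propto L^{2} \quad \text{(assumed; plausible for forces arising in muscle)}, \\
\frac{F_{\text{passive}}}{F_{\text{inertial}}} &\propto \frac{L^{2}}{L^{3}} = L^{-1}.
\end{align*}
```

      Under these assumptions, halving the limb length doubles the relative importance of passive forces. If passive forces arising in the joint instead scale as L<sup>n</sup> with some other exponent n, the ratio scales as L<sup>n−3</sup>, so the conclusion that passive forces matter more in small limbs holds only when n is less than 3.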

      Response to the public comment:

      There was a comment from a reader: “None of our work cited in various places in this preprint (i.e., Zakotnik et al. 2006, Guschlbauer et al. 2007, Page et al. 2008, Hooper et al. 2009, Hooper 2012, Ache and Matheson 2012, Blümel et al. 2012, Ache and Matheson 2013, von Twickel et al. 2019, and Guschlbauer et al. 2022) claims or implies that passive forces could be sufficient to support the weight of an insect or any animal. To claim or suggest otherwise (as done in lines 33-35) is incorrect and sets up a misleading straw man that misrepresents our work. All statements in the preprint regarding our work related to this specific matter need to be removed or edited accordingly. For instance, the investigations, calculations, and interpretations in Hooper et al. 2009 are solely about limbs that are not being used in stance or other loaded tasks (indeed, the article's title specifically refers to "unloaded" leg posture and movements). Trying to use this work to predict whether passive muscle forces alone can support a stick insect against gravity requires considering much more than the oversimplified calculation given in lines 290-292. Other “back of the envelope calculations” (lines 299-300) are likely also insufficient and erroneous. The discussion in lines 289-304 needs to be edited accordingly”

      We thank the reader for their comment. However, we interpret these studies differently. The studies above rightly focused on unloaded legs because it would be difficult to study passive forces in an intact insect without genetic tools. The commenter correctly points out that these studies do not comment on whether passive forces are strong enough to support the weight of the fly. However, we disagree that our arguments based on their results are unreasonable or a straw man. We think that our interpretation of their measurements is correct. Moreover, we were motivated by Yox et al. 1982, who state in so many words: “Stiffness of the muscles in the joints of all the legs might be sufficient to support a resting arthropod. A more rigorous analysis of all supporting limbs and joint angles would be required to prove this hypothesis”. We were inspired by this comment. In the revised manuscript, we will make it clear that the statement made in Line 33 is based on Yox et al. and on our interpretation of measurements made by others.

    1. Author response:

      Reviewer 1 (Public review):

      The manuscript by Choi and colleagues investigates the impact of variation in cortical geometry and growth on cortical surface morphology. Specifically, the study uses physical gel models and computational models to evaluate the impact of varying specific features/parameters of the cortical surface. The study makes use of this approach to address the topic of malformations of cortical development and finds that cortical thickness and cortical expansion rate are the drivers of differences in morphogenesis.

      The study is composed of two main sections. First, the authors validate numerical simulation and gel model approaches against real cortical postnatal development in the ferret. Next, the study turns to modelling malformations in cortical development using modified tangential growth rate and cortical thickness parameters in numerical simulations. The findings investigate three genetically linked cortical malformations observed in the human brain to demonstrate the impact of the two physical parameters on folding in the ferret brain.

      This is a tightly presented study that demonstrates a key insight into cortical morphogenesis and the impact of deviations from normal development. The dual physical and computational modeling approach offers the potential for unique insights into mechanisms driving malformations. This study establishes a strong foundation for further work directly probing the development of cortical folding in the ferret brain. One weakness of the current study is that the interpretation of the results in the context of human cortical development is at present indirect, as the modelling results are solely derived from the ferret. However, these modelling approaches demonstrate proof of concept for investigating related alterations more directly in future work through similar approaches to models of the human cerebral cortex.

      We thank the reviewer for the very positive comments. While the current gel and organismal experiments focus on the ferret only, we want to emphasize that our analysis does consider previous observations of human brains and morphologies therein (Tallinen et al., Proc. Natl. Acad. Sci. 2014; Tallinen et al., Nat. Phys. 2016), which we compare and explain. This allows us to analyze the implications of our study broadly to understand the explanations of cortical malformations in humans using the ferret to motivate our study. Further analysis of normal human brain growth using computational and physical gel models can be found in our companion paper (Yin et al., 2025), also submitted to eLife:

      S. Yin, C. Liu, G. P. T. Choi, Y. Jung, K. Heuer, R. Toro, L. Mahadevan, Morphogenesis and morphometry of brain folding patterns across species. bioRxiv 2025.03.05.641692.

      In future work, we plan to obtain malformed human cortical surface data, which would allow us to further investigate related alterations more directly.

      Reviewer 2 (Public review):

      Summary:

      Based on MRI data of the ferret (a gyrencephalic non-primate animal, in whom folding happens postnatally), the authors create in vitro physical gel models and in silico numerical simulations of typical cortical gyrification. They then use genetic manipulations of animal models to demonstrate that cortical thickness and expansion rate are primary drivers of atypical morphogenesis. These observations are then used to explain cortical malformations in humans.

      Strengths:

      The paper is very interesting and original, and combines physical gel experiments, numerical simulations, as well as observations in MCD. The figures are informative, and the results appear to have good overall face validity.

      We thank the reviewer for the very positive comments.

      Weaknesses:

      On the other hand, I perceived some lack of quantitative analyses in the different experiments, and currently, there seems to be rather a visual/qualitative interpretation of the different processes and their similarities/differences. Ideally, the authors also quantify local/pointwise surface expansion in the physical and simulation experiments, to more directly compare these processes. Time courses of eg, cortical curvature changes, could also be plotted and compared for those experiments. I had a similar impression about the comparisons between simulation results and human MRI data. Again, face validity appears high, but the comparison appeared mainly qualitative.

      We thank the reviewer for the comments. Besides the visual and qualitative comparisons between the models, we would like to point out that we have included the quantification of the shape difference between the real and simulated ferret brain models via spherical parameterization and the curvature-based shape index as detailed in main text Fig. 4 and SI Section 3. We have also utilized spherical harmonics representations for the comparison between the real and simulated ferret brains at different maximum order N. In our revision, we plan to further include the curvature-based shape index calculations for the comparison between the real and simulated ferret brains at more time points.

      As for the comparison between the malformation simulation results and human MRI data in the current work, since the human MRI data are two-dimensional while our computational models are three-dimensional, we focus on the qualitative comparison between them. In future work, we plan to obtain malformed human cortical surface data, from which we can then perform the parameterization-based and curvature-based shape analysis for a more quantitative assessment.
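      For readers unfamiliar with curvature-based shape measures, the following minimal sketch computes a shape index from the two principal curvatures at a surface point. It assumes the Koenderink and van Doorn form of the index with one particular sign convention (+1 where both curvatures are positive); the definition used in the manuscript's SI Section 3 may differ, and the function name `shape_index` is hypothetical.

```python
import numpy as np

def shape_index(k1, k2):
    """Shape index in [-1, 1] from principal curvatures (assumed convention).

    +1: both curvatures positive and equal (spherical cap)
     0: equal and opposite curvatures (symmetric saddle)
    -1: both curvatures negative and equal (spherical cup)
    """
    k1 = np.asarray(k1, dtype=float)
    k2 = np.asarray(k2, dtype=float)
    k_max = np.maximum(k1, k2)
    k_min = np.minimum(k1, k2)
    # arctan2 avoids a division by zero at umbilic points (k_max == k_min).
    # At perfectly flat points (both curvatures zero) the index is undefined
    # in theory; arctan2(0, 0) returns 0 there.
    return (2.0 / np.pi) * np.arctan2(k_max + k_min, k_max - k_min)

cap = shape_index(1.0, 1.0)       # sphere-like point
saddle = shape_index(1.0, -1.0)   # symmetric saddle
ridge = shape_index(1.0, 0.0)     # cylinder-like ridge
print(cap, saddle, ridge)
```

      Applied per vertex to a real and a simulated surface mapped onto a common parameterization, such an index yields two scalar fields that can be compared pointwise, which is the general idea behind a curvature-based quantitative comparison.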

      I felt that MCDs could have been better contextualized in the introduction.

      We thank the reviewer for the comment and will include a more detailed introduction to MCDs in our revision.

    1. Author response:

      We greatly appreciate the efforts of the reviewers, who have provided insightful and helpful comments to improve the manuscript. The feedback touches upon a number of topics, focusing on clarification or justification of experimental techniques and on understanding the mechanism by which P. aeruginosa detects HOCl. All reviewers raised the issue of how HOCl activates fro expression, including whether free or protein-bound methionine, cysteine, or other HOCl byproducts induce this expression. For the upcoming revision, we plan to perform experiments that address this issue and will discuss potential mechanistic models in light of the new data. In addition, we plan to perform additional experiments to address a reviewer’s concerns regarding the dependence of the fro response on HOCl production by neutrophils. The revision will correct imprecise statements pointed out by reviewers and address all remaining issues requiring clarification or further discussion, including the range of HOCl sensitivity, the relationship between HOCl and flow sensitivity, and the justification for testing the fro response to nitric acid.

    1. Author response:

      Reviewer #1 (Public review):

      […] Strengths:

      This manuscript has many strengths.

      (1) Utilizing and characterizing novel mouse strains that complement the current widely used mouse models in the field of TB. Many of those mouse strains will be novel tools for studying host responses to Mtb infection.

      (2) The study revealed very unique biology of neutrophils during Mtb infection. It has been well-established that high numbers of neutrophils correlate with high bacterial burden in mice. However, this work uncovered that some mouse strains could be resistant to infection even with high numbers of neutrophils in the lung, indicating the diverse functions of neutrophils. This information is important.

      We are grateful for the reviewer’s thoughtful consideration of our work and appreciate their comment that our mouse strains can benefit the models available in the TB field. We further appreciate the recognition of the importance of neutrophil diversity during Mtb infection.

      Weaknesses:

      The weaknesses of the manuscript are that the work is relatively descriptive. It is unclear whether the neutrophil subsets are indeed functionally different. While single-cell RNA seq did provide some clues at transcription levels, functional and mechanistic investigations are lacking.

      We appreciate this comment and agree that further research needs to be done on the functionality of the neutrophils to discover mechanistic differences between the mouse genotypes. Our attempts at extracting sufficient RNA from sorted neutrophils from the mouse lungs were unsuccessful. However, future attempts at comparing RNA expression between mouse genotypes, as well as proteomic data, are necessary to determine the mechanistic differences in neutrophil biology in these mice.

      Similarly, it is unclear how highly activated and glycolytic neutrophils in MANC strain contribute to its susceptibility.

      This is a fair comment and we agree that it is still unclear how these neutrophils contribute to MANC susceptibility. Growing the neutrophils ex vivo and infecting them with Mtb is technically challenging, due to the slow growth of Mtb and the short lifespan of the neutrophils. As mentioned in the comment above, future in vivo characterization and RNA expression studies will be necessary to address these questions.

      Reviewer #2 (Public review):

      […] Strengths:

      The strengths are addressing a critically important consideration in the tuberculosis field - mouse model(s) of the human disease, and taking advantage of the novel phenotypes observed to determine potential mechanisms. Notable strengths include,

      (1) Innovative generation and use of mouse models: Developing wild-derived inbred mice from diverse geographic locations is innovative, and this approach expands the range of phenotypic responses observed during M. tuberculosis infection. Additionally, the authors have deposited strains at The Jackson Laboratory making these valuable resources available to the scientific community.

      (2) Potential for translational research: The findings have implications for human pulmonary TB, particularly the discovery of neutrophil-associated susceptibility in primary infection and/or neutrophil-mediated disease progression that could both inform the development of therapeutic targets and also be used to test the effectiveness of such therapies.

      (3) Comprehensive experimental design: The investigators use many complementary approaches including in vivo M. tuberculosis infection, in vitro macrophage studies, neutrophil depletion experiments, flow cytometry, and a number of data mining, machine learning, and imaging approaches to produce robust and comprehensive analyses of the wild-derived strains and neutrophil subpopulations at 3 weeks after M. tuberculosis infection.

      We thank the reviewer for their thorough and thoughtful assessment of our study. We appreciate the recognition that this mouse model can become a resource and can benefit the study of different immune responses to Mtb infection as well as be informative for studying human TB. We further appreciate their comment that the complementary approaches we have used to characterize the mouse phenotypes strengthen this study.

      Weaknesses:

      The manuscript and studies have considerable strengths and very few weaknesses. One minor consideration is that phenotyping is limited to a single limited-time point; however, this time point was carefully selected and has a strong biological rationale provided by investigators. This potential weakness does not diminish the overall findings, exciting results, or conclusions.

      We thank the reviewer for pointing out that a single time point has been studied, and that this time point is biologically relevant. We agree that additional time points, including later time points that address systemic dissemination, should be included in future studies.

    1. Author response:

      General Statements

      We are grateful for the reviewers’ constructive comments and criticisms and have thoroughly addressed all major and minor comments in the revised manuscript.

      Summary of new data.

      We have performed the following additional experiments to support our concept:

      (1) The kinetics of ROS production in B6 and B6.Sst1S macrophages after TNF stimulation (Fig. 3I and J, Suppl. Fig. 3G);

      (2) Time course of stress kinase activation (Fig.3K) that clearly demonstrated the persistent stress kinase (phospho-ASK1 and phospho-cJUN) activation exclusively in the B6.Sst1S macrophages;

      (3) New Fig.4 C-E panels include comparisons of the B6 and B6.Sst1S macrophage responses to TNF and effects of IFNAR1 blockade in both backgrounds.

      (4) We performed new experiments demonstrating that the synthesis of lipid peroxidation products (LPO) occurs in TNF-stimulated macrophages earlier than the IFNβ super-induction (Suppl.Fig.4A and B).

      (5) We demonstrated that the IFNAR1 blockade 12, 24 and 32 h after TNF stimulation still reduced the accumulation of LPO product (4-HNE) in TNF-stimulated B6.Sst1S BMDMs (Suppl.Fig.4 E-G).

      (6) We added a comparison of cMyc expression between the wild type B6 and B6.Sst1S BMDMs during TNF stimulation for 6-24 h (Fig.5I-J).

      (7) New data comparing 4-HNE levels in Mtb-infected B6 wild type and B6.Sst1S macrophages and quantification of replicating Mtb were added (Fig.6B, Suppl.Fig.7C and D).

      (8) In vivo data described in Fig.7 were thoroughly revised and new data were included. We demonstrated increased 4-HNE loads in multibacillary lesions (Fig.7A, Suppl. Fig.9A) and the 4-HNE accumulation in CD11b+ myeloid cells (Fig.7B and Suppl.Fig.9B). We demonstrated that the Ifnb-expressing cells are activated iNOS+ macrophages (Fig.7D and Suppl.Fig.13A). Using new fluorescent multiplex IHC, we have shown that stress markers phospho-cJun and Chac1 in TB lesions are expressed by Ifnb- and iNOS-expressing macrophages (Fig.7E and Suppl.Fig.13D-F).

      (9) We performed an additional experiment to demonstrate that naïve (non-BCG vaccinated) lymphocytes did not improve Mtb control by Mtb-infected macrophages, in agreement with previously published data (Suppl.Fig.7H).

      Summary of updates

      Following the reviewers’ requests, we updated figures to include isotype control antibodies, effects of inhibitors on non-stimulated cells, positive and negative controls for labile iron pool, additional images of 4-HNE and live/dead cell staining.

      Isotype controls for IFNAR1 blockade were included in Fig.3M, Fig.4C-E, Fig.6L-M, Suppl.Fig.4F-G and 7I.

      Positive and negative controls for labile iron pool measurements were added to Fig.3E, Fig.5D, Suppl.Fig.3B.

      Cell death staining images were added to Suppl.Fig.3H.

      Co-staining of 4-HNE with tubulin was added to Suppl.Fig.3A.

      High magnification images for Figure 7 were added in Suppl.Fig.8 to demonstrate paucibacillary and multibacillary image classification.

      Single-channel color images for individual markers were provided in Fig.7E and Suppl.Fig.13B-F.

      Inhibitor effects on non-stimulated cells were included in Fig.5 D-H, Suppl.Fig.6A and B. Titration of CSF1R inhibitors for non-toxic concentration determination is included in Suppl.Fig.6D.

      In addition, we updated the figure legends in the revised manuscript to include more details about the experiments. We also clarified our conclusions in the Discussion. Responses to every major and minor comment of the reviewers are provided below.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      The study by Yabaji et al. examines macrophage phenotypes in B6.Sst1S mice, a mouse strain with increased susceptibility to M. tuberculosis infection that develops necrotic lung lesions. Extending previous work, the authors specifically focus on delineating the molecular mechanisms driving aberrant oxidative stress in TNF-activated B6.Sst1S macrophages that has been associated with impaired control of M. tuberculosis. The authors use scRNAseq of bone marrow-derived macrophages to further characterize distinctions between B6.Sst1S and control macrophages and ascribe distinct trajectories upon TNF stimulation. Combined with results using inhibitory antibodies and small molecule inhibitors in in vitro experimentation, the authors propose that TNF-induced protracted c-Myc expression in B6.Sst1S macrophages disables the cellular defense against oxidative stress, which promotes intracellular accumulation of lipid peroxidation products, fueled at least in part by overexpression of type I IFNs by these cells. Using lung tissue sections from M. tuberculosis-infected B6.Sst1S mice, the authors suggest that the presence of a greater number of cells with lipid peroxidation products in lung lesions with high counts of stained M. tuberculosis is indicative of progressive loss of host control due to the TNF-induced dysregulation of macrophage responses to oxidative stress. In patients with active tuberculosis disease, the authors suggest that peripheral blood gene expression indicative of increased Myc activity was associated with treatment failure.

      Major comments

      The authors describe differences in protein expression, phosphorylation or binding when referring to Fig 2A-C, 2G, 3D, 5B, 5C. However, such differences are not easily apparent or very subtle and, in some cases, confounded by differences in resting cells (e.g. pASK1 Fig 3L; c-Myc Fig 5B) as well as analyses across separate gels/blots (e.g. Fig 3K, Fig 5B). Quantitative analyses across different independent experiments with adequate statistical analyses are required to strengthen the associated conclusions.

      We updated our Western blots as follows:

      (1) Densitometry of normalized bands is included above each lane (Fig.2A-C; Fig.3C-D and 3K; Fig.4A-B; Fig.5B,C,I,J). New data in Fig.3K are added to highlight differences between B6 and B6.Sst1S at individual timepoints after TNF stimulation. In Fig.5I we added new data comparing Myc levels in B6 and B6.Sst1S with and without JNK inhibitor and updated the results accordingly. New Fig.3K clearly demonstrates the persistent activation of p-cJun and pAsk1 at 24 and 36 h of TNF stimulation. In Fig.5B we clearly demonstrate that Myc levels were higher in B6.Sst1S after 12 h of TNF stimulation. At 6 h, however, the basal Myc levels are consistently higher in B6.Sst1S, while the induction by TNF is similar (1.6-fold) in both backgrounds. We noted this in the text.

      (2) A representative experiment is shown in individual panels and the corresponding figure legend contains information on the number of biological repeats. Each Western blot was repeated 2 – 4 times.
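
      The densitometry described in point (1) can be illustrated with a minimal sketch. It assumes a common convention, which the response does not spell out: each target band is divided by the loading-control band of the same lane, and the result is then expressed as fold change over the unstimulated lane. The function names and the intensity values are hypothetical and are not taken from the manuscript.

```python
def normalize_bands(target, loading):
    """Divide each target band intensity by the loading-control intensity of the same lane."""
    if len(target) != len(loading):
        raise ValueError("target and loading must have one value per lane")
    return [t / l for t, l in zip(target, loading)]


def fold_over_baseline(normalized, baseline_index=0):
    """Express normalized intensities as fold change relative to one reference lane."""
    base = normalized[baseline_index]
    if base == 0:
        raise ValueError("baseline lane has zero normalized intensity")
    return [value / base for value in normalized]


# Hypothetical band intensities (arbitrary units), lanes: 0 h, 6 h, 12 h of TNF.
myc_bands = [100.0, 320.0, 240.0]
tubulin_bands = [200.0, 400.0, 300.0]

normalized = normalize_bands(myc_bands, tubulin_bands)
fold_change = fold_over_baseline(normalized)
print([round(f, 2) for f in fold_change])
```

      With this convention, a difference in basal levels between genotypes and the fold induction within each genotype are reported separately, which is the distinction drawn for the 6 h time point above.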

      The representative images of fluorescence microscopy in Fig 3H, 4H, 5H, S3C, S3I, S5A, S6A seem to suggest that under some conditions the fluorescence signal is located just around the nucleus rather than absent or diminished from the cytoplasm. It is unclear whether this reflects selective translocation of targets across the cell, morphological changes of macrophages in culture in response to the various treatments, or variations in focal point at which images were acquired. Control images (e.g. cellular actin, DIC) should be included for clarification. If cell morphology changes depending on treatments, how was this accounted for in the quantitative analyses? In addition, negative controls validating specificity of fluorescence signals would be warranted.

      Our conclusion of higher LPO production is based on several parameters: 4-HNE staining, measurements of MDA in cell lysates and oxidized lipids using BODIPY C11. Taken together they demonstrate a significant and reproducible increase in LPO accumulation in TNF-stimulated B6.Sst1S macrophages. This excludes an imaging artefact related to the unequal 4-HNE distribution noted by the reviewer. In fact, we also noted that the 4-HNE was spread within the cell body of B6.Sst1S macrophages and confirmed it using co-staining with tubulin, as suggested by the reviewer (new Suppl.Fig.3A). Since low molecular weight LPO products, such as MDA and 4-HNE, traverse cell membranes, it is unlikely that they will be strictly localized to a specific membrane-bound compartment. However, we agree that at lower concentrations, there might be some restricted localization, explaining a visible perinuclear ring of 4-HNE staining in B6 macrophages. This phenomenon may be explained simply by the thicker cytoplasm surrounding the nucleus in activated macrophages spread on an adherent plastic surface, or by proximity to specific organelles involved in generation or clearance of LPO products, and definitely warrants further investigation.

      We also included images of non-stimulated cells in Fig.3H, Suppl.Fig.3A and 3E. We used multiple fields for imaging and quantified fluorescence signals (Suppl. Fig.3D and 3F, Suppl.Fig.4G, Suppl.Fig.6A and B).

      We used negative controls without primary antibodies for the initial staining optimization, but did not include it in every experiment.

      To interpret the evaluation on the hierarchy of molecular mechanisms in B6.Sst1S macrophages, comparative analyses with B6 control cells should be included (e.g. Fig 4C-I, Fig 5, Fig 6B, E-M, S6C, S6E-F). This will provide weight to the conclusions that the dysregulated processes are specifically associated with the susceptibility of B6.Sst1S macrophages.

      Understanding the sst1-mediated effects on macrophage activation is the focus of our previously published studies (Bhattacharya et al., JCI, 2021) and this manuscript. The data comparing B6 and B6.Sst1S macrophages are presented in Fig.1, Fig.2, Fig.3, Fig.4, Fig.5A-C, I and J, Fig.6A-C, 6J and corresponding supplemental figures 1, 2, 3, 4A and B, Suppl.Fig.5, Suppl.Fig.6C, Suppl.Fig.7A-D,7F.

      Once we identified the aberrantly activated pathways in the B6.Sst1S, we used specific inhibitors to correct the aberrant response in B6.Sst1S.

      All experiments using inhibitory antibodies require comparison to the effect of a matched isotype control in the same experiment (e.g. Fig 3J, 4F, G, I; 6L, 6M, S3G, S6F).

      Isotype controls for IFNAR1 blockade were included in Fig.3M, Fig.4C-E, Fig.6L-M, Suppl.Fig.4F-G and 7I.

      Experiments using inhibitors require inclusion of an inhibitor-only control to assess inhibitor effects on unstimulated cells (e.g. Fig 4I, 5D-I)

      Inhibitor effects on non-stimulated cells were included in Fig.5 D-H, Suppl.Fig.6A and B.

      Fig 3K and Fig 5J appear to contain the same images for p-c-Jun and b-tubulin blots.

      Fig.3K and 5J partially overlapped but had a different focus: Fig.3K has been updated to reflect the time course of stress kinase activation. Fig.5J is updated (currently Fig.5I and J) to display B6 and B6.Sst1S macrophage data including cMyc and p-cJun levels.

      Data of TNF-treated cells in Fig 3I appear to be replotted in Fig 3J.

      Currently these data are presented in Fig.3L and 3M and have been updated to include a comparison of B6 and B6.Sst1S cells (Fig.3L) and the effects of inhibitors (Fig.3M).

      It is stated that lungs from 2 mice with paucibacillary and 2 mice with multi-bacillary lesions were analysed. There is contradicting information on whether these tissues were collected at the same time post infection (week 14?) or whether the pauci-bacillary lesions were in lungs collected at earlier time points post infection (see Fig S8A). If the former, how do the authors conclude that multi-bacillary lesions are a progression from paucibacillary lesions and indicative of loss of M. tuberculosis control, especially if only one lesion type is observed in an individual host? If the latter, comparison between lesions will likely be dominated by temporal differences in the immune response to infection.

      In either case, it is relevant to consider density, location, and cellular composition of lesions (see also comments on GeoMx spatial profiling). Is the macrophage number/density per tissue area comparable between pauci-bacillary and multi-bacillary lesions?

      We did not collect lungs at the same time point. As described in greater detail in our preprints (Yabaji et al., https://doi.org/10.1101/2025.02.28.640830 and https://doi.org/10.1101/2023.10.17.562695), pulmonary TB lesions in our model of slow TB progression are heterogeneous between the animals at the same timepoint, as observed in human TB patients and other chronic TB animal models. Therefore, we perform analyses of individual TB lesions that are classified by a certified veterinary pathologist in a blinded manner based on their morphology (H&E) and acid fast staining of the bacteria, as depicted in Suppl.Fig.8. Currently it is impossible to monitor progression of individual lesions in mice. However, in mice TB is a progressive disease and no healing and recovery from the disease have been observed in our studies or reported in the literature. Therefore, we assumed that paucibacillary lesions preceded the multibacillary ones, and not vice versa, thus reflecting the disease progression. In our opinion, this conclusion most likely reflects the natural course of the disease. However, we edited the text: instead of disease progression, we now refer to paucibacillary and multibacillary lesions.

      Does 4HNE staining align with macrophages and if so, is it elevated compared to control mice and driven by TNF in the susceptible vs more resistant mice?

      We performed additional staining and analyses to demonstrate the 4-HNE accumulation in CD11b+ myeloid cells of macrophage morphology. Non-necrotic lesions contain a negligible proportion of neutrophils (Fig.7B, Suppl.Fig.9B). B6 mice do not develop advanced multibacillary TB lesions containing 4-HNE+ cells. Also, 4-HNE staining was localized to TB lesions and was not found in uninvolved lung areas of the infected mice, as shown in Suppl.Fig.9A (left panel).

      It is well established that TNF plays a central role in the formation and maintenance of TB granulomas in humans and in all animal models. Therefore, TNF neutralization would lead to rapid TB progression, rapid Mtb growth and lesion destruction in both B6 and B6.Sst1S genetic backgrounds.

      Pathway analysis of spatial transcriptomic data (Suppl.Fig.11) identified TNF signaling via NFkB among dominant pathways upregulated in multibacillary lesions, suggesting that the 4-HNE accumulation paralleled increased TNF signaling. In addition, in vivo other cytokines, including IFN-I, could activate macrophages and stimulate production of reactive oxygen and nitrogen species and lead to the accumulation of LPO products as shown in this manuscript.

      It would be relevant to state how many independent lesions per host were sampled in both the multiplex IHC as well as the GeoMx data. Can the authors show the selected regions of interest in the tissue overview and in the analyses to appreciate within-host and across-host heterogeneity of lesions. The nature of the spatial transcriptomics platform used is such that the data are derived from tissue areas that contain more than just Iba1+ macrophages. At later stages of infection, the cellular composition of such macrophage-rich areas will be different when compared to lesions earlier in the infection process. Hence, gene expression profiles and differences between tissue regions cannot be attributed to macrophages in this tissue region but are more likely a reflection of a mix of cellular composition and per-cell gene expression.

      We used Iba1 staining to identify macrophages in TB lesions and programmed the GeoMx instrument to collect spatial transcriptomics probes from Iba1+ cells within ROIs. Also, we selected regions of interest (ROI) avoiding necrotic areas (depicted in Suppl.Fig.10). We agree that the Iba1+ macrophage population is heterogeneous: some Iba1+ cells are activated iNOS+ macrophages, others are iNOS-negative (Fig.7C and D, and Suppl.Fig.13A). Multibacillary lesions contain larger areas occupied by activated (iNOS+) macrophages (Fig.7D, Suppl.Fig.13B and 13F). Although the GeoMx spatial transcriptomic platform does not provide single cell resolution, it allowed us to compare populations of Iba1+ cells in paucibacillary and multibacillary TB lesions and to identify a shift in their overall activation pattern.

      It is stated that loss of control of M. tuberculosis in multibacillary lesions was associated with "downregulation of IFNg-inducible genes". If the authors base this on the tissue expression of individual genes, this requires further investigation to support such conclusion (also see comment on GeoMx above). Furthermore, how might this conclusion be compatible with significantly elevated iNOS+ cells (Fig 7D) in multibacillary lesions?

      We demonstrated that Ciita gene expression is specifically induced by IFN-gamma and is suppressed by IFN-I (Fig.6M). The expression of Ciita in paucibacillary lesions suggests the presence of IFN-gamma-activated cells, and its disappearance in the multibacillary lesions is consistent with massive activation of the IFN-I pathway (Fig.7C).

      It is appreciated that the human blood signature analyses contain Myc-signatures but the association with treatment failure is not very strong based on the data in Fig 13B and C (Suppl.Fig.15B and C now). The authors indicate that they have no information on disease severity, but it should perhaps not be assumed that treatment failure is indicative of poor host control of the infection. Perhaps independent analyses in separate cohort/data set can add strength and provide additional insights (e.g. PMID: 35841871; PMID: 32451443, PMID: 17205474, PMID: 22872737). In addition, the human data analyses could be strengthened by extension to additional signatures such as IFN, TNF, oxidative stress. Details of the human study design are not very clear and are lacking patient demographics, site of disease, time of blood collection relative to treatment onset, approving ethics committees.

      The X axis of Suppl.Fig.15A represents pre-defined molecular signature gene sets (MSigDB) in the Gene Set Enrichment Analysis (GSEA) database (https://www.gsea-msigdb.org/gsea/msigdb). The Y axis shows the area under the curve (AUC) score for each gene set. The Myc upregulated gene set myc_up was identified among the top gene sets associated with treatment failure using the unbiased ssGSEA algorithm. The upregulation of the Myc pathway in the blood transcriptome associated with TB treatment failure most likely reflects a greater proportion of immature cells in peripheral blood, possibly due to increased myelopoiesis.

      Pathway analysis of the differentially expressed genes revealed that treatment failures were associated with the following pathways relevant to this study: NF-kB Signaling, Flt3 Signaling in Hematopoietic Progenitor Cells (indicative of common myeloid progenitor cell proliferation), SAPK/JNK Signaling and Senescence (indicative of oxidative stress). The upregulation of these pathways in human patients with poor TB treatment outcomes correlates with our findings in TB susceptible mice. The detailed analysis of differentially regulated pathways in human TB patients is beyond the scope of this study and is presented in another manuscript entitled “Tuberculosis risk signatures and differential gene expression predict individuals who fail treatment” by Arthur VanValkenburg et al., submitted for publication.

      Blood collection for PBMC gene expression profiling of TB patients was prior to TB treatment or within the first week of treatment commencement. Boxplot of bootstrapped ssGSEA enrichment AUC scores from several oncogene signatures ranked from lowest to highest AUC score, with myc_up and myc_dn genes highlighted in red.

      We agree with the reviewer that not every gene in the myc_up gene set correlates with the treatment outcome. But the association of the gene set is statistically significant, as presented in Suppl.Fig.15B – C.
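
      As a rough illustration of what a bootstrapped AUC score for a gene set means, the sketch below computes the AUC as the probability that a randomly chosen treatment-failure sample has a higher enrichment score than a randomly chosen cured sample, and resamples both groups with replacement. This is an assumption about the general technique only; the exact procedure used for Suppl.Fig.15 is described in the Methods, and the function names and scores here are hypothetical.

```python
import random


def auc(failure_scores, cure_scores):
    """Probability that a failure sample scores higher than a cure sample (ties count half)."""
    wins = 0.0
    for f in failure_scores:
        for c in cure_scores:
            if f > c:
                wins += 1.0
            elif f == c:
                wins += 0.5
    return wins / (len(failure_scores) * len(cure_scores))


def bootstrap_auc(failure_scores, cure_scores, n_boot=1000, seed=0):
    """Resample each group with replacement and return the AUC of every resample."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_boot):
        f = [rng.choice(failure_scores) for _ in failure_scores]
        c = [rng.choice(cure_scores) for _ in cure_scores]
        results.append(auc(f, c))
    return results


# Hypothetical per-sample enrichment scores for one gene set (illustration only).
failure = [0.42, 0.55, 0.61, 0.38, 0.70]
cure = [0.30, 0.45, 0.25, 0.50, 0.33, 0.41]

boot = sorted(bootstrap_auc(failure, cure, n_boot=500))
print("point estimate:", round(auc(failure, cure), 3))
print("bootstrap median:", round(boot[len(boot) // 2], 3))
```

      A gene set whose bootstrap distribution sits well above 0.5 separates the two outcome groups better than chance, even when individual genes within the set do not.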

      We updated the details of the study, including study sites and the ethics committee approval statement and references describing these cohorts.

      Other comments

      It is excellent that the authors provide individual data points. Choosing a colour other than black would increase clarity when black bars are used.

      We followed this useful suggestion and selected consistent color codes for B6 and B6.Sst1S groups to enhance clarity throughout the revised manuscript.

      Error bars are inconsistently depicted as either bi-directional or just unidirectional.

      We used bi-directional error bars in the revised manuscript.

      Fig 1E, G, H - please include a scale to clarify what the heat map is representing.

      We have included the expression key in Fig.1E,G and H and Suppl.Fig.1C and D in the revised version.

      Fig 2K, Fig S10A gene information cannot be deciphered.

      We increased the font in previous Fig.2K and moved it to the supplement to keep larger fonts (current Suppl.Fig.2G).

      Fig S4A,B please add error bars.

      These data are presented as Suppl.Fig.5 in the revised version. We performed one experiment to test the hypothesis. Because the data indicated no clear increase in transposon small RNAs in the sst1S macrophages, we did not pursue this hypothesis further, and therefore, the error bars were not included. However, we decided to include these negative data because they reject a very attractive and plausible hypothesis.

      Please use gene names as per convention (e.g. Ifnb1) to distinguish gene expression from protein expression in figures and text.

      We addressed the comment in the revised manuscript.

      Fig S8B. Contrary to the description of results, there seems to be minimal overlap between the signal for YFP and the Ifnb1 probe. Is the Ifnb1 reporter mouse a legacy reporter? If so, it is worth stating this and including such considerations in the data interpretation.

      The YFP reporter expresses YFP protein under the control of the Ifnb1 promoter. The YFP protein accumulates within the cells, whereas Ifnb protein is rapidly secreted and does not accumulate in the producing cells in appreciable amounts. So YFP is not a lineage tracing reporter, but its accumulation marks the Ifnb1 promoter activity in cells, although the YFP protein half-life is longer than that of the Ifnb1 mRNA, which is rapidly degraded (Witt et al., BioRxiv, 2024; doi:10.1101/2024.08.28.61018). Therefore, there is no precise spatiotemporal coincidence of these readouts.

      Please clarify what is meant by "normal interstitium" ? If the tissue is from uninfected mice, please state clearly.

      In this context we refer to the uninvolved lung areas of the infected lungs. In every sample we compare uninvolved lung areas and TB lesions of the same animal. Also, we performed staining of lungs from non-infected mice as additional controls.

      If macrophage cultures underwent media changes every 48h, how was loss of liberated Mtb taken into account especially if differences in cell density/survival were noted? The assessment of M. tuberculosis load by qPCR is not well described. In particular, the method of normalization applied within the experiments (not within the qPCR) here remains unclear, even with reference to the authors' prior publication.

      Our lab has many years of experience working with macrophage monolayers infected with virulent Mtb and uses optimized protocols to avoid cell losses and related artifacts. Recently we published a detailed protocol for this methodology in STAR Protocols (Yabaji et al., 2022; PMID 35310069). In brief, it includes preparation of single cell suspensions of Mtb by filtration to remove clumps, use of low multiplicity of infection, preparation of healthy confluent monolayers and use of nutrient rich culture medium and medium change every 2 days. We also rigorously control for cell loss using whole well imaging and quantification of cell numbers and live/dead staining.

      Please add citation for the limma package.

      The reference has been added (Ritchie et al., NAR 2015; PMID 25605792).

      The description of methodology relating to the "oncogene signatures" is unclear.

      This signature was described in Bild et al., Nature, 2006 and McQuerry JA, et al., 2019 “Pathway activity profiling of growth factor receptor network and stemness pathways differentiates metaplastic breast cancer histological subtypes”. BMC Cancer 19: 881 and is cited in the Methods section “Oncogene signatures”.

      Please clearly state time points post infection for mouse analyses.

      We collected lung samples from Mtb infected mice 12 – 20 weeks post infection. The lesions were heterogeneous and were individually classified using criteria described above.

      Reference is made to "a list of genes unique to type I [interferon] genes [....]" (p29). Can the authors indicate the source of the information used for compiling this list?

      The lists were compiled from Reactome, EMBL's European Bioinformatics Institute and GSEA databases. The links for all datasets are provided in Suppl.Table 8 “Expression of IFN pathway genes in Iba1+ cells from pauci- and multi-bacillary lesions of Mtb infected B6.Sst1S mouse lungs” in the “Pool IFN I & II gene sets” worksheet.

      The discussion at present is very long, contains repetition of results and meanders on occasion.

      Thank you for this suggestion. We critically revised the text for brevity and clarity.

      Reviewer #1 (Significance):  

      Strengths and limitations  

      Strengths: multi-pronged analysis approaches for delineating molecular mechanisms of macrophage responses that might underpin susceptibility to M. tuberculosis infection; integration of mouse tissues and human blood samples  

      Weaknesses: not all conclusions supported by data presented; some concerns related to experimental design and controls; links between findings in human cohort and the mechanistic insights gained in mouse macrophage model uncertain

      The revised manuscript addresses every major and minor comment of the reviewers, including isotype controls and naïve T cells, to provide additional support for our conclusions. Our study revealed causal links between Myc hyperactivity, the deficiency of anti-oxidant defense and type I interferon pathway hyperactivity. We have shown that Myc hyperactivity in TNF-stimulated macrophages compromises antioxidant defense, leading to autocatalytic lipid peroxidation and interferon-beta superinduction that in turn amplifies lipid peroxidation, thus forming a vicious cycle of destructive chronic inflammation. This mechanism offers a plausible explanation for the association of Myc hyperactivity with poorer treatment outcomes in TB patients and provides a novel target for host-directed TB therapy.

      Advance

      The study has the potential to advance molecular understanding of the TNF-driven state of oxidative stress previously observed in B6.Sst1S macrophages and possible implications for host control of M. tuberculosis in vivo.

      Audience

      Experts seeking understanding of host factors mediating M. tuberculosis control, or failure thereof, with appreciation for the utility of the featured mouse model in assessing TB disease progression and severe manifestation. Interest is likely extended to audience more broadly interested in TNF-driven macrophage (dys)function in infectious, inflammatory, and autoimmune pathologies.

      Reviewer expertise

      In preparing this review, I am drawing on my expertise in assessing macrophage responses and host defense mechanisms in bacterial infections (incl. virulent M. tuberculosis) through in vitro and in vivo studies. This includes but is not limited to macrophage infection and stimulation assays, microscopy, intra-macrophage replication of M. tuberculosis, analyses of lung tissues using multi-plex IHC and spatial transcriptomics (e.g. GeoMx). I am familiar with the interpretation of RNAseq analyses in human and mouse cells/tissues, but can provide only limited assessment of appropriateness of algorithms and analysis frameworks.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Yabaji et al. investigated the effects of BMDMs stimulated with TNF from both WT and B6.Sst1S mice, which have previously been identified to contain the sst1 locus conferring susceptibility to Mycobacterium tuberculosis. They identified that B6.Sst1S macrophages show a superinduction of IFNß, which might be caused by increased c-Myc expression, expanding on the mechanistic insights made by the same group (Bhattacharya et al. 2021). Furthermore, prolonged TNF stimulation led to oxidative stress, which WT BMDMs could compensate for by the activation of the antioxidant defense via NRF2. On the other hand, B6.Sst1S BMDMs lack the expression of SP110 and SP140, co-activators of NRF2, and were therefore subjected to maintained oxidative stress. Yabaji et al. could link those findings to in vivo studies by correlating the presence of stressed and aberrantly activated macrophages within granulomas to the failure of Mtb control, as well as the progression towards necrosis. As the knowledge regarding Mtb progression and necrosis of granulomas is not yet well understood, findings that might help provide novel therapy options for TB are crucial. Overall, the manuscript has interesting findings with regard to macrophage responses in Mycobacterium tuberculosis infection.

      However, in its current form there are several shortcomings, both with respect to the precision of the experiments and conclusions drawn.

      In particular a) important controls are often missing, e.g. T-cells from non-immune mice in Fig. 6J, in F, effectivity of BCG in B6 mice in 6N; b) single experiments are shown throughout the manuscript, in particular western blots and histology without proper quantification and statistics, this is absolutely not acceptable; c) very few repetitions are shown in in vitro experiments, where there is no evidence for limitation in resources (usually not more than 3), it is not clear what "independent experiment means" - i.e. the robustness of the findings is questionable; d) data are often normalized multiple times, e.g. in the case of qPCR, and the methods of normalization are not clear (what house-keeping gene exactly?);

      Moreover, experiments regarding IFN I signaling (e.g. short term TNF treatment of BMDMs to analyze LPO, making sure that the reporter mouse for IFNß works in vivo) and c-Myc (e.g. the increase after M-CSF addition might impact on other analysis as well and the experiments should be adjusted to control for this effect; MYC expression in the human samples) should be carefully repeated and evaluated to draw correct conclusions.

      In addition, we would like to strongly encourage the authors to more precisely outline the experimental set-ups and figure legends, so that the reader can easily understand and follow them. In other words: The legends are - in part very - incomplete. In addition, the authors should be mindful of gene names vs. protein names and italicize where appropriate.

      We appreciate a very thorough evaluation of our manuscript by this reviewer. Their insightful comments helped us improve the manuscript. As outlined below in point-by-point responses, we added important controls, including isotype control antibodies in IFNAR blocking experiments and non-vaccinated T cells in T cell – macrophage interaction experiments, and updated figure legends to indicate the number of repeated experiments where a representative experiment is shown, the numbers of mouse lungs and individual lesions, and the methods of data normalization where they were missing. We also explained our in vitro experimental design and how we analyzed and excluded effects of media change and fresh CSF1 addition, by using a rest period before TNF stimulation and Mtb infection. The data shown in Suppl. Fig. 6C (previously Suppl. Fig. 5B) demonstrate that Myc levels induced by CSF1 return to the basal level at 12 h after media change. Our detailed in vitro protocol that contains these details has been published (Yabaji et al., STAR Protocols, 2022). We added new data demonstrating the ROS and LPO production at 6 h of TNF stimulation, while the Ifnb1 mRNA super-induction occurred at 16 – 18 h, and edited the text to highlight these dynamics. The upregulation of the Myc pathway in human samples does not necessarily mean the upregulation of Myc itself; it could be due to the dysregulation of downstream pathways. The upregulation of the Myc pathway in the blood transcriptome associated with TB treatment failure most likely reflects a greater proportion of immature cells in peripheral blood, possibly due to increased myelopoiesis. The detailed analysis of these cell populations in human patients is suggested by our findings but is beyond the scope of this study.

      The reviewer’s comments also suggested that a summary of our findings was necessary. The main focus of our study was to untangle connections between oxidative stress and Ifnb1 superinduction. It revealed that Myc hyperactivity caused partial deficiency of antioxidant defense leading to type I interferon pathway hyperactivity that in turn amplifies lipid peroxidation, thus establishing a vicious cycle driving inflammatory tissue damage.

      Our laboratory has worked on mechanisms of TB granuloma necrosis for more than two decades using genetic, molecular and immunological analyses in vitro and in vivo. This work provided a mechanistic basis for independent studies in other laboratories that used our mouse model and further expanded our findings, thus supporting the reproducibility and robustness of our results and our lab’s expertise.

      Specific comments to the experiments and data:

      - Fig. 1E: Evaluation of differences in up- and downregulation between B6 and B6.Sst1S cells should highlight where these cells are within the heatmap, as it is only labelled with the clusters, or it should be depicted differently (in particular for cluster 1 and 2). Furthermore, a more simple labelling of the pathways would increase the readability of the data.

      For our scRNAseq data presentation, we used formats accepted by computational community. To clarify Fig.1E, we added labels above B6 and B6.Sst1S-specific clusters.

      - Fig. 2D, E: The staining legend is missing. For the quantification it is not clear what % total means. Is this based on the intensity or area? What do the dots represent in the bar chart? Is one data point pooled from several pictures? If not, the experiments need to be repeated, as three pictures might not be representative for evaluation.

      - Fig. 2E: Statistics comparing B6/B6.Sst1S with TNF (different) is required: Absence of induction is not a proof for a difference!

      We included staining with NRF2-specific antibodies and performed area quantification per field using ImageJ to calculate the NRF2 total signal intensity per field. Each dot in the graph represents the average intensity of 3 fields in a representative experiment. The experiment was repeated 3 times. We included pairwise comparison of TNF-stimulated B6 and B6.Sst1S macrophages and updated the figure legend.

      - Fig. 3E: Positive and negative control need to be depicted in the figure (see legend).

      We have added the positive and negative controls for the determination of labile iron pool to the data in Fig. 3E and related Suppl. Fig. 3B and to Fig. 5D that also demonstrates labile iron determination.

      - Fig. 3I: A quantification by flow cytometry or total cell counts are important, as 6% cell death in cell culture is a very modest observation. Otherwise, confocal images of the quantification would be a good addition to judge the specificity of the viability staining.

      To validate the specificity of the viability staining method, we have provided fluorescent images as Suppl.Fig.3H. The main point of this experiment was to demonstrate a modest, but reproducible, increase in cell death in the sst1-mutant macrophages that suggested an IFN-dependent oxidative damage. In our study, we did not focus on mechanisms of cell death, but on a state of chronic oxidative stress in the sst1 mutant live cells during TNF stimulation.

      - Fig. 3I, J: What does one dot represent?

      We performed this assay in 96-well format and each dot represents the % cell death in an individual well.

      - Fig. 3K,L: For the B6 BMDMs it seems that p-cJun is highly increased at 12h in (L), while it is not in (K). On the other hand, for the B6.Sst1S BMDMs it peaks at 24h in (K), while in (L) it seems to at 12h. According to the data in (L) it seems that p-cJun is rather earlier and stronger activated in B6 BMDMs and has a weakened but prolonged activation in the B6.Sst1S BMDMs, which would not fit with your statement in the text that B6.Sst1S BMDMs show an upregulation.

      These experiments need repetitions and quantification and statistics.

      Fig. 3L: ASK1 seems to be higher at 12h for the B6 BMDMs and similar for both lines at 24h, which is not fitting to the statement in the text. ("Also, the ASK1 - JNK - cJun stress kinase axis was upregulated in B6.Sst1S macrophages, as compared to B6, after 12 - 36 h of TNF stimulation")

      These experiments were repeated, and new data were added to highlight differences in ASK1 and c-Jun phosphorylation between B6 and B6.Sst1S at individual timepoints after TNF stimulation (presented in new Fig.3K). It demonstrated that after TNF stimulation the activation of stress kinases ASK1 and c-Jun initially increased in both genetic backgrounds. However, their upregulation was maintained exclusively in the sst1-susceptible macrophages from 24 to 36 h of TNF stimulation, while in the resistant macrophages their upregulation was transient. Thus, during prolonged TNF stimulation, B6.Sst1S macrophages experience stress that cannot be resolved, as evidenced by this kinetic analysis. The quantification of the band intensity was added to Western blot images above individual lanes.

      Reviewer 2 pointed to missing isotype control antibodies in Fig.3 and Fig.4:

      - Figure 3J: the isotype control for the IFNAR antibody is missing

      - Figure 4E: It seems the isotype control itself has already an effect in the reduction of IFNb.

      - Fig. 4H: It seems that the Isotype control antibody had an effect to increase 4-HNE (compared to TNF stimulated only).

      We always include isotype control antibodies in our experiments because antibodies are known to modulate macrophage activation via binding to Fc receptors. To address the reviewer’s comments, we updated all panels that present the effects of IFNAR1 blockade with isotype-matched non-specific control antibodies in the revised manuscript. Specifically, we included isotype control in Fig. 3M (previously Fig.3J), Fig.4I, Suppl.4E-G, Fig.6L-M, Suppl.Fig.7I (previously Suppl.Fig.6F).

      - Fig.4A - C: "IFNAR1 blockade, however, did not increase either the NRF2 and FTL protein levels, or the Fth, Ftl and Gpx1 mRNA levels above those treated with isotype control antibodies"

      Maybe not above the isotype but it is higher than the TNF alone stimulation at least for NRF2 at 8h and for Ftl at both time points. Why does the isotype already cause stimulation/induction of the cells? !These experiments need repetitions and quantification and statistics!

      To determine specific effects of IFNAR blockade we compared effects of non-specific isotype control and IFNAR1-specific antibodies. In our experiments, the isotype control antibody modestly increased the Nrf2 and Ftl protein levels and the Fth and Ftl mRNA levels, but its effects were similar to the effect of the IFNAR-specific antibody. The non-IFN-specific effects of antibodies, although of potential biological significance, are modest in our model and their analysis is beyond the scope of this study.

      - Fig.4H Was the AB added also at 12h post stimulation? Figure legend should be adjusted.

      The IFNAR1 blocking antibodies and isotype control antibodies were added at 2 h after TNF stimulation in Fig.4H and 4I, as described in the corresponding figure legend. The data demonstrating effects of IFNAR blockade after 12, 24, and 33 h of TNF stimulation are presented in Suppl.Fig.4 E-G.

      - Figure 4I: How was the data measured here, i.e. what is depicted? The isotype control is missing. It seems a two-way ANOVA was used, yet it is stated differently. The figure legend should be revised, as Dunnett's multiple comparison would only check for significances compared to the control.

      The microscopy images and bar graphs were updated to include isotype control and presented in Suppl. Fig.4E - G of the revised version. We also revised the statistical analysis to include correction for multiple comparisons.

      - Figure 4C and subsequent: How exactly was the experiment done (house-keeping gene)?

      We included the details in the figure legends of the revised version. We quantified the gene expression by the ΔΔCt method using β-actin (for Fig. 4C-E) and 18S (for Fig. 4F and G) as internal controls.
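
      For readers unfamiliar with the calculation, the relative quantification mentioned above can be sketched as follows. This is a minimal illustration of the standard 2^-ΔΔCt formula, assuming roughly 100% amplification efficiency; the function name and the Ct values are hypothetical and are not taken from the manuscript.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method (assumes ~100% PCR efficiency)."""
    # dCt: target gene Ct minus internal control Ct, within each sample
    dct_treated = ct_target_treated - ct_ref_treated
    dct_control = ct_target_control - ct_ref_control
    # ddCt: treated sample relative to the time-matched untreated control
    ddct = dct_treated - dct_control
    return 2.0 ** (-ddct)


# Hypothetical Ct values, for illustration only
fold = fold_change_ddct(22.0, 16.0, 28.0, 16.0)
print(fold)  # 64.0
```

      Because the result depends on the choice of internal control and of the reference sample, both need to be stated in each legend, which is what the reviewer requested.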

      - Figure 4D,E: Information on cells used is missing. Why the change in stimulation time? Did it not work after 12h? Then the experiments in A-C should be repeated for 16h.

      The updated Fig. 4D and E present a comparison of B6 and B6.Sst1S BMDMs that clearly demonstrates a significant difference between these macrophages in Ifnb1 mRNA expression 16 h after TNF stimulation, in agreement with our previous publication (Bhattacharya et al., 2021). There we studied the time course of responses of B6 and B6.Sst1S macrophages to TNF at 2 h intervals and demonstrated the divergence between their activation trajectories starting at 12 h of TNF stimulation. Therefore, to reveal the underlying mechanisms we focused our analyses on this critical timepoint, i.e. as close to the divergence as possible. However, the difference between the strains in Ifnb1 mRNA expression achieved significance only by 16 h of TNF stimulation. That is why we used this timepoint for the Ifnb1 and Rsad2 analyses. It clearly shows that the superinduction was not driven by the positive feedback via IFNAR, as has been shown previously by the Ivashkiv lab for B6 wild type macrophages (PMID: 21220349).

      - Figure 4E: It would be helpful to see if these transcripts are actually translated into protein levels, e.g. perform an ELISA. Authors state that IFNAR blockages does not alter the expression but you statistic says otherwise.

      - The data for Ifnb expression (or better protein level) should be provided for B6 BMDMs as well.

      We have previously reported the differences in Ifnb protein secretion (He et al., Plos Pathogens, 2013 and Bhattacharya et al., JCI 2021). We use mRNA quantification by qRT-PCR as a more sensitive and direct measurement of the sst1-mediated phenotype. The revised Fig.4D and E include responses of B6 in addition to the B6.Sst1S to demonstrate that the IFNAR blockade does not reduce the Ifnb1 mRNA levels in TNF-stimulated B6.Sst1S mutant to the B6 wild type levels. A slight reduction can be explained by a known positive feedback loop in the IFN-I pathway (see above). In this experiment we emphasized that the effect of the sst1 locus is substantially greater, as compared to the effect of the IFNAR blockade (Fig.4D), and updated the text accordingly.

      - Fig. 4F: To what does the fold induction refer to? If it is again to unstimulated cells, then why is the induction now so much higher than in (E) where it was only 50x (now to 100x).

      - Figure 4G: Again to what is the fold induction referring to? It seems your Fer-1 treatment only contains 2 data points. This needs to be fixed.

      Yes, the fold induction was calculated by normalizing mRNA levels to the untreated control incubated for the same time. Regarding the variation in Ifnb1 mRNA levels: a two-fold variation is not unusual in these experiments, which may result in the Ifnb1 mRNA superinduction ranging from 50- to 200-fold at this timepoint (16 h). The graph in Fig.4G was modified to make all datapoints more visible.

      - "These data suggest that type I IFN signaling does not initiate LPO in our model but maintains and amplifies it during prolonged TNF stimulation that, eventually, may lead to cell death". Data for a short term TNF stimulation are not shown, however, so it might impact also on the initiation of LPO.

      - The overall conclusion drawn from Fig. 3 and 4 is not really clear with regard that IFN does not initiate LPO. Where is that shown? Data on earlier stimulation time points should be added to make this clear.

      We demonstrated ROS production (new Suppl.Fig.3G) and the rate of LPO biosynthesis (new Suppl.Fig.4E-F) at 6 h post TNF stimulation, while the Ifnb1 superinduction occurs between 12-18 h post TNF stimulation. This temporal separation supports our conclusion that IFN-β superinduction does not initiate LPO. We clarified it in the text:

      “Thus, Ifnb1 super-induction and IFN-I pathway hyperactivity in B6.Sst1S macrophages follow the initial LPO production, and maintain and amplify it during prolonged TNF stimulation”. (Previously: These data suggest that type I IFN signaling does not initiate LPO in our model). We also edited the conclusion in this section to explain the hierarchy of the sst1-regulated AOD and IFN-I pathways better:

      “Taken together, the above experiments allowed us to reject the hypothesis that IFN-I hyperactivity caused the sst1-dependent AOD dysregulation. In contrast, they established that the hyperactivity of the IFN-I pathway in TNF-stimulated B6.Sst1S macrophages was itself driven by the initial dysregulation of AOD and iron-mediated lipid peroxidation. During prolonged TNF stimulation, however, the IFN-I pathway was upregulated, possibly via ROS/LPO-dependent JNK activation, and acted as a potent amplifier of lipid peroxidation”.

      We believe that these additional data and explanation strengthen our conclusions drawn from Figures 3 and 4.

      - "A select set of mouse LTR-containing endogenous retroviruses (ERV's) (Jayewickreme et al, 2021), and non-retroviral LINE L1 elements were expressed at a basal level before and after TNF stimulation, but their levels in the B6.Sst1S BMDMs were similar to or lower than those seen in B6". This sentence should be revised as the differences between B6 and B6.Sst1S BMDMs seem small and are not there after 48h anymore. Are these mild changes really caused by the mutation or could they result from different housing conditions and/or slowly diverging genetically lines. How many mice were used for the analysis? Is there already heterogeneity between mice from the same line?

      We agree with the reviewer that the data presented in Suppl.Fig.4 (Suppl.Fig.5 in the revised version) indicated no increase in single- and double-stranded transposon RNAs in the B6.Sst1S macrophages. The purpose of this experiment was to test the hypothesis that increased transposon expression might be responsible for triggering the superinduction of type I interferon response in TNF-stimulated B6.Sst1S macrophages. In collaboration with a transposon expert, Dr. Nelson Lau (co-author of this manuscript), we demonstrated that transposon expression was not increased above the B6 level and, thus, rejected this attractive hypothesis. We explained the purpose of this experiment in the text and adequately described our findings as “the levels in the B6.Sst1S BMDMs were similar to or lower than those seen in B6” and concluded that “the above analyses allowed us to exclude the overexpression of persistent viral or transposon RNAs as a primary mechanism of the IFN-I pathway hyperactivity” in the sst1-mutant macrophages.

      - Fig. 5A: Indeed, it even seems that Myc is upregulated for the mutant BMDMs. Yet, there are only 2 data points for B6 12h.

      These experiments need repetitions and quantification and statistics.

      We observed these differences in c-Myc mRNA levels by independent methods: RNAseq and qRT-PCR. The qRT-PCR experiments were repeated 3 times. A representative experiment in Fig.5A shows 3 data points for each condition. We reformatted the panel to make all data points clearly visible.

      - Fig. 5B: Why would the protein level decrease in the controls over 6h of additional cultivation? Is this caused by fresh M-CSF? In this case maybe cells should be left to settle for one day before stimulating them to properly compare c-Myc induction. Comment on two c-Myc bands is needed. At 12h only the upper one seems increased for TNF stimulated mutant BMDMs compared to B6 BMDMs.

      We agree with the reviewer’s point that cells need to be rested after media change that contains fresh CSF-1. Indeed, in Suppl.Fig.6C, we show that after media change containing 10% L929 supernatant (a source of CSF1) there is an increase in c-Myc protein levels that takes approximately 12 hours to return to baseline.

      Our protocol includes resting period of 18-24 h after medium change before TNF stimulation.

      We updated Methods to highlight this detail. Thus, the increase in c-Myc levels we observe at 12 h of TNF stimulation (Fig.5B) is induced by TNF, not the addition of growth factors, as further discussed in the text.

      The two c-Myc bands observed in Fig.5B,I and J, are similar to patterns reported in previous studies that used the same commercial antibodies (PMIDs: 24395249, 24137534, 25351955). Whether they correspond to different c-Myc isoforms or post-translational modifications is unknown.

      - Fig. 5A,B: It seems that not all the RNA is translated into protein, as c-Myc at 12h in the mutant BMDMs seems to be lower than at 6h, while the gene expression implicates it vice versa.

      In addition to Fig.5B, the time course of Myc protein expression up to 24 h is presented in new panels Fig. 5I-5J. It demonstrates the gradual decrease of Myc protein levels. The observed dissociation between the mRNA and protein levels in the sst1-mutant BMDMs at 12 and 24 h is most likely due to translation inhibition as a result of the development of the integrated stress response, ISR (as shown in our previous publication by Bhattacharya et al., JCI, 2021). Translation of Myc is known to be particularly sensitive to the ISR (PMID18551192, PMID25079319, PMID28490664). Perhaps, the IFN-driven ISR may serve as a backup mechanism for Myc downregulation. We are planning to investigate these regulatory mechanisms in greater detail in the future.

      - Fig. 5J: Indeed, the inhibitor seems to cause the downregulation of the proteins. Explanation?

      This experiment was repeated twice and the average normalized densitometry values are presented in the updated Fig.5J. The main question addressed in this experiment was whether hyperactivity of JNK in TNF-stimulated sst1 mutant macrophages contributed to Myc upregulation, as had been previously shown in cancer. Comparing effects of JNK inhibition on phospho-cJun and c-Myc protein levels in TNF-stimulated B6.Sst1S macrophages (updated Fig.5J), we rejected the hypothesis that JNK activity might have a major role in c-Myc upregulation in sst1 mutant macrophages.

      - "TNF stimulation tended to reduce the LPO accumulation in the B6 macrophages and to increase it in the B6.Sst1S ones" However, this is not apparent in Sup. Fig. 6B. Here it seems that there might be a significant increase.

      Suppl.Fig.6B (currently Suppl.Fig.7B) shows the 4-HNE accumulation at day 3 post infection. The data obtained after 5 days of Mtb infection are shown in Fig.6A. We clarified this in the text: “By day 5 post infection, TNF stimulation induced significant LPO accumulation only in the B6.Sst1S macrophages (Fig.6A)”.

      - Fig. 6B: Mtb and 4-HNE should be shown in two different channels in order to really assign each staining correctly.

      What time point is this? Are the mycobacteria cleared at MOI1, since it looks that there are fewer than that? How does this look like for the B6 BMDMs? Are there even less mycobacteria?

      We included B6 infection data in the updated Fig.6B and added Suppl.Fig.7C and 7D that address this reviewer’s comment. The data represent day 5 of Mtb infection as indicated in the updated Fig.6B and Suppl.Fig.7C and 7D legends. New Suppl.Fig.7D shows quantification of replicating Mtb using an Mtb replication reporter strain expressing a single-strand DNA binding protein GFP fusion, as described in Methods. We observed fewer Mtb and a lower percentage of replicating Mtb in B6 macrophages, but we did not observe a complete Mtb elimination in either background.

      We used red fluorescence for both Mtb::mCherry and 4-HNE staining to clearly visualize the SSB-GFP puncta in replicating Mtb DNA. In the revised manuscript, we have included the relevant channels in Suppl. Fig.7C and D to demonstrate clearly distinct patterns of Mtb::mCherry and 4-HNE signals. We did not aim to quantify the 4-HNE signal intensity in this experiment. For the 4-HNE quantification we use Mtb that expressed no reporter proteins (Fig.6A-B and Suppl.Fig.7A-B).

      - Fig 6E: In the context of survival a viability staining needs to be included, as well as the data from day 0. Then it needs to be analyzed whether cell numbers remain the same from D0 or if there is a change.

      We updated the Fig.6 legend to indicate that the cell number percentages were calculated based on the number of cells at Day 0 (immediately after Mtb infection). We routinely use fixable cell death staining to enumerate cell death and to exclude artifacts due to cell loss. A brief protocol containing this information is included in the Methods section. The detailed protocol, including normalization using a BCG spike, has been published (Yabaji et al., STAR Protocols, 2022). Here we did not present the dead cell percentage as it remained low and we did not observe damage to macrophage monolayers. The fold change of Mtb was calculated after normalization using the Mtb load at Day 0 after infection and washes.
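
      The two Day 0 normalizations described above amount to simple ratios, sketched below for clarity. The function names and numbers are hypothetical illustrations, and the BCG spike normalization from the published protocol is deliberately not modeled here.

```python
def percent_of_day0(cell_count, day0_count):
    """Cell number at a given day as a percentage of the Day 0 count."""
    if day0_count <= 0:
        raise ValueError("Day 0 cell count must be positive")
    return 100.0 * cell_count / day0_count


def fold_change_from_day0(mtb_load, day0_load):
    """Mtb load at a given day relative to the load at Day 0 (after washes)."""
    if day0_load <= 0:
        raise ValueError("Day 0 Mtb load must be positive")
    return mtb_load / day0_load


# Hypothetical values, for illustration only
print(percent_of_day0(80, 100))          # 80.0
print(fold_change_from_day0(4e5, 1e5))   # 4.0
```

      Reporting both values against the same Day 0 baseline makes it possible to tell a change in bacterial load apart from a change in the number of surviving host cells.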

      "The 3D imaging demonstrated that YFP-positive cells were restricted to the lesions, but did not strictly co-localize with intracellular Mtb, i.e. the Ifnb promoter activity was triggered by inflammatory stimuli, but not by the direct recognition of intracellular bacteria. We validated the IFNb reporter findings using in situ hybridization with the Ifnb probe, as well as anti-GFP antibody staining (Suppl.Fig.8B - E)." The colocalization is not present within the tissue sections. It seems that the reporter line does not show the same staining pattern in vivo as the IFNß probe or the anti GFP antibody staining. The reporter line has to be tested for the specificity of the staining. Furthermore, to state that it was restricted to the lesions, an uninvolved tissue area needs to be depicted.

      The Ifnb-secreting cells are notoriously difficult to detect in vivo using direct staining of the protein. Therefore, reporter expression is used as a surrogate. The Ifnb reporter used in our study was developed by the Locksley laboratory (Scheu et al., PNAS, 2008, PMID: 19088190) and has been validated in many independent studies. The reporter mice express the YFP protein under the control of the Ifnb1 promoter. The YFP protein accumulates within the cells, while Ifnb protein is rapidly secreted and does not accumulate in the producing cells in appreciable amounts. Also, the kinetics of YFP protein degradation is much slower as compared to the endogenous Ifnb1 mRNA that was detected using in situ hybridization. Thus, there is no precise spatiotemporal coincidence of these readouts in Ifnb-expressing cells in vivo. However, this methodology more closely reflects the Ifnb-expressing cells in vivo, as compared to a Cre-lox mediated lineage tracing approach. In the revised manuscript we demonstrate that the YFP and mRNA signals partially overlap (Suppl.Fig.12B). In Suppl.Fig.12B we also included a new panel showing no YFP expression in the uninvolved area of the reporter mice infected with Mtb. The YFP expression by activated macrophages is demonstrated by co-staining with Iba1- and iNOS-specific antibodies (new Fig.7D and Suppl.Fig.13A). Our specificity control also included TB lesions in mice that do not carry the YFP reporter and did not express the YFP signal, as reported elsewhere (Yabaji et al., BioRxiv, https://doi.org/10.1101/2023.10.17.562695).

      - Are paucibacillary and multibacillary lesions different within the same animal or does one animal have one lesion phenotype? If that is the case, what is causing the differences between mice? Bacterial counts for the mice are required.

      The heterogeneity of pulmonary TB lesions has been widely acknowledged in the clinic and highlighted in recent experimental studies. In our model of chronic pulmonary TB (described in detail in Yabaji et al., https://doi.org/10.1101/2025.02.28.640830 and https://doi.org/10.1101/2023.10.17.562695) the development of pulmonary TB lesions is not synchronized, i.e. the lesions are heterogeneous between the animals and within individual animals at the same timepoint. Therefore, we performed a lesion stratification where individual lesions were classified by a certified veterinary pathologist in a blinded manner based on their morphology (H&E) and acid-fast staining of the bacteria, as depicted in Suppl.Fig.8.

      - "Among the IFN-inducible genes upregulated in paucibacillary lesions were Ifi44l, a recently described negative regulator of IFN-I that enhances control of Mtb in human macrophages (DeDiego et al, 2019; Jiang et al, 2021) and Ciita, a regulator of MHC class II inducible by IFNy, but not IFN-I (Suppl.Table 8 and Suppl.Fig.10 D-E)." Why is Sup. Fig. 10 D, E referred to? The figure legend is also not clear, e.g. what means "upregulated in a subset of IFN-inducible genes"? Input for the hallmarks needs to be defined.

      These data are now presented in Suppl.Fig.11 and, following the reviewer’s comment, we moved the reference to panels 11D – E up to the previous paragraph in the main text, where it naturally belongs. We also edited the figure legend to refer to the list of IFN-inducible genes compiled from the literature that is discussed in the text. We appreciate the reviewer’s suggestion that helped us improve the text clarity. The inputs for the Hallmark pathway analysis are presented in Suppl.Tables 7 and 8, as described in the text.

      - Fig. 7C: Single channel pictures are required as it is hard to see the differences in staining with so many markers. Why is there no iNOS expression in the bottom row? What does the rectangle indicate on the bottom right? As black is chosen for DAPI, it is not visible at all. In case the signal is needed, a visible color should be chosen.

      We thoroughly revised this figure to address the reviewer’s concern about the lack of clarity. We provide individual channels for each marker in Fig.7D – E and Suppl.Fig.13F. We have to show DAPI in gray scale in this presentation to better visualize the other markers.

      - "In the advanced lesions these markers were primarily expressed by activated macrophages (Iba1+) expressing iNOS and/or Ifny (YFP+)(Fig.7D)" Iba1 is needed in the quantification. Based on the images, iNOS seems to be highly produced in Iba1 negative cells. Which cells do produce it then? Flow cytometry data for this quantification are required. This would allow you to specifically check which cells express the markers and allow for a more precise analysis of double positive cells.

      Currently these data demonstrating the co-localization of stress markers phospho-c-Jun and Chac1 with YFP are presented in Fig.7E (images) and Suppl.Fig.13D (quantification). The co-localization of stress markers phospho-cJun and Chac1 with iNOS is presented in Suppl.Fig.13F (images) and Suppl.Fig.13E (quantification). We agree that some iNOS+ cells are Iba1-negative (Fig.7D). We manually quantified percentages of Iba1+iNOS+ double positive cells and demonstrated that they represent the majority of the iNOS+ population(Suppl.Fig.13A). Regarding the required FACS analysis, we focus on spatial approaches because of the heterogeneity of the lesions that would be lost if lungs are dissociated for FACS. We are working on spatial transcriptomics at a single cell resolution that preserves spatial organization of TB lesions to address the reviewer’s comment and will present our results in the future.

      - Results part 6: In general, can you please state for each experiment at what time point mice were analyzed? You should include an additional macrophage staining (e.g. MerTK, F4/80), as alveolar macrophages are not staining well for Iba1 and you might therefore miss them in your IF microscopy. It would be very nice if you could perform flow cytometry to really check on the macrophages during infection and distinguish subsets (e.g. alveolar macrophages, interstitial macrophages, monocytes).

      We have included the details of time post infection in the figure legends for Fig.7 and Suppl.Figures 8, 9, 12B, 13, 14A of the revised manuscript. We have performed staining with CD11b, CD206 and CD163 to differentiate the recruited and lung resident macrophages and determined that in chronic pulmonary TB lesions in our model the vast majority of macrophages are recruited CD11b+, but not resident (CD206+ and CD163+) macrophages. These data are presented in another manuscript (Yabaji et al., BioRxiv https://doi.org/10.1101/2023.10.17.562695).

      - Spatial sequencing: The manuscript would highly profit from more data on that. It would be very interesting to check for the DEGs and show differential spatial distribution. Expression of marker genes should be inferred to further define macrophage subsets (e.g. alveolar macrophages, interstitial macrophages, recruited macrophages) and see if these subsets behave differently within the same lesion but also between the lesions. Additional bioinformatic approaches might allow you to investigate cell-cell interactions. There is a lot of potential with such a dataset, especially from TB lesions, that would elevate your findings and prove interesting to the TB field.

      - "Thus, progression from the Mtb-controlling paucibacillary to non-controlling multibacillary TB lesions in the lungs of TB susceptible mice was mechanistically linked with a pathological state of macrophage activation characterized by escalating stress (as evidenced by the upregulation phospho-cJUN, PKR and Chac1), the upregulation of IFNβ and the IFN-I pathway hyperactivity, with a concurrent reduction of IFNγ responses." To really show the upregulation within macrophages and their activation, a more detailed IF microscopy with the inclusion of additional macrophage markers needs to be provided. Flow cytometry would enable analysis for the differences between alveolar and interstitial macrophages, as well as for monocytes. As however, it seems that the majority of iNOS, as well as the stress associated markers are not produced by Iba1+ cells. Analyzing granulocytes and T lymphocytes should be considered.

      We appreciate the reviewer’s suggestion. Indeed, our model provides an excellent opportunity to investigate macrophage heterogeneity and cell interactions within chronic TB lesions. We are working on spatial transcriptomics at a single cell resolution that would address the reviewer’s comment and will present our results in the future.

      In agreement with classical literature the overwhelming majority of myeloid cells in chronic pulmonary TB lesions is represented by macrophages. Neutrophils are detected at the necrotic stage, but our study is focused on pre-necrotic stages to reveal the earlier mechanisms predisposing to the necrotization. We never observed neutrophils or T cells expressing iNOS in our studies.

      - It's mentioned in the method section that controls in the IF staining were only fixed for 10min, while the infected cells were fixed for 30min. Consistency is important as the PFA fixation might impact on the fluorescence signal. Therefore, controls should be repeated with the same fixation time.

      We have carefully considered the impact of fixation time on fluorescence and have separately analyzed the non-infected and infected samples to address this concern. For the non-infected samples, we examined the effect of TNF in both B6 and B6.Sst1S backgrounds, ensuring that a consistent fixation protocol (10 min) was applied across all experiments without Mtb infection.

      For the Mtb infection experiments, we employed an optimized fixation protocol (30 min) to ensure that Mtb was killed before handling the plates, which is critical for preserving the integrity of the samples. In this context, we compared B6 and B6.Sst1S samples to evaluate the effects of fixation and Mtb infection on lipid peroxidation (LPO) induction.

      We believe this approach balances the need for experimental consistency with the specific requirements for handling infected cells, and we have revised the manuscript to reflect this clarification.

      - Reactive oxygen species levels should be determined in B6 and B6.Sst1S BMDMs (stimulated and unstimulated), as they are very important for oxidative stress.

      We have conducted experiments to measure ROS production in both B6 and B6.Sst1S BMDMs and demonstrated higher levels of ROS in the susceptible BMDMs after prolonged TNF stimulation (new Fig.3I-J and Suppl. Fig. 3G). Additionally, we have previously published a comparison of ROS production between B6 and B6.Sst1S by FACS (PMID: 33301427), which also supports the findings presented here.

      - Sup. Fig 2C: The inclusion of an unstimulated control would be advisable in order to evaluate if there are already difference in the beginning.

      We have added the untreated control to Suppl. Fig. 2C (currently Suppl. Fig. 2D) in the revised manuscript.

      - Sup. Fig. 3F: Why is the fold change now lower than in Fig. 4D (fold change of around 28 compared to 120 in 4D)?

      The data in Fig.4D (Fig.4E in the revised manuscript) and Suppl.Fig.3F (currently Suppl.Fig.4C) come from separate experiments. Such variation between experiments is commonly observed in qRT-PCR, which is affected by slight variations in expression in the unstimulated controls used for normalization and by the kinetics of the response. This 2–4-fold difference between identical treatments in separate experiments is small relative to the 30–100-fold and higher induction by TNF and does not affect the data interpretation.
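
      A minimal sketch of why a small shift in the control sample produces a several-fold change in the reported induction, assuming the standard 2^-ΔΔCt quantification (the exact quantification method is not stated here) and entirely hypothetical Ct values:

```python
# Illustrative 2^-ddCt calculation. All Ct values are hypothetical and
# are NOT taken from the experiments discussed in this response.

def fold_change(ct_target_treated, ct_ref_treated,
                ct_target_control, ct_ref_control):
    """Relative expression by 2^-ddCt, assuming ~100% PCR efficiency."""
    d_ct_treated = ct_target_treated - ct_ref_treated   # normalize to reference gene
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control                 # normalize to unstimulated control
    return 2.0 ** (-dd_ct)

# Identical TNF-treated readings in both experiments; only the
# unstimulated control differs, by two cycles.
exp_a = fold_change(24.0, 18.0, 31.0, 18.0)
exp_b = fold_change(24.0, 18.0, 29.0, 18.0)
ratio = exp_a / exp_b

print(exp_a, exp_b, ratio)
```

      In this toy case a two-cycle difference in the unstimulated control alone changes the reported induction four-fold, while both experiments still show a strong induction.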

      - Sup. Fig. 5C, D: The data seems very interesting as you even observe an increase in gene expression. Data for the B6 mice should be evaluated for increase to a similar level as the TNF treated mutants. Data on the viability of the cells are necessary, as they no longer receive MCSF and might be dying at this point already.

      To ensure that the observed effects were not confounded by cytotoxicity, we determined non-toxic concentrations of the CSF1R inhibitors during 48h of incubation and used them in our experiments that lasted for 24h. To address this valid comment, we have included cell viability data in the revised manuscript to confirm that the treatments did not result in cell death (Suppl. Fig. 6D). This experiment rejected our hypothesis that CSF1 driven Myc expression could be involved in the Ifnb superinduction. Other effects of CSF1R inhibitors on type I IFN pathway are intriguing but are beyond the scope of this study.

      - Sup. Fig 12: the phospho-c-Jun picture for (P) is not the same as in the merged one with Iba1. Double positive cells are mentioned to be analyzed, but from the staining it appears that P-c-Jun is expressed by other cells. You do not indicate how many replicates were counted and if the P and M lesions were evaluated within the same animal. What does the error bar indicate? It seems unlikely from the plots that the double positive cells are significant. Please provide the p values and statistical analysis.

      We thank the reviewer for bringing this inadvertent field replacement in the single phospho-cJun channel to our attention. However, the quantification of Iba1+phospho-cJun+ double positive cells in Suppl.Fig.12 and our conclusions were not affected. In the revised manuscript, images and quantification of phospho-cJun and Iba1 co-expression are shown in new Suppl.Fig.13B and C, respectively. We have also updated the figure legends to denote the number of lesions analyzed and statistical tests. Specifically, lesions from 6–8 mice per group (paucibacillary and multibacillary) were evaluated. Each dot in the panels of Suppl.Fig.13 represents an individual lesion.

      - Sup. Fig. 13D (suppl.Fig.15D now): What about the expression of MYC itself? Other parts of the signaling pathway should be analyzed (e.g. IFNb, JNK)?

      MYC mRNA expression tended to be higher in TB patients with poor outcomes, but the difference was not statistically significant after correction for multiple testing. The upregulation of the Myc pathway in the blood transcriptome associated with TB treatment failure most likely reflects a greater proportion of immature cells in peripheral blood, possibly due to increased myelopoiesis. Pathway analysis of the differentially expressed genes revealed that treatment failures were associated with the following pathways relevant to this study: NF-kB Signaling, Flt3 Signaling in Hematopoietic Progenitor Cells (indicative of common myeloid progenitor cell proliferation), SAPK/JNK Signaling and Senescence (possibly indicative of oxidative stress). The upregulation of these pathways in human patients with poor TB treatment outcomes correlates with our findings in TB susceptible mice.
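
      To illustrate how a nominally significant gene can lose significance after multiple-testing correction, here is a minimal sketch assuming the Benjamini-Hochberg procedure (the correction method actually used is not stated here) and hypothetical p-values:

```python
# Benjamini-Hochberg adjustment, written out by hand for illustration.
# The p-values are hypothetical and NOT taken from the patient dataset.

def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

p = [0.03, 0.20, 0.45, 0.60, 0.80]   # first gene is nominally significant
q = benjamini_hochberg(p)

print(q[0])
```

      Here the first gene passes an unadjusted 0.05 threshold but not the adjusted one, which is the situation described for MYC above.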

      - In the mfIHC, the usage of anti-mouse antibodies is mentioned. Pictures of sections incubated with the secondary antibody alone are required to exclude the possibility that the staining is not specific. Especially, as this data is essential to the manuscript and mouse-antimouse antibodies are notorious for background noise.

      We are well aware of the technical difficulties associated with using mouse on mouse staining. In those cases, we use rabbit anti-mouse isotype specific antibodies specifically developed to avoid non-specific background (Abcam cat#ab133469). Each antibody panel for fluorescent multiplexed IHC is carefully optimized prior to studies. We did not use any primary mouse antibodies in the final version of the manuscript and, hence, removed this mention from the Methods.

      - In order to tie the story together, it would be interesting to treat infected mice with an IFNAR antibody, as well as perform this experiment with a Myc antibody. According to your data, you might expect the survival of the mice to be increased or bacterial loads to be affected.

      In collaboration with the Vance laboratory, we tested effects of type I IFN pathway inhibition in B6.Sst1S mice on TB susceptibility: either type I IFN receptor knockout or blocking antibodies increased their resistance to virulent Mtb (published in Ji et al., 2019; PMID 31611644). Unfortunately, blocking Myc using neutralizing antibodies in vivo is not currently achievable. Specifically blocking Myc using small molecule inhibitors in vivo is notoriously difficult, as recognized in the oncology literature. We are considering using small molecule inhibitors of either Myc translation or specific pathways downstream of Myc in the future.

      - It is surprising that you not even once cite or mention your previous study on bioRxiv considering the similarity of the results and topic (https://doi.org/10.1101/2020.12.14.422743). Is not even your Figure 1I and Figure 2 J, K the same as in that study depicted in Figure 4?

      The reviewer refers to the first version of this manuscript uploaded to BioRxiv, which has never been published. We continued this work and greatly expanded our original observations, as presented in the current manuscript. We therefore do not consider the previous version an independent manuscript and do not cite it.

      - Please revise spelling of the manuscript and pay attention to write gene names in italics

      Thank you, we corrected the gene and protein names according to current nomenclature.

      Minor points:

      - Fig. 1: Please provide some DEGs that explain why you used this resolution for the clustering of the scRNAseq data and that these clusters are truly distinct from each other.

      Differential gene expression in clusters is presented in Suppl.Fig.1C (interferon response) and Suppl.Fig.1D (stress markers and interferon response previously established in our studies).

      - Fig. 1F: What do the two lines represent (magenta, green)?

      The lines indicate pseudotime trajectories of B6 (magenta) and B6.Sst1S (green) BMDMs.

      - Fig. 1F, G: Why was cluster 6 excluded?

      This cluster was not different between B6 and B6.Sst1S, so it was not useful for drawing the strain-specific trajectories.

      - Fig. 1E, G, H: The intensity scales are missing. They are vital to understand the data.

      We have included the scale in the revised manuscript (Fig.1E,G,H and Suppl.Fig.1C-D).

      - Fig. 2G-I: please revise order, as you first refer to Fig. 2H and I

      We revised the panels’ order accordingly.

      - Fig. 5: You say the data represents three samples but at least in D and E you have more. Please revise. Why do you only include at (G) the inhibitor only control?

      We added the inhibitor only controls to Fig. 5D - H. We also indicated the number of replicates in the updated Fig.5 legend.

      - Figure 7A, Sup. Fig. 8: Are these maximum intensity projection? Or is one z-level from the 3D stack depicted?

      Fig. 7A shows 3D images with all the stacks combined.

      - Fig. 7B: What do the white boxes indicate?

      We have removed this panel in the revised version and replaced it with better images.

      - Sup. Fig. 1A: The legend for the staining is missing

      Suppl. Fig.1A shows the relative proportions of either naïve (R and S) or TNF-stimulated (RT and ST) B6 or B6.Sst1S macrophages within individual single cell clusters depicted in Fig.1B. The color code is shown next to the graph on the right.

      - Sup. Fig. 1B: The feature plots are not clear: The legend for the expression levels is missing. What does the heading mean?

      We updated the headings, as in Fig.1C. The dots represent individual cells expressing Sp110 mRNA (upper panels) and Sp140 mRNA (lower panels).

      - Sup. Fig. 3C: The scale bar is barely visible.

      We resized the scale bar to make it visible and present it in Suppl. Fig.3E (previously Suppl. Fig.3C).

      - Sup. Fig. 3D: There is not figure legend or the legend to C-E is wrong.

      - Sup. Fig. 3F, G: You do not state to what the data is relative to.

      We identified an error in the Suppl.Fig.3 legend referring to specific panels. The Suppl.Fig.3 legend has been updated accordingly. New panels were added and Suppl.Fig.3F-G panels are now Suppl.Fig.4C-D.

      - Sup. Fig. 3H: It seems you used a two-way ANOVA, yet state it differently. Please revise the figure legend, as Dunnett's multiple comparison would only check for significances compared to the control.

      Following the reviewer’s comment, we repeated statistical analysis to include correction for multiple comparisons and revised the figure and legend accordingly.

      - Sup. Fig. 4A, B: It is not clear what the lines depict as the legend is not explained. Names that are not required should be changed to make it clear what is depicted (e.g. "TE@" what does this refer to?)

      This previous Sup. Fig 4 is now Sup. Fig. 5. The “TE@” is a leftover label from the bioinformatics pipeline, referring to “Transposable Element”. We apologize for this confusion and have removed these extraneous labels. We have also added transposon names of the LTR (MMLV30 and RTLV4) and L1Md to Suppl.Fig.5A and 5B legend, respectively.

      - Sup. 4B: What does the y-scale on the right refer to?

      We apologize for the missing label for the y-scale on the right which represents the mRNA expression level for the SetDB1 gene, which has a much lower steady state level than the LINE L1Md, so we plotted two Y-scales to allow both the gene and transposon to be visualized on this graph.

      - Sup. 4C: Interpretation of the data is highly hindered by the fact that the scales differ between the B6 and B6.Sst1. The scales are barely visible.

      We apologize for the missing labels for the y-scales of these coverage plots, which were originally meant to just show a qualitative picture of the small RNA sequencing that was already quantitated by the total amounts in Sup. 4B. We have added the auto-scaled Y-scales to Sup. 4C and improved the presentation of this figure.

      - Sup. Fig. 5A, B: Is the legend correct? Did you add the antibody for 2 days or is the quantification from day 3?

      We recognize that the reviewer refers to Suppl.Fig.6A-B (Suppl.Fig.7A-B in the revised manuscript). We did not add antibodies to live cells. The figure legend describes staining with 4HNE-specific antibodies 3 days post Mtb infection.

      - Sup. Fig. 8A: Are the "early" and "intermediate" lesions from the same time points? What are the definitions for these stages?

      We discussed our lesion classification according to histopathology and bacterial loads above. Of note, in the revised manuscript we simplified our classification to denote paucibacillary and multibacillary lesions only. We agree with the reviewers that the designation of lesions as early, intermediate and advanced was based on our assumptions regarding the time course of their progression from low to high bacterial loads.

      - Sup. Fig. 8E: You should state that the bottom picture is an enlargement of an area in the top one. Scale bars are missing.

      We replaced this panel with clearer images in Suppl.Fig.12B.

      - Sup. Fig. 11A: The IF staining is only visible for Iba and iNOS. Please provide single channels in order to make the other staining visible.

      Suppl.Fig.11A (now Suppl.Fig.13B) shows the low-magnification images of TB lesions. In the Fig. 7 and Suppl. Fig. 13F of the revised manuscript we provided images for individual markers.

      - Sup. Fig. 13A (Suppl.Fig.15A now): Your axis label is not clear. What do the numbers behind the genes indicate? Why did you choose oncogene signatures and not inflammatory markers to check for a correlation with disease outcome?

      The X axis of Suppl.Fig.15A represents pre-defined molecular signature gene sets (MSigDB) in the Gene Set Enrichment Analysis (GSEA) database (https://www.gsea-msigdb.org/gsea/msigdb). The Y axis shows the area under curve (AUC) score for each gene set.

      - Sup. 13D(Suppl.Fig.15D now): Maybe you could reorder the patients, so that the impression is clearer, as right now only the top genes seem to show a diverging gene signature, while the rest gives the impression of an equal distribution.

      The Myc upregulated gene set myc_up was identified among the top gene sets associated with treatment failure using the unbiased ssGSEA algorithm. We agree with the reviewer that not every gene in the myc_up gene set correlates with the treatment outcome. However, the association of the gene set is statistically significant, as presented in Suppl.Fig.15B – C.

      - The scale bars for many microscopy pictures are missing.

      We have added clearly visible scale bars to all the microscopy images in the revised version.

      - The black bar plots should be changed (e.g. in color), since the single data points cannot be seen otherwise.

      - It would be advisable that a consistent color scheme would be used throughout the manuscript to make it easier to identify similar conditions, as otherwise many different colours are not required and lead right now rather to confusion (e.g. sometimes a black bar refers to BMDMs with and sometimes without TNF stimulation, or B6 BMDMs). Furthermore, plot sizes and fonts should be consistent within the manuscript (including the supplemental data)

      We followed this useful suggestion and selected consistent color codes for B6 and B6.Sst1S groups to enhance clarity throughout the revised manuscript.

      Within the methods section:

      - At which concentration did you use the IFNAR antibody and the isotype?

      We updated the Methods section to include the respective concentrations in the revised manuscript.

      - Were mice maintained under SPF conditions? At what age were they used?

      Yes, the mice are specific pathogen free. We used 10 - 14 week old mice for Mtb infection.

      - The BMDM cultivation is not clear. According to your cited paper you use LCCM but can you provide how much M-CSF it contains? How do you make sure that amounts are the same between experiments and do not vary? You do not mention how you actually obtain this conditioned medium. Is there the possibility of contamination or transferred fibroblasts that would impact on the data analysis? Is LCCM also added during stimulation and inhibitor treatment?

      We obtain LCCM by collecting the supernatant from L929 cells that have formed a confluent monolayer, according to well-established protocols for LCCM collection. The supernatants are filtered through 0.22 micron filters to exclude contamination with L929 cells and bacteria. The medium is prepared in 500 ml batches that are sufficient for multiple experiments. Each batch of L929-conditioned medium is tested for biological activity using serial dilutions.

      - How was the BCG infection performed? How much bacteria did you use? Which BCG strain was used?

      We infected mice with M. bovis BCG Pasteur subcutaneously in the hock using 10<sup>6</sup> CFU per mouse.

      - At what density did you seed the BMDMs for stimulation and inhibitor experiments?

      In 96 well plates, we seed 12,000 cells per well and allow the cells to grow for 4 days to reach confluency (approximately 50,000 cells per well). For a 6-well plate, we seed 2.5 × 10<sup>5</sup> cells per well and culture them for 4 days to reach confluency. For a 24-well plate, we seed 50,000 cells per well and keep the cells in media for 4 days before starting any treatments. This ensures that the cells are in a proliferative or near-confluent state before beginning the stimulation or inhibitor treatments. Our detailed protocol is published in STAR Protocols (Yabaji et al., 2022; PMID 35310069).

      - What machine did you use to perform the bulk RNA sequencing? How many replicates did you include for the sequencing?

      For bulk sequencing we used 3 RNA samples for each condition. The samples were sequenced at Boston University Microarray & Sequencing Resource service using Illumina NextSeq<sup>TM</sup> 2000 instrument.

      - How many replicates were used for the scRNA sequencing? Why is your threshold for the exclusion of mitochondrial DNA so high? A typical threshold of less than 5% has been reported to work well with mouse tissue.

      We used one sample per condition. For the mitochondrial cutoff, we usually base it off of the total distribution. There is no "universal" threshold that can be applied to all datasets. Thresholds must be determined empirically.
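
      One common way to derive such a cutoff from the data themselves is an outlier rule on the per-cell mitochondrial read percentage. The sketch below assumes a median plus 3 MAD rule; this is an illustrative convention, not necessarily the rule applied in this study, and the function name and per-cell values are hypothetical:

```python
import statistics

# Data-driven mitochondrial cutoff: median + n_mads * MAD.
# Assumption for illustration only; values below are hypothetical.

def mito_cutoff(percent_mito, n_mads=3.0):
    """Upper cutoff on per-cell mitochondrial read percentage."""
    med = statistics.median(percent_mito)
    # Median absolute deviation: robust spread estimate, insensitive to outliers.
    mad = statistics.median([abs(x - med) for x in percent_mito])
    return med + n_mads * mad

cells = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 35.0, 40.0]
cutoff = mito_cutoff(cells)
kept = [x for x in cells if x <= cutoff]

print(cutoff, len(kept))
```

      With these toy values a fixed 5% threshold would discard most cells, whereas the distribution-based cutoff removes only the two clear outliers.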

      - You do not mention how many PCAs were considered for the scRNA sequencing analysis.

      We considered 50 principal components (PCs); this information was added to Methods.

      - You should name all the package versions you used for the scRNA sequencing (e.g. for the slingshot, VAM package)

      The following package versions were used: Seurat v4.0.4, VAM v1.0.0, Slingshot v2.3.0, SingleCellTK v2.4.1, Celda v1.10.0. We added this information to Methods.

      - You mention two batches for the human samples. Can you specify what the two batches are?

      Human blood samples were collected at five sites, as described in the updated Methods section and two RNAseq batches were processed separately that required batch correction.

      - At which temperature was the IF staining performed?

      We performed the IF at 4 °C. We included the details in the revised version.

      Reviewer #2 (Significance):

      Overall, the manuscript has interesting findings with regard to macrophage responses in Mycobacterium tuberculosis infection.

      However, in its current form there are several shortcomings, both with respect to the precision of the experiments and conclusions drawn.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary

      The authors use a mouse model designed to be more susceptible to M.tb (addition of sst1 locus) which has granulomatous lesions more similar to human granulomas, making this mouse highly relevant for M.tb pathogenesis studies. Using WT B6 macrophages or sst1B6 macrophages, the authors seek to understand how the sst1 locus affects macrophage response to prolonged TNFa exposure, which can occur during a pro-inflammatory response in the lungs. Single cell RNA-seq revealed clusters of mutant macrophages with upregulated genes associated with oxidative stress responses and IFN-I signaling pathways when treated with TNF compared to WT macs. The authors go on to show that mutant macrophages have decreased NRF2, decreased antioxidant defense genes and less Sp110 and Sp140. Mutant macrophages are also more susceptible to lipid peroxidation and iron-mediated oxidative stress. The IFN-I pathway hyperactivity is caused by the dysregulation of iron storage and antioxidant defense. These mutant macrophages are more susceptible to M.tb infection, showing they are less able to control bacterial growth even in the presence of T cells from BCG vaccinated mice. The transcription factor Myc is more highly expressed in mutant macs during TNF treatment and inhibition of Myc led to better control of M.tb growth. Myc is also more abundant in PBMCs from M.tb infected humans with poor outcomes, suggesting that Myc should be further investigated as a target for host-directed therapies for tuberculosis.

      Major Comments

      Isotypes for IF imaging and confocal IF imaging are not listed, or not performed. It is a concern that the microscopy images throughout the manuscript do not have isotype controls for the primary antibodies.

      Fig 4 (and later) the anti-IFNAR Ab is used along with the Isotype antibody, Fig 4I does not show the isotype. Use of the isotype antibody is also missing in later figures as well as Fig 3J. Why was this left off as the proper control for the Ab?

      We addressed the comment in revised manuscript as described above in summary and responses to reviewers 1 and 2. Isotype controls for IFNAR1 blockade were included in Fig.3M (previously 3J), Fig. 4I, Suppl.Fig.4G (previously Fig.4I), and updated Fig.4C-E, Fig.6L-M, Suppl.Fig.4F-G, 7I.

      Conclusions drawn by the authors from some of the WB data are worded strongly, yet by eye the blots don't look as dramatically different as suggested. It would be very helpful to quantify the density of bands when making conclusions. (for example, Fig 4A).

      We added the densitometry of Western blot values after normalization above each lane in Fig.2A-C, Fig.3C-D and 3K; Fig.4A-B, Fig.5B,C,I,J.

      Fig 5A is not described clearly. If the gene expression is normalized to untreated B6 macs, then the level of untreated B6 macs should be 1. In the graph the blue bars are slightly below 1, which would not suggest that levels "initially increased and subsequently downregulated" as stated in the text. It seems like the text describes the protein expression but not the RNA expression. Please check this section and more clearly describe the results.

      We appreciate the reviewer’s comment and modified the text to specify the mRNA and protein expression data, as follows:

      “We observed that Myc was regulated in an sst1-dependent manner: in TNF-stimulated B6 wild type BMDMs, c-Myc mRNA was downregulated, while in the susceptible macrophages c-Myc mRNA was upregulated (Fig.5A). The c-Myc protein levels were also higher in the B6.Sst1S cells in unstimulated BMDMs and 6 – 12 h of TNF stimulation (Fig.5B)”.

      Also, why look at RNA through 24h but protein only through 12h? If c-myc transcripts continue to increase through 24h, it would be interesting to see if protein levels also increase at this later time point.

      The time-course of Myc expression up to 24 h is presented in new panels Fig. 5I-5J. It demonstrates the decrease of Myc protein levels at 24 h. In the wild type B6 BMDMs the levels of Myc protein significantly decreased in parallel with the mRNA suppression presented in Fig.5A. In contrast, we observed the dissociation of the mRNA and protein levels in the sst1 mutant BMDMs at 12 and 24 h, most likely because the mutant macrophages develop an integrated stress response (as shown in our previous publication by Bhattacharya et al., JCI, 2021) that is known to inhibit Myc mRNA translation.

      Fig 5J the bands look smaller after D-JNK1 treatment at 6 and 12h though in the text it says no change. Quantifying the bands here would be helpful to see if there really is no difference.

      This experiment was repeated twice, and the average normalized densitometry values are presented in the updated Fig.5J. The main question addressed in this experiment was whether the hyperactivity of JNK in TNF-stimulated sst1 mutant macrophages contributed to Myc upregulation, as was previously shown in cancer. Comparing effects of JNK inhibition on phospho-cJun and c-Myc protein levels in TNF stimulated B6.Sst1S macrophages (updated Fig.5J), we concluded that JNK did not have a major role in c-Myc upregulation in this context.

      Section 4, third paragraph, the conclusion that JNK activation in mutant macs drives pathways downstream of Myc is not supported here. Are there data or other literature from the lab that supports this claim?

      This statement was based on evidence from available literature where JNK was shown to activate oncogenes, including Myc. In addition, inhibition of Myc in our model upregulated ferritin (Fig.5C), reduced the labile iron pool, prevented the LPO accumulation (Fig.5D - G) and inhibited stress markers (Fig.5H). However, we do not have direct experimental evidence in our model that Myc inhibition reduces ASK1 and JNK activities. Hence, we removed this statement from the text and plan to investigate this in the future.

      Fig 6N Please provide further rationale for the BCG in vivo experiment. It is unclear what the hypothesis was for this experiment.

      In the current version BCG vaccination data is presented in Suppl.Fig.14B. We demonstrate that stressed BMDMs do not respond to activation by BCG-specific T cells (Fig.6J) and their unresponsiveness is mediated by type I interferon (Fig.6L and 6M). The observed accumulation of the stressed macrophages in pulmonary TB lesions of the sst1-susceptible mice (Fig.7E, Suppl.Fig.13 and 14A) and the upregulation of type I interferon pathway (Fig.1E, 1G, 7C, Suppl.Fig.1C and 11) suggested that the effect of further boosting T lymphocytes using BCG in Mtb-infected mice would be neutralized by the macrophage unresponsiveness. This experiment provides a novel insight explaining why BCG vaccine may not be efficient against pulmonary TB in susceptible hosts.

      The in vitro work is all concerning treatment with TNFa and how this exposure modifies the responses in B6 vs sst1B6 macrophages; however, this is not explored in the in vivo studies. Are there differences in TNFa levels in the pauci- vs multi-bacillary lesions that lead to (or correlate with) the accumulation of peroxidation products in the intralesional macrophages? How do the experiments with TNFa in vitro relate back to how the macrophages are responding in vivo during infection?

      Our investigation of mechanisms of necrosis of TB granulomas stems from and is supported by in vivo studies as summarized below.

      This work started with the characterization of necrotic TB granulomas in C3HeB/FeJ mice in vivo followed by a classical forward genetic analysis of susceptibility to virulent Mtb in vivo.

      That led to the discovery of the sst1 locus and the demonstration that it plays a dominant role in the formation of necrotic TB granulomas in mouse lungs in vivo. Using genetic and immunological approaches we demonstrated that the sst1 susceptibility allele controls macrophage function in vivo (Yan, et al., J.Immunol. 2007) and causes an aberrant macrophage activation by TNF and increased production of Ifn-b in vitro (He et al. Plos Pathogens, 2013). In collaboration with the Vance lab we demonstrated that the type I IFN receptor inactivation reduced the susceptibility to intracellular bacteria of the sst1-susceptible mice in vivo (Ji et al., Nature Microbiology, 2019). Next, we demonstrated that the Ifnb1 mRNA superinduction results from combined effects of TNF and JNK leading to integrated stress response in vitro (Bhattacharya, JCI, 2021). Thus, our previous work started with extensive characterization of the in vivo phenotype that led to the identification of the underlying macrophage deficiency that allowed for the detailed characterization of the macrophage phenotype in vitro presented in this manuscript. In a separate study, the Sher lab confirmed our conclusions and their in vivo relevance using Bach1 knockout in the sst1-susceptible B6.Sst1S background, where boosting antioxidant defense by Bach1 inactivation resulted in decreased type I interferon pathway activity and reduced granuloma necrosis. We have chosen TNF stimulation for our in vitro studies because this cytokine is most relevant for the formation and maintenance of the integrity of TB granulomas in vivo as shown in mice, non-human primates and humans. Here we demonstrate that although TNF is necessary for host resistance to virulent Mtb, its activity is insufficient for full protection of the susceptible hosts, because of altered macrophage responsiveness to TNF. Thus, our exploration of the necrosis of TB granulomas encompasses both in vitro and extensive in vivo studies.

      Minor comments

      Introduction, while well written, is longer than necessary. Consider shortening this section. Throughout figures, many graphs show a fold induction/accumulation/etc, but it is rarely specified what the internal control is for each graph. This needs to be added.

      Paragraph one, authors use the phrase "the entire IFN pathway was dramatically upregulated..." seems to be an exaggeration. How do you know the "entire" IFN pathway was upregulated in a dramatic fashion?

      (1) We shortened the introduction and discussion; (2) verified that the figure legends specify the internal controls that were used to calculate fold induction; (3) removed the word “entire” to avoid overinterpretation.

      Figures 1E, G and H and supp fig 1C, the heat maps are missing an expression key. Section 2, second paragraph, refers to figs 2D, E as cytoplasmic in the text, but figure legend and y-axis of 2E show total protein.

      The expression keys were added to Fig.1E,G,H, Fig.7C, Suppl.Fig.1C and 1D and Suppl.Fig.11A of the revised manuscript.

      Section 3 end of paragraph 1 refers to Fig 3h. Does this also refer to Supp Fig 3E?

      Yes, Fig.3H shows microscopy of 4-HNE and Suppl.Fig.3H shows quantification of the image analysis. In the revised manuscript these data are presented in Fig.3H and Suppl.Fig.3F. The text was modified to reflect this change.

      Supplemental Fig 3 legend for C-E seems to incorrectly also reference F and G.

      We corrected this error in the figure legend. New panels were added to Suppl.Fig.3, and the previous Suppl.Fig.3F and G were moved to Suppl.Fig.4 panels C and D of the revised version.

      Fig 3K, the p-cJun was inhibited with the JNK inhibitor, however it’s unclear why this was done or the conclusion drawn from this experiment. Use of the JNK inhibitor is not discussed in the text.

      The JNK inhibitor was used to confirm that c-Jun phosphorylation in our studies is mediated by JNK and to compare effects of JNK inhibition on phospho-cJun and Myc expression. This experiment demonstrated that the JNK inhibitor effectively inhibited c-Jun phosphorylation but not Myc upregulation, as shown in Fig.5I-J of the revised manuscript.

      Fig 4 I and Supp Fig 3 H seem to have been swapped? The graph in Fig 4I matches the images in Supp Fig 3I. Please check.

      We reorganized the panels to provide microscopy images and corresponding quantification together in the revised panels Fig. 4H and Fig. 4I, as well as in Suppl. Fig. 4F and Suppl. Fig. 4G.

      Fig 6, it is unclear what % cell number means. Also for bacterial growth, the data are fold change compared to what internal control?

      We updated the Fig.6 legend to indicate that the cell number percentages were calculated relative to the number of cells at Day 0 (immediately after Mtb infection). We routinely use fixable cell death staining to enumerate cell death. A brief protocol containing this information is included in the Methods section. The detailed protocol, including normalization using a BCG spike, has been published (Yabaji et al., STAR Protocols, 2022). Here we did not present the dead cell percentage because it remained low and we did not observe damage to the macrophage monolayers. This allows us to exclude artifacts due to cell loss. The fold change of Mtb was calculated after normalization to the Mtb load at Day 0 after infection and washes.
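
      The two Day 0 normalizations described in this response can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names and all numbers are hypothetical, and it does not model the BCG spike normalization from the published protocol. It only shows that both the cell number percentage and the Mtb fold change are expressed relative to the Day 0 value.

```python
def percent_of_day0(cell_counts):
    """Express each cell count as a percentage of the Day 0 count (first entry)."""
    day0 = cell_counts[0]
    if day0 <= 0:
        raise ValueError("Day 0 cell count must be positive")
    return [100.0 * count / day0 for count in cell_counts]


def fold_change_from_day0(loads):
    """Express each bacterial load as a fold change over the Day 0 load (first entry)."""
    day0 = loads[0]
    if day0 <= 0:
        raise ValueError("Day 0 load must be positive")
    return [load / day0 for load in loads]


# Hypothetical values for illustration only (not data from the manuscript).
cells = [1000, 950, 900]        # cell counts at Day 0 and two later time points
mtb_load = [2000, 4000, 16000]  # Mtb load at the same time points

cell_percent = percent_of_day0(cells)
mtb_fold = fold_change_from_day0(mtb_load)
print(cell_percent)
print(mtb_fold)
```

      By construction the Day 0 entry is always 100% for cell number and 1.0 for fold change, so later values read directly as loss of cells or growth of bacteria relative to the start of infection.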

      Fig 7B needs an expression key

      The expression key was added to Fig.7C (previously Fig. 7B).

      Supp Fig 7 and Supp Fig 8A, what do the arrows indicate?

      In Suppl.Fig.8 (previously Suppl.Fig.7) the arrows indicate acid-fast bacilli (Mtb). In Fig.7A and Suppl.Fig.9A, arrows indicate Mtb expressing the fluorescent reporter mCherry. The corresponding figure legends were updated in the revised version.

      Supp Fig 9A, two ROI appear to be outlined in white, not just 1 as the legend says.

      We updated the figure legend.

      Methods:

      Certain items are listed in the Reagents section that are not used in the manuscript, such as necrostatin-1 or Z-VAD-FMK. Please carefully check the methods to ensure extra items or missing items does not occur.

      These experiments were performed but not included in the final manuscript. Hence, we removed “necrostatin-1 or Z-VAD-FMK” from the reagents section in the Methods of the revised version.

      Western blot, method of visualizing/imaging bands is not provided, method of quantifying density is not provided, though this was done for fig 5C and should be performed for the other WBs.

      We used a GE ImageQuant LAS4000 Multi-Mode Imager to acquire the Western blot images, and the densitometric analyses were performed by area quantification using ImageJ. We included this information in the Methods section. We added the normalized densitometry values above each lane of the Western blots in Fig.2A-C, Fig.3C-D and 3K, Fig.4A-B, and Fig.5B,C,I,J.

      Reviewer #3 (Significance):

      The work of Yabaji et al is of high significance to the field of macrophage biology and M.tb pathogenesis in macrophages. This work builds from previously published work (Bhattacharya 2021) in which the authors first identified the aberrant response induced by TNF in sst1 mutant macrophages. Better understanding how macrophages with the sst1 locus respond not only to bacterial infection but stimulation with relevant ligands such as TNF will aid the field in identifying biomarkers for TB, biomarkers that can suggest a poor outcome vs. "cure" in response to antibiotic treatment or design of host-directed therapies.

      This work will be of interest to those who study macrophage biology and who study M.tb pathogenesis and tuberculosis in particular. This study expands the knowledge already gained on the sst1 locus to further determine how early macrophage responses are shaped that can ultimately determine disease progression.

      Strengths of the study include the methodologies, employing both bulk and single cell-RNA seq to answer specific questions. Data are analyzed using automated methods (such as HALO) to eliminate bias. The experiments are well planned and designed to determine the mechanisms behind the increased iron-related oxidative stress found in the mutant macrophages following TNF treatment. Also, in vivo studies were performed to validate some of the in vitro work. Examining pauci-bacillary lesions vs multi-bacillary lesions and spatial transcriptomics is a significant strength of this work. The inclusion of human data is another strength of the study, showing increased Myc in humans with poor response to antibiotics for TB.

      Limitations include the fact that the work is all done with BMDMs. Use of alveolar macrophages from the mice would be a more relevant cell type for M.tb studies. AMs are less inflammatory, therefore treatment with TNF of AMs could result in different results compared to BMDMs.

      Reviewer's field of expertise: macrophage activation, M.tb pathogenesis in human and mouse models, cell signaling.

      Limitations: not qualified to evaluate single cell or bulk RNA-seq technical analysis/methodology or spatial transcriptomics analysis.

    1. Author response:

      The following is the authors’ response to the current reviews

      Reviewer #2 (Public review): 

      This manuscript describes the role of the production of c-di-AMP on the chlamydial developmental cycle. The main findings remain the same. The authors show that overexpression of the dacA-ybbR operon results in increased production of c-di-AMP and early expression of transitionary and late genes. The authors also knocked down the expression of the dacA-ybbR operon and reported a modest reduction in the expression of both hctA and omcB. The authors conclude with a model suggesting the amount of c-di-AMP determines the fate of the RB, continued replication, or EB conversion. 

      Overall, this is a very intriguing study with important implications however the data is very preliminary and the model is very rudimentary. The data support the observation that dramatically increased c-di-AMP has an impact on transitionary gene expression and late gene expression suggesting dysregulation of the developmental cycle. This effect goes away with modest changes in c-di-AMP (detaTM-DacA vs detaTM-DacA (D164N)). However, the model predicts that low levels of c-di-AMP delays EB production is not well supported by the data. If this prediction were true then the growth rate would increase with c-di-AMP reduction and the data does not show this. The levels of c-di-AMP at the lower levels need to be better validated as it seems like only very high levels make a difference for dysregulated late gene expression. However, on the low end it's not clear what levels are needed to have an effect as only DacAopMut and DacAopKD show any effects on the cycle and the c-di-AMP levels are only different at 24 hours. 

      These appear to be the same comments the reviewer presented last time, so we will reiterate our prior points here and elsewhere. We neither think nor predict that low c-di-AMP levels should increase growth rate (as measured by gDNA levels), and this conclusion cannot be drawn from our data. Rather, we predict that the inability to accumulate c-di-AMP should delay production of EBs, and this is what the data show. The reviewer has applied their own subjective (and erroneous) interpretation to the model. The asynchronicity of the normal developmental cycle means RBs continue to replicate as EBs are forming, so gDNA levels cannot be used as the sole metric for determining RB levels. We show that reduced c-di-AMP levels reduce EB levels as well as transcripts associated with late stages of development. The parsimonious interpretation of these data supports that low c-di-AMP levels delay progression through the developmental cycle, consistent with our model.

      The data still do not support the overall model.

      We disagree.  We have presented quantified data that include appropriate controls and statistical tests, and the reviewer has not disputed that or pointed to additional experiments that need to be performed.  The reviewer has imposed a subjective interpretation of our model based on their own biases.  A reader is free, of course, to disagree with our model, but a reviewer should not block a manuscript based on such a disagreement if no experimental flaws have been identified. 

      In Figure 1 the authors show at 24 hpi. 

      We also showed data from 16hpi, which is a more relevant timepoint for assessing premature transition to EBs.  In contrast, the 24hpi is more important for assessing developmental effects of reduced c-di-AMP levels.

      DacA overexpression increases cdiAMP to ~4000 pg/ml 

      DacAmut overexpression reduces cdiAMP dramatically to ~256 pg/ml) 

      DacATM overexpression increases cdiAMP to ~4000 pg/ml. 

      DacAmutTM overexpression does not seem to change cdiAMP ~1500 pg/ml . 

      dacAKD decreases cdiAMP to ~300 pg/ml . 

      dacAKDcom increased cdiAMP to ~8000 pg/ml. 

      DacA-ybbRop overexpression increased cdiAMP to ~500,000 pg/ml. 

      DacA-ybbRopmut ~300 pg/ml. 

      However in Figure 2 the data show that overexpression of DacA (cdiAMP ~4000 pg/ml) did not have a different phenotype than over expression of the mutant (cdiAMP ~256 pg/ml). HctA expression down, omcB expression down, euo not much change, replication down, and IFUs down. Additionally, Figure 3 shows no differences in anything measured although cdiAMP levels were again dramatically different. DacATM overexpression (~4000 pg/ml) and DacAmutTM (~1500). This makes it unclear what cdiAMP is doing to the developmental cycle. 

      As we have explained in the text and in response to reviewer comments on previous rounds of review, overexpressing the full-length WT or mutant DacA is detrimental to developmental cycle progression for reasons that have nothing to do with c-di-AMP levels (likely disrupting membrane function), since, as the reviewer notes, the WT DacA deltaTM strain had similar c-di-AMP levels but no negative effects on growth/development. If we had not presented the effects of overexpressing the individual isoforms, then a reviewer would surely have requested such, which is why we present these data even though they don’t seem to support our model.  This is an honest representation of our findings.  The reviewer seems intent on nitpicking a minor datapoint that seems to contradict the rest of the manuscript while ignoring or not carefully reading the rest of the manuscript.

      In Figure 4 the authors knockdown dacA (dacA-KD) and complement the knockdown (dacA-KDcom) 

      dacAKD decreases cdiAMP (~300) while DacA-KDcom increases cdiAMP much above wt (~8000). 

      KD decreased hctA and omcB at 24hpi. Complementation resulted in a moderate increase in hctA at a single time point but not at 24 hpi and had no effect on euo or omcB expression.

      By 24hpi, late gene transcripts are being maximally produced during a normal developmental cycle. It is unclear why the reviewer thinks that these transcripts should be elevated above this level in any of our strains that prematurely transition to EBs. There is no basis in the literature to support such an assumption. As we noted in the text, the dacA-KDcom strain phenocopied the dacAop OE strain, and we showed RNAseq data and EB production curves for the latter that support our conclusions of the effect of increased c-di-AMP levels on developmental progression.

      Importantly, complementation decreased the growth rate.

      Yes, since the c-di-AMP levels breached the “EB threshold” at 16hpi, this causes premature transition to EBs, which do not replicate their gDNA, at an earlier stage of the cycle when fewer organisms are present. Therefore, the gDNA levels are decreased at 24hpi, which is consistent with our model.

      Based on the proposed model, growth rate should increase as the chlamydia should all be RBs and replicating and not exiting the cell cycle to become EBs (not replicating).

      This is a spurious conclusion from the reviewer. As we clearly showed, the dacA-KDcom did not restore a wild-type phenotype and instead mimicked the dacAop OE strain. This was commented on in the text.

      Interestingly reducing cdiAMP levels by over expressing DacAmut (~256 pg/ml) did not have an effect on the cycle but the reduction in cdiAMP by knockdown of dacA (~300 pg/ml) did have a moderate effect on the cycle. 

      This is again a spurious conclusion from the reviewer. The dacAMut and dacA-KD strains are distinct. As noted in the text and above for DacA WT OE, overexpressing the DacAMut similarly disrupts organism morphology, which is different from dacA-KD. These strains should not be directly compared because of this. This point has been previously highlighted in the text (in Results and Discussion).

      For Figure 5 DacA-ybbRop was overexpressed and this increased cdiAMP dramatically ~500,000 pg/ml as compared to wt ~1500. This increased hctA only at an early timepoint and not at 24hpi and again had no effect on omcB or euo.

      As we explained in prior reviews, our RNAseq data more comprehensively assessed transcripts for the dacAop OE strain. These data show convincingly that late gene transcripts (not just hctA and omcB) are elevated earlier in the developmental cycle. Again, it is not clear why the reviewer should expect that late gene transcripts should be higher in these strains than they are during a normal developmental cycle. This is not part of our model and appears to be a bias that the reviewer has imposed that is not supported by the literature.

      Overexpression of the operon with the mutation DacA-ybbRopmut reduced cdiAMP to ~300 pg/ml and this showed a reduction in growth rate similar to dacAmut but a more dramatic decrease in IFUs. 

      As we described in the text, in earlier revisions, and above, the dacAMut OE strain has distinct effects unrelated to c-di-AMP levels and, therefore, should not be compared to other strains in terms of linking its c-di-AMP levels to its phenotype.

      Overall: 

      DacA overexpression increases cdiAMP to ~4000 pg/ml (decreased everything except euo) 

      DacAmut overexpression reduces cdiAMP dramatically (~256 pg/ml). (decreased everything except euo) 

      DacATM overexpression increases cdiAMP to ~4000 pg/ml (no changes noted) 

      DacAmutTM overexpression does not seem to change cdiAMP ~1500 pg/ml (no changes noted) 

      dacAKD decrease cdiAMP to ~300 pg/ml (decreased everything except euo) 

      dacAKDcom increased cdiAMP to ~8000 pg/ml (decreases growth rate, increase hctA a little but not omcB) 

      DacA-ybbRop overexpression increased cdiAMP to ~500,000 pg/ml (decreases growth rate, increase hctA a little but not omcB) 

      DacA-ybbRopmut ~300 pg/ml (decreased everything except euo) 

      Overall, the data show that increasing cdiAMP only has a phenotype if it is dramatically increased, no effect at 4000 pg/ml.

      Yes, this clearly shows there is a threshold - as we hypothesize!  However, these thresholds are more important at the 16hpi timepoint not 24hpi (which the reviewer is referencing) when assessing premature transition to EBs.  We specifically highlighted in our prior revision in Figure 1E this EB threshold to make this point clearer for the reader.  Once the threshold is breached, then the overall c-di-AMP levels become irrelevant as the RBs have begun their transition to EBs.

      Decreasing cdiAMP has a consistent effect, decreased growth rate, IFU, hctA expression and omcB expression. However, if their proposed model was correct and low levels of cdiAMP blocked EB conversion then more chlamydial cells would be RBs (dividing cells) and the growth rate should increase.

      The only effect should be normal gDNA levels, which is what we see in the dacA-KD.  Given the asynchronicity of a normal developmental cycle in which RBs continue to replicate as EBs are still forming, there is no basis to assume gDNA levels should increase under these conditions for the dacA-KD strain at 24hpi.

      Conversely, if cdiAMP levels were dramatically raised then all RBs would all convert and the growth rate would be very low.

      We agree. This is what is reflected by the dacAop OE and dacA-KDcom strains, with reduced gDNA levels at 24hpi since organisms have transitioned to EBs at an earlier time post-infection.

      When cdiAMP was raised to ~4000 pg/ml there was no effect on the growth rate.

      Yes, because it had not breached the EB threshold at 16hpi – consistent with our model!  The reviewer is confusing effects of elevated c-di-AMP at 24hpi when they should be assessed at the 16hpi timepoint for strains overproducing this molecule.

      However, an increase to ~8000 pg/ml resulted in a significant decrease but growth continued.

      If the reviewer is referring to the dacA-KDcom strain, then this is not accurate. gDNA levels were decreased in this strain at 24hpi when the c-di-AMP levels were increased compared to the WT (mCherry OE) control at 16hpi, indicating this strain had breached the “EB threshold” and initiated conversion to EBs at an earlier timepoint post-infection when fewer organisms were present.

      Increasing cdiAMP to ~500,000 pg/ml had less of an impact on the growth rate.

      It is not clear what this conclusion is based on and what the reviewer is comparing to.  This is a subjective assessment not based on our data.

      Overall, the data does not cleanly support the proposed model.

      It is an unfortunate aspect of biology, particularly for obligate intracellular bacteria – a challenging experimental system on which to work, that the data are not always “clean”.  The overall effects of increased c-di-AMP levels on chlamydial developmental cycle progression we have documented support our model, and we think the reader, as always, should make their own assessment.


      The following is the authors’ response to the original reviews.

      Reviewer #2 (Public review): 

      This manuscript describes the role of the production of c-di-AMP on the chlamydial developmental cycle. The main findings remain the same. The authors show that overexpression of the dacA-ybbR operon results in increased production of c-di-AMP and early expression of transitionary and late genes. The authors also knocked down the expression of the dacA-ybbR operon and reported a modest reduction in the expression of both hctA and omcB. The authors conclude with a model suggesting the amount of c-di-AMP determines the fate of the RB, continued replication, or EB conversion. 

      Overall, this is a very intriguing study with important implications however, the data is very preliminary, and the model is very rudimentary. The data support the observation that dramatically increased c-di-AMP has an impact on transitionary gene expression and late gene expression suggesting dysregulation of the developmental cycle. This effect goes away with modest changes in c-di-AMP (detaTM-DacA vs detaTM-DacA (D164N)). However, the model predicts that low levels of c-di-AMP delays EB production is not well supported by the data. If this prediction were true then the growth rate would increase with c-di-AMP reduction and the data does not show this.

      Thank you for the comments. We have apparently not adequately communicated our predictions and the model. We neither think nor predict that low c-di-AMP levels should increase growth rate, and there is no basis in any of our data to support that. Rather, we predict that the inability to accumulate c-di-AMP should delay production of EBs, and this is what the data show. We have clarified this in the text (line 89 paragraph).

      The levels of c-di-AMP at the lower levels need to be better validated as it seems like only very high levels make a difference for dysregulated late gene expression. However, on the low end it's not clear what levels are needed to have an effect as only DacAopMut and DacAopKD show any effects on the cycle and the c-di-AMP levels are only different at 24 hours.

      Our hypothesis is that increasing concentrations of c-di-AMP within a given RB is a signal for it to undergo secondary differentiation to the EB, and the data support this as noted by the reviewers. Again, we stress that low levels of c-di-AMP are irrelevant to the model. We have revised Figure 1E to indicate the level of c-di-AMP in the control strain at the 24hpi timepoint that coincides with increased EB levels. We hope this will further clarify the goals of our study. That a given strain might be below the EB control is not relevant to the model beyond indicating that it has not reached the necessary threshold for triggering secondary differentiation.

      The authors responded to reviewers' critiques by adding the overexpression of DacA without the transmembrane region. This addition does not really help their case. They show that detaTM-DacA and detaTM-DacA (D164N) had the same effects on c-di-AMP levels but the figure shows no effects on the developmental cycle.

      As it relates directly to the reviewer’s point, the delta-TM strains did not show the same level of c-di-AMP. It may be that the reviewer misread the graph. The purpose of testing these strains was to show that the negative effects of overexpressing full-length WT DacA were due to its membrane localization. Both the FL and deltaTM-DacA (WT) overexpression had equivalent c-di-AMP levels even though the delta-TM overexpression looked like the mCherry-expressing strain based on the measured parameters. This shows that the c-di-AMP levels were irrelevant to the phenotypes observed when overexpressing these WT isoforms. For the mutant isoforms, the delta-TM looked like the mCherry-expressing control while the FL isoform was negatively impacted for reasons we described in the Discussion (e.g., dominant negative effect). In addition, at 16hpi, neither delta-TM strain had c-di-AMP levels that approached the 24h control as denoted in Figure 1E (dashed line) and in the text, which explains why these strains did not show increased late gene transcripts at an earlier timepoint like the dacAop and dacA-KDcom strains.

      Describing the significance of the findings: 

      The findings are important and point to very exciting new avenues to explore the important questions in chlamydial cell form development. The authors present a model that is not quantified and does not match the data well. 

      We respectfully disagree with this assessment as noted above in response to the reviewer’s critique. All of our data are quantified and support the hypothesis as stated.

      Describing the strength of evidence: 

      The evidence presented is incomplete. The authors do a nice job of showing that overexpression of the dacA-ybbR operon increases c-di-AMP and that knockdown or overexpression of the catalytically dead DacA protein decreases the c-di-AMP levels. However, the effects on the developmental cycle and how they fit the proposed model are less well supported. 

      Overall this is a very intriguing finding that will require more gene expression data, phenotypic characterization of cell forms, and better quantitative models to fully interpret these findings. 

      It is not clear what quantitative models the reviewer would prefer, but, ultimately, it is up to the reader to decide whether they agree or not with the model we present. The data are the data, and we have tried to present them as clearly as possible. We would emphasize that, with the number of strains we have analyzed, we have presented a huge amount of data for a study with an obligate intracellular bacterium. As a comparison, most publications on Chlamydia might use a handful of transformant strains, if any. Given the cost and time associated with performing such studies, it is prohibitive to attempt all the time points that one might like to do, and it is not clear to us that further studies will add to or alter the conclusions of the current manuscript.

      Reviewer #2 (Recommendations for the authors): 

      Minor critiques 

      The graphs have red and blue lines but the figure legends are red and black. It would be better if these matched. 

      Changed.

      For Figure 1C. The labels are not very helpful. It's not clear what is HeLa vs mCherry. I believe it is uninfected vs Chlamydia infected.

      Changed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This is a contribution to the field of developmental bioelectricity. How do changes of resting potential at the cell membrane affect downstream processes? Zhou et al. reported in 2015 that phosphatidylserine and K-Ras cluster upon plasma membrane depolarization and that voltage-dependent ERK activation occurs when constitutive active K-RasG12V mutants are overexpressed. In this paper, the authors advance the knowledge of this phenomenon by showing that membrane depolarization up-regulates mitosis and that this process is dependent on voltage-dependent activation of ERK. ERK activity's voltage-dependence is derived from changes in the dynamics of phosphatidylserine in the plasma membrane and not by extracellular calcium dynamics.

      Strengths:

      Bioelectricity is an important field for areas of cell, developmental, and evolutionary biology, as well as for biomedicine. Confirmation of ERK as a transduction mechanism, and a characterization of the molecular details involved in control of cell proliferation, is interesting and impactful.

      Weaknesses:

      The functional cell division data need to be stronger. They show that increasing K+ increases proliferation and argue that since a MEK inhibitor (U0126) reduces proliferation in K+ treated cells, K+ induces cell division via ERK. But I don't see statistics to show that the rescue is significant, and I don't see a key U0126-only control. If the U0126 alone reduces proliferation, the combined effect wouldn't prove much.

      We thank the reviewer for constructive feedback. We repeated the experiment including the U0126-only control (5K+U). We updated Fig.1, presenting the newly obtained data with statistical analysis.

      Also, unless I'm missing something, it looks like every sample in their control has exactly the same number of mitotic cells. I understand that they are normalizing to this column, but shouldn't they be normalizing to the mean, with the independent values scattering around 1? It doesn't seem like it can be paired replicates since there are 6 replicates in the control and 4 replicates in one of the conditions? 

      We apologize for the unclear description. As the reviewer pointed out, the experiments were not paired replicates due to the limited number of conditions that can be conducted as a single experiment. To overcome this problem, we always included a control condition (i.e. 5K) based on which normalization was performed. This is the reason the data in 5K is always 1 and the sample size of 5K is the largest. Data include 100-900 mitotic cells within the imaging frame of 6 hrs. We re-wrote the figure legend (Fig1) and the main text, which hopefully clarified our experimental framework.
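
      The normalization scheme described in this response can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the function name, the data layout, and all counts are hypothetical. Each experiment includes the 5K control plus a subset of the other conditions, and every count is divided by the 5K count from the same experiment.

```python
def normalize_to_control(experiments, control="5K"):
    """Divide each condition's mitotic-cell count by the control count
    measured in the same experiment, then group the ratios by condition."""
    normalized = {}
    for experiment in experiments:
        control_count = experiment[control]
        if control_count <= 0:
            raise ValueError("control count must be positive")
        for condition, count in experiment.items():
            normalized.setdefault(condition, []).append(count / control_count)
    return normalized


# Hypothetical mitotic-cell counts (illustration only, not data from the paper).
# Every experiment contains the 5K control; other conditions vary by experiment.
experiments = [
    {"5K": 200, "15K": 300},
    {"5K": 400, "15K": 500},
    {"5K": 100, "5K+U": 80},
]

normalized = normalize_to_control(experiments)
print(normalized)
```

      Under this scheme every 5K value is exactly 1 by construction, and 5K has the largest sample size because it appears in every experiment, which matches the pattern the reviewer noticed in the figure.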

      Reviewer #2 (Public review):

      Sasaki et al. use a combination of live-cell biosensors and patch-clamp electrophysiology to investigate the effect of membrane potential on the ERK MAPK signaling pathway, and probe associated effects on proliferation. This is an effect that has long been proposed, but convincing demonstration has remained elusive, because it is difficult to perturb membrane potential without disturbing other aspects of cell physiology in complex ways. The time-resolved measurements here are a nice contribution to this question, and the perforated patch clamp experiments with an ERK biosensor are fantastic - they come closer to addressing the above difficulty of perturbing voltage than any prior work. It would have been difficult to obtain these observations with any other combination of tools.

      However, there are still some concerns as detailed in specific comments below:

      Specific comments:

      (1) All the observations of ERK activation, by both high extracellular K+ and voltage clamp, could be explained by cell volume increase (more discussion in subsequent comments). There is a substantial literature on ERK activation by hypotonic cell swelling (e.g. https://doi.org/10.1042/bj3090013, https://doi.org/10.1002/j.1460-2075.1996.tb00938.x, among others). Here are some possible observations that could demonstrate that ERK activation by volume change is distinct from the effects reported here:

      (i) Does hypotonic shock activate ERK in U2OS cells?

      (ii) Can hypotonic shock activate ERK even after PS depletion, whereas extracellular K+ cannot?

      (iii) Does high extracellular K+ change cell volume in U2OS cells, measured via an accurate method such as fluorescence exclusion microscopy?

      (iv) It would be helpful to check the osmolality of all the extracellular solutions, even though they were nominally targeted to be iso-osmotic.

      This is an important point. We conducted several experiments and provided explanations to rule out the possibility that ERK activation can be explained solely by cell volume change. We measured the osmolarity of all solutions used in this paper, which were 296-305 mOsm/L. This information was added to the Materials and Methods section (line 387). Under our experimental conditions, ERK activation was not observed with hypotonic solutions of either 70% or 50% osmolarity (Fig.S2).

      It is therefore unlikely that cell volume change is the main cause of ERK activation upon high K+ perfusion. We would like to pursue this issue further when we have the capacity to measure cell volume changes accurately.

      (2) Some more details about the experimental design and the results are needed from Figure 1:

      (i) For how long are the cells serum-starved? From the Methods section, it seems like the G1 release in different K+ concentration is done without serum, is this correct? Is the prior thymidine treatment also performed in the absence of serum?

      Only the high K+ incubation phase was serum-free. We added the following sentence to the main text (line 63) and an experimental diagram as Fig. 1A: “Cells were incubated in the presence of serum except for the phase with altered K+ concentration.”

      (ii) There is a question of whether depolarization constitutes a physiologically relevant mechanism to regulate proliferation, and how depolarization interacts with other extracellular signals that might be present in an in vivo context.

      This is a very important point. However, the significance of membrane depolarization for cell proliferation in vivo is beyond the scope of this study. This important question will be addressed in the future.

      Does depolarization only promote proliferation after extended serum starvation (in what is presumably a stressed cell state)?

      Cells were cultured in the presence of serum prior to the high K+ incubation phase as described above. We added a new figure (Fig. 1A).

      What fraction of total cells are observed to be mitotic (without normalization), and how does this compare to the proliferation of these cells growing in serum-supplemented media? Can K+ concentration tune proliferation rate even in serum-supplemented media?

      We included data recorded in serum-supplemented conditions (Fig.1), which showed a high mitotic rate. This is presumably due to the growth factors included in serum. There is no significant difference between 5K+FBS and 15K+FBS.

      (3) In Figure 2, there are some possible concerns with the perfusion experiment:

      (i) Is the buffer static in the period before perfusion with high K+, or is it perfused? This is not clear from the Methods. If it is static, how does the ERK activity change when perfused with 5 mM K+? In other words, how much of the response is due to flow/media exchange versus change in K+ concentration?

      The buffer was static prior to high K+ perfusion. We confirmed that perfusion alone does not activate ERK (Fig.S2). We added the following sentence to the main text. “We also confirmed that the effect of perfusion was negligible, as ERK activation was not observed upon start of the 5K+ perfusion” (line 150).

      (ii) Why do there appear to be population-average decreases in ERK activity in the period before perfusion with high K+ (especially in contrast to Fig. 3)? The imaging period does not seem frequent enough for photo bleaching to be significant.

      Although we don’t have a clear answer to this question, we speculate that several aspects of the experimental setup may have contributed to the difference. The cell lines and imaging systems used in Fig.2 and Fig.3 were different. The expression level may differ between U2OS cells and HEK 293 cells: expression was transient in U2OS cells and stable in HEK 293 cells. This difference may lead to different signal-to-noise ratios. The imaging system used in Fig.2 is an epi-illumination microscope excited with a 439/24 bandpass filter and detected with 483/32 (CFP) and 542/27 (YFP), while the imaging system used in Fig.3 is a confocal microscope excited with a 458 nm laser and detected with 475-525 (CFP) and LP530 (YFP). These optical setups may also contribute to the different population-average properties before stimulation.

      (4) Figure 3 contains important results on couplings between membrane potential and MAPK signaling. However, there are a few concerns:

      (i) Does cell volume change upon voltage clamping? Previous authors have shown that depolarizing voltage clamp can cause cells to swell, at least in the whole-cell configuration: https://www.cell.com/biophysj/fulltext/S0006-3495(18)30441-7 . Could it be possible that the clamping protocol induces changes in ERK signaling due to changes in cell volume, and not by an independent mechanism?

      We do not know whether cell volume is altered in the perforated-patch configuration. As discussed above, however, the effect of cell volume changes on ERK activity seemed to be negligible, because ERK activation was not observed with either the 70% or the 50% osmolarity hypotonic solution (Fig.S2).

      (ii) Does the -80 mV clamp begin at time 0 minutes? If so, one might expect a transient decrease in sensor FRET ratio, depending on the original resting potential of the cells. Typical estimates for resting potential in HEK293 cells range from -40 mV to -15 mV, which would reach the range that induces an ERK response by depolarizing clamp in Fig. 3B. What are the resting potentials of the cells before they are clamped to -80 mV, and why do we not see this downward transient?

      We set the potential to -80 mV immediately after the giga-seal formation and waited for at least 5 minutes to allow pore formation by gramicidin. We started imaging only after membrane potential was expected to have reached a steady state at -80 mV. We have now included this sentence in the ‘Material and Methods’ section (line 398).

      (5) The activation of ERK by perforated voltage clamp and by high extracellular K+ are each convincing, but it is unclear whether they need to act purely through the same mechanism - while additional extracellular K+ does depolarize the cell, it could also be affecting function of voltage-independent transporters and cell volume regulatory mechanisms on the timescales studied. To more strongly show this, the following should be done with the HEK cells where there is already voltage clamp data:

      (i) Measure resting potential using the perforated patch in zero-current configuration in the high K+ medium. Ideally this should be done in the time window after high K+ addition where ERK activation is observed (10-20 minutes) to minimize the possibility of drift due to changes in transporter and channel activity due to post-translational regulation.

      We measured membrane potential in the perforated patch configuration and confirmed that there is negligible potential drift within 20 minutes of perfusion with 145 K+ (only 1~5 mV change during perfusion).

      (ii) Measure YFP/CFP ratio of the HEK cells in the high K+ medium (in contrast to the U2OS cells from Fig. 2 where there is no patch data).

      YFP/CFP ratio data in HEK cells are shown in Fig.S1. As the signal-to-noise level is affected by the expression level of the probe, it is difficult to compare between cells with different expression levels. A higher YFP/CFP value with HEK cells compared to HeLa cells and A431 cells (Sup1) does not necessarily mean that HEK cells have higher ERK activity.

      (iii) The assertion that high K+ is equivalent to changes in Vmem for ERK signaling would be supported if the YFP/CFP change from K+ addition is comparable to that induced by voltage clamp to the same potential. This would be particularly convincing if the experiment could be done with each of the 15 mM, 30 mM, and 145 mM conditions.

      The experimental system using a fluorescent biosensor cannot measure absolute ERK activity; it can only measure the change after a specific stimulus relative to the period before the stimulus. In electrophysiology experiments, the pre-stimulation membrane potential was clamped to -80 mV, whereas in the perfusion experiment, the membrane potential varied between individual cells (-35 to -15 mV). It is therefore difficult to compare the results of electrophysiology experiments with those of the perfusion system. Unlike ion channel currents, absolute ERK activity cannot currently be plotted against membrane potential. In the present study, we therefore discussed the change rather than the absolute value of ERK activity.
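
      As an illustration of why only relative changes are comparable, the following is a minimal Python sketch of normalizing a YFP/CFP ratio trace to its own pre-stimulus baseline. It is not the analysis code used in the study: the function name, the sample values, and the use of the pre-stimulus mean as the baseline are all assumptions made for illustration.

```python
def normalize_to_baseline(ratio_trace, n_baseline):
    """Divide a YFP/CFP ratio trace by the mean of its first n_baseline points.

    The baseline window (the pre-stimulus frames) is an assumption of this
    sketch; the study may define the baseline differently.
    """
    if n_baseline <= 0 or n_baseline > len(ratio_trace):
        raise ValueError("n_baseline must be between 1 and len(ratio_trace)")
    baseline = sum(ratio_trace[:n_baseline]) / n_baseline
    return [r / baseline for r in ratio_trace]


# Hypothetical traces: two cells with different raw ratios (for example,
# because probe expression differs) but the same relative response.
cell_a = [1.0, 1.0, 1.2, 1.3]
cell_b = [2.0, 2.0, 2.4, 2.6]

norm_a = normalize_to_baseline(cell_a, n_baseline=2)
norm_b = normalize_to_baseline(cell_b, n_baseline=2)
```

      In this made-up example the raw ratios of the two cells differ twofold, yet the normalized traces coincide, which is why a higher raw YFP/CFP value does not by itself indicate higher ERK activity.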

      (6) Line 170: "ERK activity was reduced with a fast time course (within 1 minute) after repolarization to -80 mV." I don't see this in the data: in Fig. 3C, it looks like ERK remains elevated for > 10 min after the electrical stimulus has returned to -80 mV

      Thank you for pointing out that our description was confusing. We changed the sentence to clarify the point we wanted to make. It now reads as follows. “ERK activity showed signs of reduction within 1 minute after repolarization to -80 mV.” (line 174)

      Reviewer #3 (Public review):

      Summary:

      This paper demonstrates that membrane depolarization induces a small increase in cell entry into mitosis. Based on previous work from another lab, the authors propose that ERK activation might be involved. They show convincingly using a combination of assays that ERK is activated by membrane depolarization. They show this is Ca2+ independent and is a result of activation of the whole K-Ras/ERK cascade which results from changed dynamics of phosphatidylserine in the plasma membrane that activates K-Ras. Although the activation of the Ras/ERK pathway by membrane depolarization is not new, linking it to an increase in cell proliferation is novel.

      Strengths

      A major strength of the study is the use of different techniques - live imaging with ERK reporters, as well as Western blotting to demonstrate ERK activation as well as different methods for inducing membrane depolarization. They also use a number of different cell lines. Via Western blotting the authors are also able to show that the whole MAPK cascade is activated.

      Weaknesses

      A weakness of the study is the data in Figure 1 showing that membrane depolarization results in an increase of cells entering mitosis. There are very few cells entering mitosis in their sample in any condition. This should be done with many more cells to increase confidence in the results.

      We apologize that the description was not clear. Because only a limited number of conditions can be run in a single experiment, we always included a control condition (i.e. 5K) and normalized to the control condition of the initial 1.5 hrs. Data were from 100-900 mitotic cell counts within the 6 hr imaging time window. We re-wrote the figure legend (Fig1) and the main text.

      The study also lacks a mechanistic link between ERK activation by membrane depolarization and increased cell proliferation.

      The present study focused on the link between membrane potential and the ERK activity; the mechanistic link between ERK activity and cell proliferation is beyond the scope of the present study. This important topic will be pursued further in subsequent studies.

      The authors did achieve their aims with the caveat that the cell proliferation results could be strengthened. The results for the most part support the conclusions.

      This work suggests that alterations in membrane potential may have more physiological functions than action potential in the neural system as it has an effect on intracellular signalling and potentially cell proliferation.

      Reviewer #1 (Recommendations for the authors):

      minor typo:

      ERK activity has voltage-dependency with the physiological rang of membrane potential should be "range"

      Corrected

      Reviewer #2 (Recommendations for the authors):

      Small points:

      Line 82: rang -> range

      Corrected

      Line 102: ". they were stimulated" -> ". The cells were stimulated"

      Corrected

      Figs. 2C, 2D show exactly the same data points and the same information. Please cut one of these figures.

      We deleted 2C and added the information in 2D and made new Fig.2C.

      For all figs: Please indicate # of cells and # of independent dishes used in each experiment, and make clear whether individual data-points correspond to cells, dishes, or some other unit of measure.

      We added the information in figure legends.

      Reviewer #3 (Recommendations for the authors):

      The authors should repeat the cell proliferation experiments with more cells to strengthen the data. They could also use alternative assays like phosphorylated histone H3 staining for cells in M phase, that might be easier to quantitate.

      We repeated the experiment and replaced Fig.1 with a new Fig.1.

      The authors should investigate how the upregulation of ERK is driving cells into mitosis. At what point in the cell cycle is activated ERK induced by membrane depolarization having the effect? Is it entry into mitosis or earlier in the cell cycle?

      The cells were incubated with a high K+ solution 8-9 hr after G1 release, which is supposed to correspond to G2. These data suggest that mitotic activity is stimulated when ERK is activated at G2. However, we lack conclusive data at present to show the consequence of ERK activation during G2. We therefore cannot pinpoint the stage of cell cycle where depolarization-activated ERK exerts its effect.

      The authors refer a lot to the work of Zhou et al 2015 throughout the paper. This is not necessary and is a bit distracting.

      We deleted several sentences from the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) Labels should be added in the Figures and should be uniform across all Figures (some are distorted).

      We thank the Reviewer for pointing out this issue. As requested, labels have been edited to ensure they are legible and are consistent in font, size, and style.  

      Reviewer #2 (Public review):

      (1) As for Figure 2F, Setd2-SET activity on WT rNuc (H3) appears to be significantly lower compared to what is extensively reported in the literature. This is particularly puzzling given that Figure 2B suggests that using 3H-SAM, H3-nuc are much better substrates than K36me1, whereas in Figure 3F, rH3 is weaker than K36me1. It is recommended for the authors to perform additional experimental repeats and include a quantitative analysis to ensure the consistency and reliability of these findings.  

      We appreciate the Reviewer’s points. We respectfully suggest that these comments may reflect potential confusion around interpreting how different assays detect in vitro methylation, what data can and cannot be compared, and the nature of the different substrates used. 

      With respect to point 1 (Western signal significantly lower compared to extensive literature): To the best of our knowledge, it would be extremely challenging to make a quantitative argument comparing the strength of the Western signal in Figure 2F with results reported in the literature. Specifically, comparing our results with previous studies would require (1) all the studies to have used the exact same antibodies, as antibody signal intensities vary depending on the specific activity and selectivity of a particular antibody and even its lot number, (2) similar in vitro methylation reaction conditions, (3) the same type of recombinant nucleosomes, and so on. Further, given that these are Western blots, we do not understand how one could interpret an absolute activity level. In the figure, all we can conclude is that in in vitro methylation reactions, our recombinant SETD2 protein methylates rNucs to generate mono-, di-, and tri-methylation at K36 (using vetted antibodies (see Fig. 2e)). If there is a specific paper within the extensive literature that the Reviewer highlights, we could look more into the details of why the signals are different (our guess is that any difference would largely be due to the use of different antibodies). We add that it might be challenging to find a similar experiment performed in the literature; we are not aware of a similar experiment. 

      With respect to comparing Figure 2B and 2F: We do not understand how one can meaningfully compare incorporation of radiolabeled SAM to antibody-based detection on film using an antibody against specific methyl states. In particular, on the question of comparing rH3 vs H3K36me1 nucleosomes, we point out that when using recombinant nucleosomes installed with native modifications (e.g. H3K36me1), in which the entire population of the starting material is mono-methylated, the Western signal with an anti-H3K36me1 antibody will naturally be strong. In Fig. 2b, the assay is incorporation of radiolabeled methyl, which is added to the preexisting mono-methylated substrate. In other words, the results are entirely consistent if one understands how the methylation reactions were performed, how methylation was detected, and the nature of the reagents.

      (2) The additional bands observed in Figure 4B, which appear to be H4, should be accompanied by quantification of the intensity of the H3 bands to better assess K36me3 activity. Additionally, the quantification presented in Figure 4C for SAH does not seem accurate as it potentially includes non-specific methylation activity, likely from H4. This needs to be addressed for clarity and accuracy. 

      We thank the reviewer for this comment. The additional bands observed in Figure 4B represent degradation products of histone H3, not H4 methylation. This is commonly seen in in vitro reactions using recombinant nucleosomes, where partial proteolysis of H3 can occur under the assay conditions.  

      (3) In Figure 4E, the differences between bound and unbound substrates are not sufficiently pronounced. Given the modest differences observed, authors might want to consider repeating the assay with sufficient replicates to ensure the results are statistically robust.

      In Figure 4E, we observe a clear difference between the bound and unbound substrate. To aid interpretation, we have clarified in the figure where the bound complex migrates on the gel, while the unbound nucleosomes migrate at the bottom of the gel. The differences are indeed subtle, which we highlight in the text.  

      (4) Regarding labeling, there are multiple issues that need correction: In the depiction of Epicypher's dNuc, it is crucial to clearly mark H2B as the upper band, rather than ambiguously labeling H2A/H2B together when two distinct bands are evident. In Figure 3B and D, the histones appear to be mislabeled, and the band corresponding to H4 has been cut off. It would be beneficial to refer to Figure 3E for correct labeling to maintain consistency and accuracy across figures. 

      Thank you for pointing this out. To avoid any confusion, we have delineated the H2B and H2A markers and indicate the band corresponding to H4.

      (5) There are issues with the image quality in some blots; for instance, Figure 2EF and Figure 2D exhibit excessive contrast and pixelation, respectively. These issues could potentially obscure or misrepresent the data, and thus, adjustments in image processing are recommended to provide clearer, more accurate representations. 

      Contrast adjustments were applied uniformly across each entire image and were not used to modify any specific region of the blot. We have corrected the issue of increased pixelation in Figure 2D. 

      (6) The authors are recommended to provide detailed descriptions of the materials used, including catalog numbers and specific products, to allow for reproducibility and verification of experimental conditions. 

      We have added the missing product specifications and catalog numbers to ensure clarity and reproducibility of the experiments.

      (7) The identification of Setd2 as a tumor suppressor in KrasG12C-driven LUAD is a significant finding. However, the discussion on how this discovery could inspire future therapeutic approaches needs to be more balanced. The current discussion (Page 10) around the potential use of inhibitors is somewhat confusing and could benefit from a clearer explanation of how Setd2's role could be targeted therapeutically. It would be beneficial for the authors to explore both current and potential future strategies in a more structured manner, perhaps by delineating between direct inhibitors, pathway modulators, and other therapeutic modalities. 

      SETD2 is a tumor suppressor in lung cancer (as we show here and many others have clearly established in the literature) and thus we would recommend avoiding a SETD2 inhibitor to treat solid tumors, as it could have a very much unwanted effect. Our discussion addresses a different point regarding the relative importance of the enzymatic activity versus other, nonenzymatic functions of SETD2. We believe that a detailed exploration of the therapeutic potential of inhibiting SETD2 would be better suited to a review or a more therapy-focused manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      In this manuscript, Wolfson and co-authors demonstrate a combination of an injury-specific enhancer and engineered AAV that enhances transgene expression in injured myocardium. The authors characterize spatiotemporal dynamics of TREE-directed AAV expression in the injured heart using a non-invasive longitudinal monitoring system. They show that transgene expression is drastically increased 3 days post-injury, driven by 2ankrd1a. They reported a liver-detargeted capsid, AAV cc.84, with decreased viral entry into the liver while maintaining TREE transgene specificity. They further identified the IR41 serotype with enhanced transgene expression in injured myocardium from AAV library screening. This is an interesting study that optimizes the potential application of TREE delivery for cardiac repair. However, several concerns were raised prior to publication:

      Major Concerns:

      (1) In Figure 1, the authors demonstrated that 2ankrd1aEN is not responsive to sham injury after AAV delivery, but Figure 3 shows a strong response to sham when AAV is delivered after injury. The authors do not provide an explanation for this observation.

      This discrepancy is due to the timing of AAV delivery. In Figure 1, AAV was delivered 60 days prior to IVIS imaging and cardiac injury, allowing time for the baseline level of AAV transgene expression to reach a plateau. From this baseline level, we were able to measure fold change in luminescence signal before and after cardiac injury. In Figure 3, AAV was delivered 4 days after cardiac injury. Luminescence in the heart was measured 3 days later (day 7), when the baseline of AAV transgene expression is still building. The data from Figure 1C-D inform us that the 2ankrd1aEN response to cardiac injury peaks within the first week and returns to baseline levels after 5-7 weeks. In Figure 3E, we show that 2ankrd1aEN provides a baseline level of expression that is present in sham hearts and reaches its plateau after 6 weeks. In contrast, I/R injured hearts show enhanced expression in the first 3-4 weeks, corresponding with the dynamics of 2ankrd1aEN’s response to injury observed in Figure 1C. We have now included a phrase in the revised manuscript on p. 7, paragraph 1 to clarify.
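
      The fold-change readout described for Figure 1 can be sketched as follows. This is a minimal, hypothetical Python example rather than the analysis pipeline used in the study: the function name, the time points, and every luminescence value are invented for illustration, and it assumes a single plateaued pre-injury measurement per animal as the baseline.

```python
def fold_change_over_baseline(baseline_signal, signals_by_day):
    """Return luminescence at each time point as a multiple of the baseline.

    baseline_signal: plateaued pre-injury signal for one animal (assumed).
    signals_by_day: mapping of days post-injury to luminescence signal.
    """
    if baseline_signal <= 0:
        raise ValueError("baseline_signal must be positive")
    return {day: signal / baseline_signal
            for day, signal in signals_by_day.items()}


# Invented values in arbitrary units for a single hypothetical animal.
baseline = 2.0e5
post_injury = {3: 1.0e6, 7: 8.0e5, 49: 2.0e5}

changes = fold_change_over_baseline(baseline, post_injury)
```

      Such a ratio is only interpretable when the baseline has plateaued, as in Figure 1. When AAV is delivered after injury, as in Figure 3, baseline expression is still building, so the same calculation would mix vector expression kinetics with the injury response.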

      (2) In Figure 4, a higher GFP signal is observed in all areas of the heart of the IR41-treated mouse compared to AAV9. The authors should compare GFP expression between AAV9 and IR41 in uninjured hearts and provide insights into enhanced cardiac tropism to confirm that IR41 is MI injury enriched, not Sham as well.

      We sought to address this question with the experiments presented in Figure 5. We treated sham mice with AAV9 and IR41 containing 2ankrd1aEN. Figure 5D showed IR41 delivered more vector genomes to the sham heart on average, though not with a p-value less than 0.05 compared with AAV9. In Supplemental Figure 5B, IR41 also provided higher luminescence at day 7 post-sham but was comparable at day 14 and day 21. These data suggest IR41 might increase heart tropism in healthy hearts, but IR41’s effect is most dramatic when delivered to injured hearts, where cardiac vector genomes are highest (Figure 5D). We have now included a sentence in the revised manuscript on p. 8, paragraph 2 to clarify.

      (3) The authors should clarify which model is being used between myocardial infarction (MI) and Ischemia-reperfusion (IR) throughout the figures, as the experimental schemes and figure legends did not match with each other (MI or IR in Figure 1A, 1D, 3A, and 3E). Both models cause different types of injuries. The authors should explain the difference in TREE expression in both models.

      We have revised the figures to specify the model, where I/R or MI is used.

      (4) In Figure 2, the authors use REN instead of 2ankrd1aEN to demonstrate liver-detargeting using AAV cc.84. Is there a specific reason?

      Our data in Figure 1 informed us that off-target liver expression is more specifically an issue for REN compared to 2ankrd1aEN. Baseline levels of luminescence in the heart could not be as clearly marked due to off-target expression in the liver, which was showcased in Figure 2B with AAV9 delivery to sham mice. As discussed above, 2ankrd1aEN provided stronger baseline levels of expression in the heart, which could be more clearly marked in IVIS images for tracking fold changes over time. For these reasons, we sought to explore how incorporation of the AAV.cc84 capsid could be utilized to minimize off-target liver expression. We have now included a sentence in the revised manuscript on p. 5, paragraph 3 to clarify.

      Reviewer #2 (Public review):

      In this manuscript by Wolfson et al., various adeno-associated viruses (AAVs) were delivered to mice to assess the cardiac-specificity, injury border-zone cardiomyocyte transduction rate, and temporal dynamics, with the goal of finding better AAVs for gene therapies targeting the heart. The authors delivered tissue regeneration enhancer elements (TREEs) controlling luciferase expression and used IVIS imaging to examine transduction in the heart and other organs. They found that luciferase expression increased in the first week after injury when using AAV9-TREE-Hsp68 promoter, waning to baseline levels by 7 weeks. However, AAV9 vectors transduced the liver, which was significantly reduced by using an AAV.cc84 liver de-targeting capsid. The authors then performed in vivo screening of AAV9 capsids and found AAV-IR41 to preferentially transduce injured myocardium when compared to AAV9. Finally, the authors combined TREEs with AAV-IR41 to show improved luciferase expression compared to AAV9-TREE at 7, 14, and 21 days after injury.

      Overall, this manuscript provides insights into TREE expression dynamics when paired with various heart-targeting capsids, which can be useful for researchers studying ischemic injury of murine hearts. While the authors have shown the success of using AAV9-TREEs in porcine hearts, it is unknown whether the expression dynamics would be similar in pigs or humans, as mentioned in the limitations.

      The following questions and concerns can be addressed to improve the manuscript:

      (1) From the IVIS data, it seems that the Hsp68 promoter might not be "normally silent in mouse tissues," specifically in the liver (Figure S1B). Are there any other promoters that can be combined with TREEs to induce cardiac-injury specific expression while minimizing liver expression? This could simplify capsid design to focus on delivery to injured areas.

      Indeed we found the Hsp68 promoter does provide low levels of baseline expression, especially in the liver of mice. The Hsp68 promoter was initially chosen due to its permissive nature allowing for assessment of expression directed by TREEs. Many or most groups use the Hsp68 promoter for enhancer tests in mice, but we agree that other permissive promoters might have lower baseline levels of expression and might have the benefit of smaller size. We have not rigorously tested other permissive promoters in our experiments.

      (2) Why is it that AAV9-TREE-Hsp68-Luc wanes in expression (Figure 1C and 1D), whereas AAV.cc84-TREE-Hsp68-Luc expresses stably for over 2 months (3E)? This has important implications for the goal of transience in gene delivery.

      Please see our response to reviewer 1’s comment #1 above.

      (3) AAV-IR41 was found to transduce cardiomyocytes in the injured zone. However, this capsid also shows a very strong off-target liver expression. From a capsid design perspective, is it possible to combine AAV-cc84 and AAV-IR41?

      This approach is in theory possible as these epitopes are structurally distinct. However, since the mechanism (receptor usage) is currently unknown, it would not be possible to predict whether the properties are mutually exclusive. Further, we would need to ensure that combining modifications does not impact vector yield. We can explore such features with next generation candidates as we continue to improve the platform. We have now included a sentence in the revised manuscript on p. 9, paragraph 3, mentioning the possibility of combining the two capsid mutations.

      (4) It would be helpful to see immunostaining for the various time points in Figure 5. Is it possible to use an anti-luciferase antibody (or AAV-TREE-Hsp68-eGFP) to compare the two TREE capsids?

      We were not able to do immunostaining of luciferase expression, because the biopsied hearts were used to quantify vector genomes via qPCR. We have previously reported results of immunostaining of EGFP expression directed by 2ankrd1aEN in I/R-injured mouse hearts (Yan et al., 2023), which we expect to match the expression seen in these experiments.

      Reviewer #3 (Public review):

      Summary:

      The tissue regeneration enhancer elements (TREEs) identified in zebrafish have been shown to drive injury-activated temporal-spatial gene expression in mice and large animals. These findings increase the translational potential of findings in zebrafish to mammals. In this manuscript, the authors tested TREEs in combination with different adeno-associated viral (AAV) vectors using in vivo luciferase bioluminescent imaging that allows for longitudinal tracking. The TREE-driven luciferase delivered by a liver de-targeted AAV.cc84 decreased off-target transduction in the liver. They further screened an AAV library to identify capsid variants that display enhanced transduction for myocardium post-myocardial infarction. A new capsid variant, AAV.IR41, was found to show increased transduction at the infarct border zones.

      Strengths:

      The authors injected AAV-cargo several days after ischemia/reperfusion (I/R) injury as a clinically relevant approach. Overall, this study is significant in that it identifies new AAV vectors for potential new gene therapies in the future. The manuscript is well-written, and their data are also of high quality.

      Weaknesses:

      The authors might be using MI (myocardial infarction) and I/R injury interchangeably in their text and labels. For instance, "We systemically transduced mice at 4 days after permanent left coronary artery ligation with either AAV9 or IR41 harboring a 2ankrd1aEN-Hsp68::fLuc transgene. IVIS imaging revealed higher expression levels in animals transduced with IR41 compared to AAV9, in both sham and I/R groups (Fig. 5A)". They should keep it consistent. There is also no description for the MI model.

      We have adjusted figure labels and main text to ensure the injury model is described correctly.

      We have also addressed all additional Recommendations for the authors, which requested minor modifications to figures like error bars and image annotation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study uses mesoscale simulations to investigate how membrane geometry regulates the multiphase organization of postsynaptic condensates. It reveals that dimensionality shifts the balance between specific and non-specific interactions, thereby reversing domain morphology observed in vitro versus in vivo.

      Strengths:

      The model is grounded in experimental binding affinities, reproduces key experimental observations in 3D and 2D contexts, and offers mechanistic insight into how geometry and molecular features drive phase behavior.

      Weaknesses:

      The model omits other synaptic components that may influence domain organization and does not extensively explore parameter sensitivity or broader physiological variability.

      We thank the reviewer for his/her time and effort on our manuscript. We agree that the contribution of other synaptic components should be addressed. We have included a discussion of the effects of environmental factors such as protein and ion concentrations, as well as other omitted postsynaptic components (SAPAP, Shank, and Homer) on phase morphology. In the middle of the 2nd paragraph of the Discussion, we added:

      “While these in vivo results contain additional scaffold and cytoskeletal elements omitted in our model, such as SAPAP, Shank and Homer, nearly all proteins in the middle and lower layers of the PSD associate directly or indirectly with PSD-95 in the upper PSD layer. Consequently, it is probable that other scaffold proteins contribute to the mobility of AMPAR-containing and NMDAR-containing nanodomains indistinguishably. They may increase the stability of the AMPAR and NMDAR clusters but are unlikely to have a distinct effect to reverse the phase-separation phenomenon.”

      Also, as the reviewer pointed out, we agree that physiological factors such as ion concentration may influence the phase behavior. However, conditions such as ion concentration are implicitly represented by the specific and non-specific interactions in this model, which makes it difficult to estimate the effect of each physiological condition individually. We added the potential variability of physiological conditions to the Discussion section as a limitation of this model. To investigate parameter sensitivity in more detail, we performed additional MD simulations with weakened membrane constraints to account for the behavior between 3D and 2D. We added:

      “First, our results did not provide direct insights to physiological conditions, such as ion concentrations. Since such factors are implicitly implemented in our model, it is difficult to estimate these effects individually. This suggests the need for future implementation of environmental factors and validation under a broader range of in vivo-like settings.”

      Reviewer #2 (Public review):

      This is a timely and insightful study aiming to explore the general physical principles for the sub-compartmentalization--or lack thereof--in the phase separation processes underlying the assembly of postsynaptic densities (PSDs), especially the markedly different organizations in three-dimensional (3D) droplets on one hand and the two-dimensional (2D) condensates associated with a cellular membrane on the other. Simulation of a highly simplified model (one bead per protein domain) is carefully executed. Based on a thorough consideration of various control cases, the main conclusion regarding the trade-off between repulsive excluded volume interactions and attractive interactions among protein domains in determining the structures of 3D vs 2D model PSD condensates is quite convincing. The results in this manuscript are novel; however, as it stands, there is substantial room for improvement in the presentation of the background and the findings of this work. In particular,

      (i) conceptual connections with prior works should be better discussed 

      (ii) essential details of the model should be clarified, and

      (iii) the generality and limitations of the authors' approach should be better delineated.

      We appreciate the reviewer for his/her time and effort on our manuscript and for encouraging comments and helpful suggestions. We answered every technical comment the reviewer mentioned below.

      Specifically, the following items should be addressed (with the additional references mentioned below cited and discussed):

      (1) Excluded volume effects are referred to throughout the text by various terms and descriptions such as "repulsive force according to the volume" (e.g., in the Introduction), "nonspecific volume interaction", and "volume effects" in this manuscript. This is somewhat curious and not conducive to clarity, because these terms have alternate meanings or connotations (e.g., in biomolecular modeling, repulsive interactions usually refer to those with longer spatial ranges, such as that between like charges). It will be much clearer if the authors simply refer to excluded volume interactions as excluded volume interactions (or effects).

      Thank you for this comment. We have replaced terms of similar meaning with “excluded volume interactions”. However, we have kept the expression “non-specific interactions” where it refers to the explicit interactions defined as force fields in the model, rather than to the excluded volume effect in general.

      (2) In as much as the impact of excluded volume effects on subcompartmentalization of condensates ("multiple phases" in the authors' terminology), it has been demonstrated by both coarse-grained molecular dynamics and field-theoretic simulations that excluded volume is conducive to demixing of molecular species in condensates [Pal et al., Phys Rev E 103:042406 (2021); see especially Figures 4-5 of this reference]. This prior work bears directly on the authors' observation. Its relationship with the present work should be discussed.  

      We appreciate the reviewer’s insightful comment. We have now included a more detailed discussion on excluded volume effect in the revised manuscript, which provides important context for our findings. Furthermore, we have cited the references to support and enrich the discussion, as recommended.

      (3)  In the present model setup, activation of the CaMKII kinase affects only its binding to GluN2Bc. This approach is reasonable and leads to model predictions that are essentially consistent with the experiment. More broadly, however, do the authors expect activation of the CaMKII kinase to lead to phosphorylation of some of the molecular species involved with PSDs? This may be of interest since biomolecular condensates are known to be modulated by phosphorylation [Kim et al., Science 365:825-829 (2019); Lin et al, eLife 13:RP100284 (2025)].  

      We agree that the effect of phosphorylation on phase separation is an important and interesting aspect to consider. Some experimental results have shown that activation of CaMKII can lead to phosphorylation of various proteins and make the PSD condensate more stable by altering their interactions. We included the sentence below in the limitations:

      “In this context, we also do not explicitly account for downstream phosphorylation events. Although such proteins are not included in the current components, they will regulate PSD-95, affecting its binding valency, or diffusion coefficient. This is a subject worthy of future research.”

      (4) The forcefield for confinement of AMPAR/TARP and NMDAR/GluN2Bc to 2D should be specified in the main text. Have the authors explored the sensitivity of their 2D findings on the strength of this confinement?

      We thank the reviewer for the helpful recommendation. We have revised the manuscript to include the membrane-mimicking potential in the main text. We also think that exploring how the strength of confinement affects the morphology of the 3D/2D condensate phase is a very interesting point. We have additionally performed MD simulations with smaller/larger membrane constraints and included the results in the supporting information as Figure S5. The following text was added:

      “We further attempted to mimic intermediate conditions between 3D and 2D systems in two different manners. First, we applied a weaker membrane constraint in 2D system. Even when the strength of membrane constraints is reduced by a factor of 1000, NMDARs are located on the inner side when the CaMKII was active, as well as the result in 2D system (Fig.S5ABC). Second, to weaken further the effect of membrane constraints, we artificially altered the membrane thickness from 5 nm to 50 nm, in addition to reducing the membrane constraints by 1000. As a result, NMDAR clusters move to the bottom and surround AMPAR (Fig.S5DEF). In this artificial intermediate condition, both states in which the NMDARs are outside (corresponding to 3D) and in which the NMDARs are inside (corresponding to 2D) are observed, depending on the strength of the membrane constraint.”

      (5)  Some of the labels in Figure 1 are confusing. In Figure 1A, the structure labeled as AMPAR has the same shape as the structure labeled as TARP in Figure 1B, but TARP is labeled as one of the smaller structures (like small legs) in the lower part of AMPAR in Figure 1A. Does the TARP in Figure 1B correspond to the small structures in the lower part of AMPAR? If so, this should be specified (and better indicated graphically), and in that case, it would be better not to use the same structural drawing for the overall structure and a substructure. The same issue is seen for NMDAR in Figure 1A and GluN2Bc in Figure 1B. 

      (6) In addition to clarifying Figure 1, the authors should clarify the usage of AMPAR vs TARP and NMDAR vs GluN2Bc in other parts of the text as well.

      (7) The physics of the authors' model will be much clearer if they provide an easily accessible graphical description of the relative interaction strengths between different domain-representing spheres (beads) in their model. For this purpose, a representation similar to that given by Feric et al., Cell 165:1686-1697 (2016) (especially Figure 6B in this reference) of the pairwise interactions among the beads in the authors' model should be provided as an additional main-text figure. Different interaction schemes corresponding to inactive and activated CaMKII should be given. In this way, the general principles (beyond the PSD system) governing 3D vs 2D multiple-component condensate organization can be made much more apparent.

      We sincerely appreciate the reviewer’s comments. Following the recommendation, we have replaced the diagram in Figure 1B with an interaction matrix that shows each mesoscale molecular representation, and we have revised the main text to clarify the relationships between AMPAR and TARP and between NMDAR and GluN2Bc. The former diagram of the specific interaction pairs has been moved to a supplementary figure.

      (8) Can the authors' rationalization of the observed difference between 3D and 2D model PSD condensates be captured by an intuitive appreciation of the restriction on favorable interactions by steric hindrance and the reduction in interaction cooperativity in 2D vs 3D?  

      We thank the reviewer for the comment. As pointed out, the multiphase morphology change observed in this study can be attributed to a decrease in coordination number in 2D compared to 3D. We have included the physicochemical rationalization in the discussion.  

      (9) In the authors' model, the propensity to form 2D condensates is quite weak. Is this prediction consistent with the experiment? Real PSDs do form 2D condensates around synapses.  

      We are grateful to the reviewer for highlighting this important point. We agree that the real PSD forms 3D condensates beneath the 2D membrane. Some lower PSD components under the membrane (i.e., SAPAP, Shank, and Homer) are omitted in our system, which may cause weak condensation. To emphasize this, we have added the following sentence:

      “While these in vivo results contain additional scaffold and cytoskeletal elements omitted in our model, such as SAPAP, Shank and Homer, nearly all proteins in the middle and lower layers of the PSD associate directly or indirectly with PSD-95 in the upper PSD layer. Consequently, it is probable that other scaffold proteins contribute to the mobility of AMPAR-containing and NMDAR-containing nanodomains indistinguishably. They may increase the stability of the AMPAR and NMDAR clusters but are unlikely to have a distinct effect to reverse the phase-separation phenomenon.”

      However, we believe that the clusters formed on the 2D membrane are not a robust “phase” because they do not follow a scaling law. In fact, in our previous study of a PSD system with AMPAR(TARP)₄ and PSD-95, we already reported that phase separation is less likely to occur in 2D than in 3D. That result suggests that phase separation on the membrane may be difficult to achieve, which is consistent with the results of this study.

      (10) More theoretical context should be provided in the Introduction and/or Discussion by drawing connections to pertinent prior works on physical determinants of co-mixing and de-mixing in multiple-component condensates (e.g., amino acid sequence), such as Lin et al., New J Phys 19:115003 (2017) and Lin et al., Biochemistry 57:2499-2508 (2018). 

      (11) In the discussion of the physiological/neurological significance of PSD in the Introduction and/or Discussion, for general interest it is useful to point to a recently studied possible connection between the hydrostatic pressure-induced dissolution of model PSD and high-pressure neurological syndrome [Lin et al., Chem Eur J 26:11024-11031 (2020)].

      We thank the reviewer for the helpful recommendation. We have added the recommended references to the relevant parts of the Introduction.

      (12) It is more accurate to use "perpendicular to the membrane" rather than "vertical" in the caption for Figure 3E and other such descriptions of the orientation of the CaMKII hexagonal plane in the text.

      We thank you for your comment. We replaced the word “vertical” with “perpendicular” in the main text and caption.

      Reviewer #3 (Public review):

      Summary:

      In this work, Yamada, Brandani, and Takada have developed a mesoscopic model of the interacting proteins in the postsynaptic density. They have performed simulations, based on this model and using the software ReaDDy, to study the phase separation in this system in 2D (on the membrane) and 3D (in the bulk). They have carefully investigated the reasons behind different morphologies observed in each case, and have looked at differences in valency, specific/non-specific interactions, and interfacial tension.

      Strengths:

      The simulation model is developed very carefully, with strong reliance on binding valency and geometry, experimentally measured affinities, and physical considerations like the hydrodynamic radii. The presented analyses are also thorough, and great effort has been put into investigating different scenarios that might explain the observed effects.

      Weaknesses:

      The biggest weakness of the study, in my opinion, has to do with a lack of more in-depth physical insight about phase separation. For example, the authors express surprise about similar interactions between components resulting in different phase separation in 2D and 3D. This is not surprising at all, as in 3D, higher coordination numbers and more available volume translate to lower free energy, which easily explains phase separation. The role of entropy is also significantly missing from the analyses. When interaction strengths are small, entropic effects play major roles. In the introduction, the authors present an oversimplified view of associative and segregative phase transitions based on the attractive and repulsive interactions, and I'm afraid that this view, in which all the observed morphologies should have clear pairwise enthalpic explanations, diffuses throughout the analysis. Meanwhile, I believe the authors correctly identify some relevant effects, where they consider specific/nonspecific interactions, or when they investigate the reduced valency of CaMKII in the 2D system.

      We thank the reviewer for the insightful and constructive comments. Regarding the difference in phase behavior between 2D and 3D systems, we appreciate the reviewer’s clarification that differences in coordination number and entropy in higher dimensions can account for the observed morphology of the phases. While it may be clear that entropy decreases due to the decrease of coordination number, our objective was to uncover how such an isotropic entropy reduction regulates the behavior of each phase driven by different interactions, which remains largely unknown. To emphasize this, we modified the introduction and have now included a discussion of the entropic contributions to phase behavior in both 2D and 3D systems, and we have made this clearer in the revised manuscript by referencing relevant theoretical frameworks. In the Discussion, we added the sentence below:

      “Generally, phase separation can be explained by the Flory-Huggins theory and its extensions: phase separation can be favored by the difference in the effective pairwise interactions in the same phase compared to those across different phases, and is disfavored by mixing entropy. The effective interactions contain various molecular interactions, including direct van der Waals and electrostatic interactions, hydrophobic interactions, and purely entropic macromolecular excluded volume interactions. For the latter, Asakura-Oosawa depletion force can drive the phase separation. Furthermore, the demixing effect was explicitly demonstrated in previous simulations and field theory (61). Importantly, we note that the effective pairwise interactions scale with the coordination number z. The coordination number is a clear and major difference between 3D and 2D systems. In 3D systems, large z allows both relatively strong few specific interactions and many weak non-specific interactions. While a single specific interaction is, by definition, stronger than a single non-specific interaction, contribution of the latter can have strong impact due to its large number. On the other hand, a smaller z in the membrane-bound 2D system limits the number of interactions. In case of limited competitive binding, specific interactions tend to be prioritized compared to non-specific ones. In fact, Fig. 3A clearly shows that number of specific interactions in 2D is similar to that in 3D, while that of non-specific interactions is dramatically reduced in 2D. In the current PSD system, CaMKII is characterized by large valency and large volume. In the 3D solution system, non-specific excluded volume interactions drive CaMKII to the outer phase, while this effect is largely reduced in 2D, resulting in the reversed multiphase.”
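
      For readers who want the Flory-Huggins picture referenced in the added paragraph written out, the following is a minimal sketch of the standard lattice form for a binary mixture. It is not the model used in this study. The symbols φ, N_A, N_B, χ, and ε_ij are textbook notation assumed here for illustration; only the coordination number z appears in the text above.

```latex
\frac{f}{k_B T}
  = \frac{\phi}{N_A}\ln\phi
  + \frac{1-\phi}{N_B}\ln(1-\phi)
  + \chi\,\phi\,(1-\phi),
\qquad
\chi = \frac{z}{k_B T}\left[\epsilon_{AB} - \tfrac{1}{2}\left(\epsilon_{AA} + \epsilon_{BB}\right)\right]
```

      The first two terms are the mixing entropy, which disfavors demixing, and the χ term collects the effective pairwise interactions. Because χ is proportional to z, lowering the coordination number on going from 3D to 2D weakens the interaction term relative to the entropy term in this simplified picture.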

      Also, I sense some haste in comparing the findings with experimental observations. For example, the authors mention that "For the current four component PSD system, the product of concentrations of each molecule in the dilute phase is in good agreement with that of the experimental concentrations (Table S2)." But the data used here is the dilute phase, which is the remnant of a system prepared at very high concentrations and allowed to phase separate. The errors reported in Table S2 already cast doubt on this comparison. 

      We thank the reviewer for the insightful comment. In the validation process, we adjusted the parameters so that the number of molecules in the dilute phase is consistent with the experimental lower limit of phase separation, based on the assumption that the dilute phase after phase separation has the same concentration as the critical concentration. That is why we focus on comparing the dilute-phase concentration in Table S2. However, in our simulations, the number of protein molecules is relatively small since it is based on the average number per synaptic spine. For example, there are only about 60 CaMKII molecules at most, and their presence in the dilute phase is highly sensitive to concentration, as the reviewer pointed out. This is one of the limitations, so we have added a description to the Limitations section. We added:

      “Second, parameter calibration contains some uncertainty. Previous in vitro study results used for parameter validation are at relatively high concentrations for phase separation, which may shift critical thresholds compared to that in in vivo environments. Also, since the number of molecules included in the model is small, the difference of a single molecule could result in a large error during this validation process.”

      Or while the 2D system is prepared via confining the particles to the vicinity of the membrane, the different diffusive behavior in the membrane, in contrast to the bulk (i.e., the Saffman-Delbrück model), is not considered. This would thus make it difficult to interpret the results of a coupled 2D/3D system and compare them to the actual system.

      We appreciate the reviewer’s helpful comment. We agree that the Einstein-Stokes equation may not adequately reproduce the diffusion of membrane-embedded particles. We recalculated the diffusion coefficients for every membrane particle used in this model using the Saffman-Delbrück model and found that the diffusion coefficients for the receptor cores (AMPAR and NMDAR) were approximately three times larger. These values are still about 10 times smaller than those of molecules diffusing in the cytoplasm. Additionally, since this study focuses on the morphology of the phase/cluster at thermodynamic equilibrium, we think that the magnitude of the diffusion coefficient has little influence on the final structure of the cluster. However, we will incorporate membrane-embedded diffusion as a future improvement for better modelling and implementation. We added:

      “Third, we estimated all the diffusion coefficients from the Einstein-Stokes equation, which may oversimplify membrane-associated dynamics. Applying the Saffmann-Delbrück model to membrane-embedded particles would be desired although the resulting diffusion coefficients remain of the same order of magnitude. These limitations highlight the need for further research, yet they do not undermine the core significance of the present findings in advancing our understanding of multiphase morphologies.”
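
      The two estimates contrasted in this reply can be compared with a short calculation. The sketch below implements the textbook Einstein-Stokes and Saffman-Delbrück expressions. The particle radius and both viscosities are illustrative assumptions, not the parameters of the authors' model (only the 5 nm membrane thickness is taken from the text), so the printed numbers are not expected to reproduce the factors quoted above.

```python
import math

K_B = 1.380649e-23            # Boltzmann constant, J/K
EULER_GAMMA = 0.5772156649    # Euler-Mascheroni constant


def stokes_einstein(radius_m, viscosity_pa_s, temperature_k=300.0):
    """Diffusion coefficient (m^2/s) of a sphere in a bulk fluid."""
    return K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * radius_m)


def saffman_delbruck(radius_m, membrane_viscosity_pa_s, fluid_viscosity_pa_s,
                     membrane_thickness_m, temperature_k=300.0):
    """Lateral diffusion coefficient (m^2/s) of a cylinder embedded in a membrane.

    The logarithmic form only holds when the inclusion is small, i.e. when
    radius * fluid viscosity is much less than thickness * membrane viscosity.
    """
    ratio = (membrane_viscosity_pa_s * membrane_thickness_m) / (
        fluid_viscosity_pa_s * radius_m)
    log_term = math.log(ratio) - EULER_GAMMA
    if log_term <= 0.0:
        raise ValueError("inclusion too large for the Saffman-Delbruck limit")
    prefactor = K_B * temperature_k / (
        4.0 * math.pi * membrane_viscosity_pa_s * membrane_thickness_m)
    return prefactor * log_term


# Illustrative inputs (assumptions, except the 5 nm membrane thickness).
radius = 5e-9                 # hypothetical receptor-core radius, m
fluid_viscosity = 1e-3        # water-like surrounding fluid, Pa*s
membrane_viscosity = 0.1      # hypothetical membrane viscosity, Pa*s
membrane_thickness = 5e-9     # m

d_bulk = stokes_einstein(radius, fluid_viscosity)
d_membrane = saffman_delbruck(radius, membrane_viscosity, fluid_viscosity,
                              membrane_thickness)
print(f"bulk (Einstein-Stokes):      {d_bulk * 1e12:.2f} um^2/s")
print(f"membrane (Saffman-Delbruck): {d_membrane * 1e12:.2f} um^2/s")
```

      The weak, logarithmic dependence on radius in the membrane expression is the main qualitative difference from the 1/radius scaling of the bulk formula.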

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Filamentous fungi are established workhorses in biotechnology, with Aspergillus oryzae as a prominent example with a thousand-year history. Still, the cell biology and biochemical properties of the production strains are not well understood. The paper of the Takeshita group describes the change in nuclear numbers and correlates it to different production capacities. They used microfluidic devices to really correlate the production with nuclear numbers. In addition, they used microdissection to understand expression profile changes and found an increase in ribosomes. The analysis of two genes involved in cell volume control in S. pombe did not reveal conclusive answers to explain the phenomenon. It appears that it is a multi-trait phenotype. Finally, they identified SNPs in many industrial strains and tried to correlate them to the capability of increasing their nuclear numbers.

      The methods used in the paper range from high-quality cell biology, Raman spectroscopy, to atomic force and electron microscopy, and from laser microdissection to the use of microfluidic devices to study individual hyphae.

      This is a very interesting, biotechnologically relevant paper with the application of excellent cell biology. I have only minor suggestions for improvement. 

      We sincerely appreciate your fair and positive evaluation of our work. Thank you for your suggestions for improvement. We respond to each of them appropriately.

      Reviewer #2 (Public review): 

      Summary: 

      In the study presented by Itani and colleagues, it is shown that some strains of Aspergillus oryzae - especially those used industrially for the production of sake and soy sauce - develop hyphae with a significantly increased number of nuclei and cell volume over time. These thick hyphae are formed by branching from normal hyphae and grow faster and therefore dominate the colonies. The number of nuclei positively correlates with the thicker hyphae and also the amount of secreted enzymes. The addition of nutrients such as yeast extract or certain amino acids enhanced this effect. Genome and transcriptome analyses identified genes, including rseA, that are associated with the increased number of nuclei and enzyme production. The authors conclude from their data involvement of glycosyltransferases, calcium channels, and the tor regulatory cascade in the regulation of cell volume and number of nuclei. Thicker hyphae and an increased number of nuclei were also observed in high-production strains of other industrially used fungi such as Trichoderma reesei and Penicillium chrysogenum, leading to the hypothesis that the mentioned phenotypes are characteristic of production strains, which is of significant interest for fungal biotechnology. 

      Strengths: 

      The study is very comprehensive and involves the application of diverse state-of-the-art cell biological, biochemical, and genetic methods. Overall, the data are properly controlled and analyzed, figures and movies are of excellent quality.

      The results are particularly interesting with regard to the elucidation of molecular mechanisms that regulate the size of fungal hyphae and their number of nuclei. For this, the authors have discovered a very good model: (regular) strains with a low number of nuclei and strains with a high number of nuclei. Also, the results can be expected to be of interest for the further optimization of industrially relevant filamentous fungi.

      Weaknesses: 

      There are only a few open questions concerning the activity of the many nuclei in production strains (active versus inactive), their number of chromosomes (haploid/diploid), and whether hyper-branching always leads to propagation of nuclei. 

      We are very grateful for your recognition of our findings, the proposed model, and their significance for future applications. We are grateful for the questions, which contribute to a more accurate understanding. 

      Our responses to each are provided below.  

      Reviewer #3 (Public review): 

      Summary: 

      The authors seek to determine the underlying traits that support the exceptional capacity of Aspergillus oryzae to secrete enzymes and heterologous proteins. To do so, they leverage the availability of multiple domesticated isolates of A. oryzae along with other Aspergillus species to perform comparative imaging and genomic analysis. 

      Strengths: 

      The strength of this study lies in the use of multifaceted approaches to identify significant differences in hyphal morphology that correlate with enzyme secretion, which is then followed by the use of genomics to identify candidate functions that underlie these differences. 

      Weaknesses: 

      There are aspects of the methods that would benefit from the inclusion of more detail on how experiments were performed and data interpreted. 

      Overall, the authors have achieved their aims in that they are able to clearly document the presence of two distinct hyphal forms in A. oryzae and other Aspergillus species, and to correlate the presence of the thicker, rapidly growing form with enhanced enzyme secretion. The image analysis is convincing. The discovery that the addition of yeast extract and specific amino acids can stimulate the formation of the novel hyphal form is also notable. Although the conclusions are generally supported by the results, this is perhaps less so for the genetic analysis as it remains unclear how direct the role of RseA and the calcium transporters might be in supporting the formation of the thicker hyphae. 

      The results presented here will impact the field. The complexity of hyphal morphology and how it affects secretion is not well understood despite the importance of these processes for the fungal lifestyle. In addition, the description of approaches that can be used to facilitate the study of these different hyphal forms (i.e., stimulation using yeast extract or specific amino acids) will benefit future efforts to understand the molecular basis of their formation. 

      We are very grateful for your fair and thoughtful evaluation of our work. We agree that the genetic analysis in the latter part is relatively weaker compared to the imaging analysis in the first half. Rather than a single mutation causing a dramatic phenotypic change, we believe that the accumulation of various mutations through breeding leads to the observed phenotype, making it difficult to clearly demonstrate causality. Since transcriptome and SNP analyses have revealed key pathways and phenotypes, it would be gratifying if these insights could contribute to future applications utilizing filamentous fungi.

      Reviewer #1 (Recommendations for the authors): 

      I was wondering what happens if thick hyphae were taken as inoculum for a new colony or thin hyphae. Is it possible to enrich for one or the other type of hyphae? Perhaps in the presence of yeast extract or certain amino acids. 

      We added an explanation in the Discussion.

      L304-306. When thick hyphae were cultured on fresh medium, thin hyphae initially emerged, suggesting that sustained metabolic activity is required for the formation of thick hyphae with a high number of nuclei.    

      L120-121. In some cases, thick hyphae emerged by branching from thick hyphae (Fig. 2D, left), while in other cases, thin hyphae emerged from thick hyphae (Fig. 2D, right). Thin hyphae emerge in the early stage of cultivation even in the presence of yeast extract or certain amino acids.

      In the Discussion, they hypothesize that the primary effect could be on cell wall rigidity. I am wondering if that hypothesis could be tested by adding, for instance, sublethal concentrations of cytochalasin to hyphae of A. nidulans to weaken the cell wall. 

      The question is reasonable. To ensure accurate understanding, we moved Fig. S6 to Fig. 6 and revised the discussion as follows. 

      L294-295. In our model, cell wall loosening at a branching site and regulation of cell volume by turgor pressure constitute necessary conditions for increasing cell volume and maintaining thick hyphae.

      L306-309. Weakening the cell wall by treatment with a low concentration of calcofluor white did not lead to hyphal thickening or an increase in nuclear number. On the contrary, thick hyphae have thicker cell walls (Fig. 2H-K), which are necessary to maintain the increased cell volume.

      I recommend including some older literature. It was described already 20 years ago that A. niger differentiates hyphae with different capacities to secrete proteins (PMID: 16238620). In addition, there are old reports in A. nidulans reporting high numbers of nuclei (https://doi.org/10.1099/00221287-60-1-133). Perhaps it is worth trying to reproduce those cultural conditions. At least this should be discussed. In the same line, the number of nuclei increases a lot in the stalk of conidiophores in A. nidulans. These observations could be used as examples that the phenomenon observed in A. oryzae may be of general importance.

      Thank you for the suggestion. It is a very interesting proposal. We checked the nuclear distribution of A. nidulans on the culture media and added the following discussion.

      L328-334. A previous study reported an increase in the number of nuclei in A. nidulans (62, 63). Here, we examined the nuclear distribution of A. nidulans grown on the culture media, however, did not find class III hyphae as observed in A. oryzae. Even in A. nidulans, conidiophore stalks contain a high number of nuclei. It has been shown that A. oryzae has a taller conidiophore stalk (64). In the thick hyphae of A. oryzae, the expression level of flbA, an early regulator of conidiophore development (65), was elevated. This suggests that differentiation to aerial hyphae may be involved in the increase of hyphal volume and nuclear number. 

      (62) Clutterbuck A.J. Synchronous Nuclear Division and Septation in Aspergillus nidulans. J Gen Microbiol 60, 133-135 (1970).

      (63) Vinck, A., Terlou, M., Pestman, W.R., Martens, E.P., Ram, A.F., van den Hondel, C.A., Wösten, H.A. Hyphal differentiation in the exploring mycelium of Aspergillus niger. Mol Microbiol 58, 693-9 (2005).

      (64) Wada R, Maruyama J, Yamaguchi H, Yamamoto N, Wagu Y, Paoletti M, Archer DB, Dyer PS, Kitamoto K. Presence and functionality of mating type genes in the supposedly asexual filamentous fungus Aspergillus oryzae. Appl Environ Microbiol 78, 2819-29 (2012).

      (65) Lee, B.N., Adams, T.H. Overexpression of flbA, an early regulator of Aspergillus asexual sporulation, leads to activation of brlA and premature initiation of development. Mol Microbiol 14, 323-34 (1994).

      Reviewer #2 (Recommendations for the authors): 

      I suggest addressing the following questions to strengthen the manuscript: 

      (1) Do the authors have an explanation for their result that with an increase in the number of nuclei the individual nucleus is smaller? Have the authors checked whether all the nuclei are haploid or diploid?

      Thank you for the very important question. We added new results to Fig. S5D and S5E and the following discussion.

      L335-340. We investigated whether the reduction in nuclear size observed in thick hyphae was due to a change from diploid to haploid status. However, no difference in GFP-histone fluorescence intensity was detected between thick and thin hyphae (Fig. S5D). In both RIB40 and RIB915 strains, no significant difference in conidial spore size was observed despite the large difference in the number of nuclei within the hyphae (Fig. S5E). These results suggest that both thick and thin hyphae remain haploid, and that the smaller nuclear size observed in thick hyphae is likely due to a higher nuclear density.

      (2) In this context, the biological relevance of the increase in the number of nuclei should also be discussed in more detail. It remains to be clarified whether in hyphae with a high number of nuclei all nuclei are functionally active or whether many nuclei are possibly "inactive". Studies on the transcriptional activity of individual nuclei or on DNA replication (e.g., by EdU labeling) could clarify this. 

      Added the explanation below.

      L102-105. The transcriptional activity of each nucleus is unknown. However, a previous study (Yasui et al., FBB 2020) demonstrated that nuclear division is synchronized even when there are more than 200 nuclei. This suggests that DNA replication occurs similarly in most nuclei. Furthermore, since the germination rate of conidia and the colonies formed from individual conidia show no significant abnormalities, it is suggested that nearly all nuclei possess normal genomes and chromosomes.

      (3) It becomes not entirely clear what the underlying signal is that causes a thin hypha to branch into a thick multinucleated cell. This needs to be discussed in more detail. 

      Thanks for the suggestion. We clarified the signal to increase nuclear number and cell volume.

      L294-309. Although it is speculative, we propose a model to aid interpretation in the discussion. We have clarified that both genetic potential and environmental signals such as nutrients are important.

      (4) Is increased branching always correlated with an increased number of nuclei? 

      It is not an increase in branching, but rather the thickening of hyphae and an increase in cell volume that is consistently associated with an increase in nuclear number. Approximately 40 hours after inoculation, within 400 μm from the tip, the number of branches was 3.4 (SD=2.4) in thin hyphae and 2.6 (SD=0.5) in thick hyphae, suggesting that branching does not increase (n=4). Since thick hyphae elongate faster, it seems that fewer branches are present near the tip, even if the branching frequency itself remains unchanged.

      (5) The abstract does not summarize the many findings of the manuscript in an adequate way. 

      We have revised the abstract.

      Minor: 

      (1) Lines 49-50: Why italics? 

      corrected.

      (2) Line 179: process. 

      corrected.

      (3) Lines 313-314: Do not forget (and discuss) in this context mycorrhiza fungi with up to thousands of nuclei that were apparently selected during evolution for this high number of nuclei. 

      Thank you for the very interesting suggestion. We have added the following discussion.

      L339-351. The regulation of nuclear number and its ecological strategy are intriguing in other fungi such as N. crassa, which rapidly spreads after wildfires (68), and arbuscular mycorrhiza fungi that form symbiotic relationships with plants and contain thousands of nuclei within hyphae lacking septa (69).

      (68) Jacobson, D. J. et al. Neurospora in temperate forests of western North America. Mycologia 96, 66–74 (2004).

      (69) Kokkoris V, Stefani F, Dalpé Y, Dettman J, Corradi N. Nuclear Dynamics in the Arbuscular Mycorrhizal Fungi. Trends Plant Sci. 25, 765-778 (2020).

      (4) Lines 356-358: many typos.

      corrected.

      Reviewer #3 (Recommendations for the authors): 

      Specific suggestions or clarifications for the authors include: 

      (1) Lines 49-50: Is this sentence italicized for a reason? 

      It was a mistake, so we have corrected it.

      (2) Line 83: More detail on the specific characteristics of the different classes of hyphae would be helpful. Perhaps include a schematic drawing that emphasizes the differences between class I,II, and III hyphae. 

      L398-400. The classification is described in the Methods section: Class I – nuclei are distributed at regular intervals without overlapping; Class II – nuclei are aligned but occasionally overlap; Class III – nuclei are scattered throughout the hyphae without alignment. Representative images are shown in a previous study (Yasui et al., FBB 2020). 

      L82-84. We have added this information to clarify the classification.

      (3) Lines 102-103: It was not very clear how this experiment was done. Are you counting nuclei within 100 um of the tip? Are these all in one hyphal compartment? These details could be provided in a drawing that would make it easier for the reader to understand how this was done. 

      L109. Due to variation in the distance from the hyphal tip to the septum, we counted the number of nuclei within 100 μm from the hyphal tip. When septa were present, nuclei were counted in the same manner, so multiple compartments may be included. Changed the explanation.

      (4) Lines 134-140: Is there a way to calibrate levels of secreted protein or amylase activity per nucleus? That is, if the ratio of cytoplasmic volume per nucleus is constant, does the same apply to the secreted product? Knowing this would help to clarify whether the key feature in enhanced secretion is nuclear (e.g., gene expression) versus a cytoplasmic trait (e.g., vesicle trafficking). 

      Enzyme activity was measured across the entire mycelium, which includes a mixture of hyphae with high and low numbers of nuclei. Therefore, it is difficult to assess the correlation between enzyme activity and nuclear number. Enzyme activity was normalized by fungal biomass. The size of each colony is shown in Fig. 1B. Additionally, the correlation between the proportion of hyphae with increased nuclear number and enzyme activity is shown in Fig. 3H. In the experiment where enzyme activity was measured in a single hypha, we attempted to measure the number of nuclei; however, we could not use the nuclear GFP strain because the substrate exhibits green fluorescence. DAPI staining also failed due to limited dye access to the microfluidic channel. We changed the section title from ‘Correlation between nuclear number and enzyme secretion’ to ‘Increase in nuclear number and enzyme secretion’.

      (5) Line 151 and Figure 3F: YE also triggered a ~5-fold enhancement of secretion in A. nidulans without a concomitant increase in hyphal width. This merits some comment in the text.  

      Added an explanation, L156-157.

      In A. nidulans, the addition of yeast extract did not cause a dramatic increase in nuclear number, but hyphal width increased 1.4-fold and protein secretion increased 5.1-fold.

      (6) Line 252: Were nimE levels detected or altered in thick hyphae? The levels of this cycling might play a more important role in a shortened cell cycle than the authors have considered, especially as NimE functions during both G1 and G2. 

      Added an explanation below, L260-262.

      The expression level of nimE (AO090003000993) was low in both thick and thin hyphae, with no significant difference observed. As known in other organisms, its function is likely regulated through phosphorylation and protein degradation.

      (7) Line 254: Please provide a citation for the statement that branches emerge as a result of cell wall loosening. 

      rephrased and added citation, L263.

      Branching is thought to occur through the degradation and reconstruction of the cell wall at the branching site (54).

      Harris SD. Branching of fungal hyphae: regulation, mechanisms and comparison with other branching systems. Mycologia 100, 823-32 (2008).   

      (8) Lines 275-277: It would be interesting to know whether the addition of rapamycin also suppressed the ability of amino acids to trigger greater numbers of class III hyphae. 

      We added new results at Fig. S2G.

      L168. Rapamycin decreased the ratio of hyphae with increased nuclei even in the medium with yeast extract (Fig. S2G).

      (9) Lines 282-289: My sense is that this model is too speculative at this time. The role of RseA seems very broad based on the strong deletion phenotype. How would the removal of RseA be regulated to limit its effect to the branch site? Also, the msyA deletion phenotype isn't entirely consistent with what you would expect if it were necessary to maintain thick hyphae. Lastly, the authors do not show that translational capacity is enhanced in thick hyphae. I would suggest that these statements be tempered to some degree. 

      Thank you for your comment. We agree that it was too speculative, but we believe that some explanatory interpretation is necessary. Therefore, we have revised the text as follows, L294-300. In our model, cell wall loosening during branching and regulation of cell volume by turgor pressure constitute necessary conditions for increasing cell volume and maintaining thick hyphae. RseA and MsyA may be involved in these processes. At the same time, enhanced translational capacity through increased expression of ribosomal genes, possibly associated with TOR activation by specific amino acids, and mechanisms that accelerate the cell cycle represent another essential condition that enables an increase in nuclear number.

      (10) General: how do the authors reconcile the observation that YE and amino acids stimulate the formation of thicker hyphae, yet the time lapse imaging (Figure 2E) suggests that these hyphae arise at a later time during colony development when these resources might be limiting? The authors should consider providing some insight into this in the Discussion. 

      L300-305. Added a discussion below.

      Both genetic potential and nutritional environmental signals are likely required for the formation of thick hyphae with a high number of nuclei. When thick hyphae were cultured on fresh medium, thin hyphae initially emerged, suggesting the necessity of sustained high metabolic activity.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript from Jones and colleagues investigates a previously described phenomenon in which P. falciparum malaria parasites display increased trafficking of proteins displayed on the surface of infected RBCs, as well as increased cytoadherence in response to febrile temperatures. While this parasite response was previously described, it was not uniformly accepted, and conflicting reports can be found in the literature. This variability likely arises due to differences in the methods employed and the degree of temperature increase to which the parasites were exposed. Here, the authors are very careful to employ a temperature shift that likely reflects what is happening in infected humans and that they demonstrate is not detrimental to parasite viability or replication. In addition, they go on to investigate what steps in protein trafficking are affected by exposure to increased temperature and show that the effect is not specific to PfEMP1 but rather likely affects all transmembrane domain-containing proteins that are trafficked to the RBC. They also detect increased rates of phosphorylation of trafficked proteins, consistent with overall increased protein export.

      Strengths:

      The authors used a relatively mild increase in temperature (39 degrees), which they demonstrate is not detrimental to parasite viability or replication. This enabled them to avoid potential complications of a more severe heat shock that might have affected previously published studies. They employed a clever method of fractionation of RBCs infected with a var2csa-nanoluc fusion protein expressing parasite line to determine which step in the export pathway was likely accelerating in response to increased temperature. This enabled them to determine that export across the PVM is being affected. They also explored changes in phosphorylation of exported proteins and demonstrated that the effect is not limited to PfEMP1 but appears to affect numerous (or potentially all) exported transmembrane domain-containing proteins.

      Weaknesses:

      All the experiments investigating changes resulting from increased temperature were conducted after an increase in temperature from 16 to 24 hours, with sampling or assays conducted at the 24 hr mark. While this provided consistency throughout the study, this is a time point relatively early in the export of proteins to the RBC surface, as shown in Figure 1E. At 24 hrs, only approximately 50% of wildtype parasites are positive for PfEMP1, while at 32 hrs this approaches 80%. Since the authors only checked the effect of heat stress at 24 hrs, it is not possible to determine if the changes they observe reflect an overall increase in protein trafficking or instead a shift to earlier (or an accelerated) trafficking. In other words, if a second time point had been considered (for example, 32 hrs or later), would the parasites grown in the absence of heat stress catch up?

      We did not assess cytoadhesion at later stages, but in the supplementary figures we show that at 40 hours post infection both heat stress and control conditions have comparable proportions of VAR2CSA-positive iRBCs, whilst they differ at 24h. This is true for the DMSO-treated (wild-type-like control) HA-tagged lines of HSP70x and PF3D7_072500 (Supplementary Figures 9 and 12, respectively). Given that protein levels appear unchanged, we conclude that trafficking is accelerated at these earlier timepoints but is comparable at later stages. This would still increase the overall bound parasite mass, as parasites start to adhere earlier during or after a heat stress.

      Reviewer #2 (Public review):

      This manuscript describes experiments characterising how malaria parasites respond to physiologically relevant heat-shock conditions. The authors show, quite convincingly, that moderate heat-shock appears to increase cytoadherence, likely by increasing trafficking of surface proteins involved in this process.

      While generally of a high quality and including a lot of data, I have a few small questions and comments, mainly regarding data interpretation.

      (1) The authors use sorbitol lysis as a proxy for trafficking of PSAC components. This is a very roundabout way of doing things and does not, I think, really show what they claim. There could be a myriad of other reasons for this increased activity (indeed, the authors note potential PSAC activation under these conditions). One further reason could be a difference in the membrane stability following heat shock, which may affect sorbitol uptake, or the fragility of the erythrocytes to hypotonic shock. I really suggest that the authors stick to what they show (increased PSAC) without trying to use this as evidence for increased trafficking of a number of non-specified proteins that they cannot follow directly.

      This is a valid point, however, uninfected RBCs do not lyse following heat stress, nor do much younger iRBCs, indicating that the observed effect is specific to infected RBCs at a defined stage. The sorbitol sensitivity assay is performed at 37°C under normal conditions after cells are returned to non–heat stress temperatures, so the effect is not due to transient changes in membrane permeability at elevated temperature. 

      Planned experiment: However, to increase the strength of our conclusions and further test our hypothesis, we will perform sorbitol sensitivity assays on >20 hours post infection iRBCs following heat stress in the presence and absence of furosemide, a PSAC inhibitor. If iRBC lysis is abolished with furosemide present, this would confirm that the effect is PSAC-dependent. However, the effect could also possibly be due to altered PSAC activity during heat stress which is maintained at lower temperatures, as outlined in the discussion.

      (2) Supplementary Figure 6C/D: The KAHRP signal does not look like it should. In fact, it doesn't look like anything specific. The HSP70-X signal is also blurry and overexposed. These pictures cannot be used to justify the authors' statements about a lack of colocalisation in any way.

      Planned experiment: We agree that the IFAs are not the best as presented and will include better quality supplementary images in a revised version.

      (3) Figure 6: This experiment confuses me. The authors purport to fractionate proteins using differential lysis, but the proteins they detect are supposed to be transmembrane proteins and thus should always be found associated with the pellet, whether lysis is done using equinatoxin or saponin. Have they discovered a currently unknown trafficking pathway to tell us about? Whilst there is a lot of discussion about the trafficking pathways for TM proteins through the host cell, a number of studies have shown that these proteins are generally found in a membrane-bound state. The authors should elaborate, or choose an experiment that is capable of showing compartment-specific localisation of membrane-bound proteins (protease protection, for example).

      We do not believe we have identified a novel trafficking pathway, but rather that we capture trafficking intermediates of PfEMP1 between the PVM and the RBC periphery, in small vesicles and/or possibly Maurer’s clefts. These would still be membrane embedded but, because of their small size, would not be pelleted at the centrifugation speeds used in our study (we did not use ultracentrifugation). This explanation, we believe, is in line with the current hypothesis of PfEMP1 and other exported TMD protein trafficking to the periphery or the Maurer’s clefts.

      (4) The red blood cell contains, in addition to HSP70-X, a number of human HSPs (HSP70 and HSP90 are significant in this current case). As the name suggests, these proteins non-specifically shield exposed hydrophobic domains revealed upon partial protein unfolding following thermal insult. I would thus have expected to find significantly more enrichment following heat shock, but this is not the case. Is it possible that the physiological heat shock conditions used in this current study are not high enough to cause a real heat shock?

      As noted by the reviewer, we do not see enrichment of red blood cell heat shock proteins following heat stress, either with FIKK10.2-TurboID or in the phosphoproteome. We used a physiologically relevant heat stress that significantly modifies the iRBC, as shown by our functional assays. While a higher temperature might induce an association of red blood cell heat shock proteins, such conditions may not accurately reflect the most commonly found context of malaria infection.

      Reviewer #3 (Public review):

      Summary:

      In this paper, it is established that high fever-like 39 C temperatures cause parasite-infected red blood cells to become stickier. It is thought that high temperatures might help the spleen to destroy parasite-infected cells, and they become stickier in order to remain trapped in blood vessels, so they stop passing through the spleen.

      Strengths:

      The strength of this research is that it shows that fever-like temperatures can cause parasite-infected red blood cells to stick to surfaces designed to mimic the walls of small blood vessels. In a natural infection, this would cause parasite-infected red blood cells to stop circulating through the spleen, where the parasites would be destroyed by the immune system. It is thought that fevers could lead to infected red blood cells becoming stiffer and therefore more easily destroyed in the spleen. Parasites respond to fevers by making their red blood cells stickier, so they stop flowing around the body and into the spleen. The experiments here prove that fever temperatures increase the export of Velcro-like sticky proteins onto the surface of the infected red blood cells and are very thorough and convincing.

      Weaknesses:

      A minor weakness of the paper is that the effects of fever on the stiffness of infected red blood cells were not measured. This can be easily done in the laboratory by measuring how the passage of infected red blood cells through a bed of tiny metal balls is delayed under fever-like temperatures.

      Previous work by Marinkovic et al. (cited in this manuscript) reported that all RBCs, both infected and uninfected, increase in stiffness at 41 °C compared with 37 °C, with trophozoites and schizonts exhibiting a particularly pronounced increase. We agree that it would be interesting to determine whether similar changes occur at physiological fever-like temperatures, and whether this increase in stiffness coincides with the period of elevated protein trafficking. However, since we have already demonstrated enhanced protein export using multiple complementary approaches, we have chosen to address these questions in a follow-up study.

    1. Author response:

      The following is the authors’ response to the original reviews

      We would like to express our sincere gratitude to the reviewers for their thorough analysis of the manuscript and their extremely helpful comments. We have taken all the suggestions into consideration and conducted a range of additional experiments to address the points raised. We have also extensively revised the manuscript to clarify descriptions, correct inaccuracies and remove inconsistencies. We have modified the figures for clarity and content.

      Overall, we expanded the description of the EBH structure to emphasise its dimeric nature and the impact of the two binding sites on interpreting the binding data, including cooperativity. Using ITC, we tested the effect of the pre-SxIP residues on the binding affinity with additional peptides. We found that these residues had a significant effect, albeit much smaller than that of the post-SxIP residues. We analysed the binding of the 11MACF-VLL mutant with EBH-ΔC and evaluated the exchange rates. In agreement with our model, we found that the EBH affinity for the SxIP peptide from CK5P2 (KKSRLPRILIKRSR), which has a C-terminal sequence similar to that of the 11MACF-VLLRK mutant, is 21 nM, which is similar to the affinity of the mutant itself. This demonstrates the significant variation in affinity observed among natural SxIP ligands, as predicted by our study. Our responses to the specific points raised by the reviewers are provided below.

      Reviewer #1 (Public Review):

      There is no direct experimental evidence for independent dock and lock steps. The model is certainly plausible given their structural data, but all titration and CEST measurements are fully consistent with a simple one-step binding mechanism. Indeed, it is acknowledged that the results for the VLL peptide are not consistent with the predictions of this model, as affinity and dissociation rates do not co-vary. The model may still be a helpful way to interpret and discuss their results, and may indeed be the correct mechanism, but this has not yet been proven.

      Unfortunately, it is not possible to obtain direct experimental evidence because the folding of the C-terminus is too fast to influence the NMR parameters. However, as the reviewer pointed out, our structural data support the two-step model, since folding of the C-terminus is only possible once the ligand containing the post-SxIP residues has bound. By adopting a mechanistically supported model, we can analyse the contributions to binding and relate them to the structural characteristics of the complex. This provides a clearer insight into the roles of the various regions in the interaction and allows them to be modified rationally to enhance the ligand affinity.

      In the revised version, we restate the equations in terms of comparing the on-rates. This provides a clearer view of the effect of the additional stage, which cannot increase the overall on-rate since the two stages are sequential. If the forward rate of the second stage is comparable to or slower than the off-rate of the first stage, the overall on-rate decreases. Conversely, if the forward rate is much faster, the overall on-rate remains unchanged. For the wild-type 11MACF peptide, we observed that the presence of the EBH C-terminus does not affect the on-rate of binding, which is in perfect agreement with the two-step model and indicates that the C-terminus folds very quickly.
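
      The argument above can be illustrated with the textbook steady-state treatment of a sequential two-step scheme. This is a minimal sketch, not the equations of the revised manuscript: the rate constants k_1, k_{-1} (docking) and k_2, k_{-2} (locking) are symbols assumed here for illustration, and the intermediate EL is assumed to be at steady state.

```latex
\begin{align}
  &\mathrm{E} + \mathrm{L}
    \;\underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}}\;
  \mathrm{EL}
    \;\underset{k_{-2}}{\overset{k_{2}}{\rightleftharpoons}}\;
  \mathrm{EL}^{*} \\
  &k_{\mathrm{on}}^{\mathrm{app}} = \frac{k_{1}\,k_{2}}{k_{-1} + k_{2}} \;\le\; k_{1} \\
  &k_{2} \gg k_{-1} \;\Rightarrow\; k_{\mathrm{on}}^{\mathrm{app}} \approx k_{1} \\
  &k_{2} = k_{-1} \;\Rightarrow\; k_{\mathrm{on}}^{\mathrm{app}} = k_{1}/2
\end{align}
```

      Because the factor k_2/(k_{-1} + k_2) never exceeds one, the second step can only leave the apparent on-rate unchanged or lower it, which is the behaviour described in the paragraph above.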

      Additionally, we evaluated the binding of the 11MACF-VLL mutant to EBH-ΔC and observed a twofold decrease in Kd compared to WT 11MACF, primarily due to an increase in the on-rate. Interestingly, this rate is approximately half the overall on-rate for EBH/11MACF-VLL binding, contradicting the sequential two-step model. This suggests a more complex binding process where binding is accelerated by additional hydrophobic interactions with the unfolded C-terminus. However, given the difficulty of quantifying very slow exchange rates, it is more likely that the discrepancy is due to the accuracy of the rate measurements. Therefore, the model allows the rational analysis of changes in binding parameters due to mutations.

      There is little discussion of the fact that binding occurs to EBH dimers - either in terms of the functional significance of this or in the acquisition and analysis of their data. There is no discussion of cooperation in binding (or its absence), either in the analysis of NMR titrations or in ITC measurements. Complete ITC fit results have not been reported so it is not possible to evaluate this for oneself.

      We added information about the dimer to the introduction, emphasising its role in enhancing interaction with microtubules (MTs) and its structural role in SxIP binding. The ITC data do not exhibit any biphasic behaviour and can be fitted to a single-site model with 1:1 stoichiometry relative to the EB1c monomer. This corresponds to two independent binding sites in the dimer. We have added the stoichiometry to Table 1 and the description. The NMR titration data for the 11MACF and 11MACF-VLL interactions were fitted to the TITAN dimer model, which includes cooperativity parameters. For WT 11MACF, both cooperativity parameters were zero, corresponding to independent binding sites in the ITC model. For 11MACF-VLL, the fitting suggests weak negative cooperativity, with a ~3-fold increase in Kd for binding to the second site and no change in the off-rate. This difference in Kd is likely to be too small to induce a biphasic shape to the ITC curve. As the cooperativity effect on the NMR spectra is small and absent in the ITC, we used the independent sites model for data analysis, as there is insufficient justification for introducing extra parameters into the model. Crucially, fitting to this model did not alter the off-rate value obtained by NMR or affect the conclusions. We added a description of cooperativity to the results and discussion.
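
      To give a sense of why a ~3-fold difference in Kd between the two sites is hard to see in a binding curve, the following sketch compares the per-site occupancy of a dimer with two independent sites against one with weak negative cooperativity. It is an illustration under stated assumptions, not the TITAN dimer model or the ITC fitting used in the study: the function name, the parameter alpha (fold change in Kd for the second site) and the concentration units are all hypothetical.

```python
def fraction_bound(ligand, kd1, alpha=1.0):
    """Mean occupancy per site for a dimer with two binding sites.

    ligand: free ligand concentration (same units as kd1)
    kd1:    microscopic Kd for binding to the first site
    alpha:  assumed fold change in Kd for the second site
            (alpha = 1: independent sites; alpha > 1: negative cooperativity)
    """
    kd2 = alpha * kd1
    singly = ligand / kd1                   # weight of one singly bound state
    doubly = ligand * ligand / (kd1 * kd2)  # weight of the doubly bound state
    # Partition function: free dimer (1), two singly bound states, one doubly bound.
    return (singly + doubly) / (1.0 + 2.0 * singly + doubly)


if __name__ == "__main__":
    kd = 1.0  # arbitrary units, chosen only for illustration
    for ligand in (0.1, 0.3, 1.0, 3.0, 10.0):
        independent = fraction_bound(ligand, kd, alpha=1.0)
        cooperative = fraction_bound(ligand, kd, alpha=3.0)
        print(f"[L]={ligand:5.1f}  independent={independent:.3f}  alpha=3: {cooperative:.3f}")
```

      With alpha = 1 the expression reduces to the single-site hyperbola [L]/(Kd + [L]), so any deviation between the two printed columns reflects only the assumed cooperativity factor.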

      Three peptides are used to examine the role of C-terminal residues in SxIP motifs: 4-MACF (SKIP), 6-MACF (SKIPTP), and 11-MACF (KPSKIPTPQRK). The 11-mer demonstrates the strongest binding, but this has added residues to the N-terminal as well. It has also introduced charges at both termini, further complicating the interpretation of changes in binding affinities. Given this, I do not believe the authors can reasonably attribute increased affinities solely to post-SxIP residues.

      We tested the 9MACF peptide SKIPTPQRK, which has the same N-terminus as the 4- and 6-MACF peptides, and found that its binding affinity is ~10-fold weaker than that of 11MACF. This demonstrates the contribution of both the pre- and post-SxIP residues. This is likely due to electrostatic interactions between the positively charged N-terminus and the negatively charged EBH surface, similar to those involving the positive charges at the peptide C-terminus. Although significant, the contribution of the N-terminal peptide region is approximately one order of magnitude lower than that of the post-SxIP residues, meaning the post-SxIP region is the main affinity modulator. We have added the binding data on 9MACF and a discussion of the contributions to the manuscript.
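
      For orientation, a ~10-fold difference in affinity can be converted into a free-energy difference with the standard relation below. This is a worked illustration only: the temperature of 298 K is an assumption and is not taken from the experimental conditions of the study.

```latex
\begin{align}
  \Delta\Delta G
    &= RT \,\ln\!\frac{K_{d}^{\mathrm{9MACF}}}{K_{d}^{\mathrm{11MACF}}}
     \approx RT \ln 10 \\
    &\approx \left(8.314\ \mathrm{J\,mol^{-1}\,K^{-1}}\right)\left(298\ \mathrm{K}\right)\left(2.303\right)
     \approx 5.7\ \mathrm{kJ\,mol^{-1}}
     \approx 1.4\ \mathrm{kcal\,mol^{-1}}
\end{align}
```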

      Experimental uncertainties are, with exceptions, not reported.

      Uncertainties have been added to the numbers in Table 1 and the text. Information on how the uncertainties were calculated has been added to Table 1.

      Reviewer #1 (Recommendations For The Authors):

      (1) Have you tested the binding of the WT dimer in your cell model?

      We haven’t tested the WT dimer because it has already been reported in the 2009 Cell paper by Honnappa et al. In the cell experiments, our main focus was on recruiting the high-affinity mutant to MTs. The low level of recruitment, despite the mutant's high affinity, highlights the importance of dimerisation or additional contributions to binding.

      (2) Please deposit all NMR dynamics measurements (relaxation rates and derived model-free parameters) alongside structural data in the BMRB.

      The relaxation data have been submitted to BMRB, IDs 53187 and 53188.

      (3) Please report complete fitting results, e.g. for ITC, including stoichiometries. Clarify what this means for binding to a dimer, and if there is any evidence of cooperativity. Figure 3C, right hand panel, shows an unusual stoichiometry, can the authors comment on this?

      We have added more information on stoichiometry and cooperativity; please refer to our response to the above comment for details. We repeated the titration for the VLLRK mutant using fresh peptide stock. As expected, the stoichiometry was close to 1:1 relative to the EB1c monomer. The new data are now included in the table and figure.

      (4) Please report uncertainties for all measurements of Kd, koff, kon, ∆G, ∆H, ∆S, and explain whether these are determined from statistical analysis, technical or biological repeats (and where reported, clarify between standard deviation/standard error). Please also be aware of standard guidelines for reporting significant figures for data with uncertainties, as these have not been followed in Table 1.

      Uncertainties have been added to the numbers in Table 1 and the text. Information on how the uncertainties were calculated has been added to Table 1.

      (5) The construct design for the cell model is unclear - given the importance of flanking residues, please report and discuss how the sequences are attached to venus: which termini is attached, and what is the linker composition?

      We cloned the peptides at the C-terminus of mTFP, after the GS linker of the vector. The peptide itself contains a GS sequence at the N-terminus, creating a highly flexible GSGS linker that separates the SxIP region from mTFP and minimises the potential effect of mTFP on binding. We followed the design of Honnappa et al. to enable direct comparison with the published results. We have added this information to the 'Methods' section.

      (6) Which HSQC pulse sequence was used for 2D lineshape analysis? The authors mention non-linear chemical shift changes, presumably associated with the dimer interface - this would be useful to expand upon and clarify.

      For the lineshape analysis, we used the standard Bruker sequence hsqcfpf3gpphwg with soft-pulse watergate water suppression and flip-back. This sequence is included in the TITAN model. We added the description of the non-linear chemical shift changes and connection of these changes to the allosteric effect of the binding to the supplementary information describing details of the lineshape analysis.

      (7) Figure 1A could usefully highlight the dimer interface in the surface representation also.

      We believe that including the interface would make the figure too complicated. The dimer configuration is shown in different colours for the two subunits, clearly demonstrating their involvement in forming the binding site.

      (8) Figures 1C and 1D could usefully show a secondary structure schematic to assist the reader. The x-axis in these figures is not linear and this should be corrected. The calculation of combined chemical shift perturbations should be described.

      Thank you for the helpful suggestion. We changed the scale of the figures and added the diagram of the secondary structure.

      (9) Units are missing from many figure axes.

      We added missing units to the axes. Thank you for highlighting this.

      (10) What peptide concentrations are used in Figure 1C? Presumably, these should be reported at saturation for this to be a fair comparison, this should be clarified.

      The protein concentration was 50 µM. Peptides 4MACF and 6MACF were added at a 100-fold molar excess and peptide 11MACF was added at a 4-fold excess. Saturation was achieved for 11MACF. This was impossible for the short peptides due to their mM affinity. This information has been added to the figure legend. The figure's main aim is to illustrate the differences in the chemical shift perturbation profiles, which can be achieved even if full saturation is not attained. Although the absolute value of the chemical shifts is proportional to the degree of saturation, the distribution of the largest chemical shift changes is independent of this degree. Therefore, we can draw conclusions about the distribution of changes by comparing under non-saturation conditions.
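
      The saturation argument above can be illustrated with a short sketch. It assumes simple 1:1 binding in fast exchange, so that each observed shift change is the saturated value scaled by the bound fraction; the Kd values and the per-residue shifts are hypothetical and are not taken from the manuscript.

```python
import math

def fraction_bound(protein_total, ligand_total, kd):
    """Fraction of protein bound for 1:1 binding (all concentrations in M)."""
    s = protein_total + ligand_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * protein_total * ligand_total)) / 2.0
    return complex_conc / protein_total

PROTEIN = 50e-6  # 50 µM protein, as stated in the response

# Hypothetical Kd values, chosen only to contrast "mM" and tight binding.
weak = fraction_bound(PROTEIN, 100 * PROTEIN, kd=1e-3)   # 100-fold excess
tight = fraction_bound(PROTEIN, 4 * PROTEIN, kd=5e-6)    # 4-fold excess

# Hypothetical per-residue shift changes at full saturation (ppm).
csp_saturated = [0.02, 0.30, 0.12, 0.05]
# Fast-exchange assumption: observed change = bound fraction x saturated change.
csp_observed = [weak * d for d in csp_saturated]

def normalised(profile):
    """Scale a perturbation profile so that its largest value is 1."""
    top = max(profile)
    return [d / top for d in profile]

print(f"weak binder: {weak:.2f} bound; tight binder: {tight:.2f} bound")
print(normalised(csp_observed))
```

      Under these assumptions the normalised profile is the same with or without full saturation, which is the sense in which the distribution of the largest changes does not depend on the degree of saturation. The sketch does not cover non-linear shift changes.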

      (11) The presentation of raw peak intensities in Figure 1D shows primarily the flexibility of the C-terminal region associated with high intensities. Beyond this, when comparing the binding of peptides it would be much more informative to show relative peak intensities. Residues around 210-225 appear to show strong broadening in the presence of peptide, but this is masked by the low initial intensity. Can the authors clarify and discuss this? Also, what peptide concentrations were used for this comparison? For a fair comparison, it should be close to saturation - particularly to exclude exchange broadening contributions.

      The protein concentration was 50 µM. The 4MACF and 6MACF peptides were added at a 100-fold excess and 11MACF at a 4-fold excess. Saturation was achieved for 11MACF. This was impossible to achieve for the short peptides due to their mM affinity. This information has been added to the figure legend. Upon checking the data, we found a small systematic offset in the coiled-coil region of some of the complexes, as the integral intensity had been used in the initial plot. While this does not change the conclusion regarding the high dynamics of the C-terminus, it does create an inaccurate perception of the relative intensities of the folded regions in the different complexes, as noted by the reviewer. We have now plotted the amplitudes at the maximum of the peaks, which do not exhibit any systematic offset as they are much less susceptible to baseline distortions. We are grateful to the reviewer for highlighting this apparent discrepancy.

      (12) Figure 2 - the scale for S2 order parameters appears to be backwards, given the caption, but its range should be indicated. Similarly, the range of values for Rex should also be indicated. These data should also be tabulated/plotted in supporting information.

      We have corrected the figure legend and added S2 and Rex plots to the supplementary material. The figure aims to highlight regions of increased mobility, while the plots provide full quantitative information on the values. We thank the reviewer for pointing out the error in the figure legend and for the suggestions regarding the plots.

      (13) The scale in Figure 3B is illegible. Indeed, the whole structure is quite small and could usefully be expanded.

      We increased the size of the structure panels and added a scale.

      (14) Figure 4 does not show a decrease in exchange rates, as per the caption - no comparison of exchange rates is shown, only thermodynamic information in panel E. Panel C shows CEST measurements, but it is not clear what system this is for - please clarify, and consider showing the comparable data for the ∆C construct for comparison.

      We have amended the figure legend to clarify that the figure shows binding parameters. We added information about the CEST profiles for the EBH/11MACF interaction to the figure legend (Figure 4C). Exchange with the ∆C construct is too fast for CEST measurements. We used lineshape analysis to evaluate the exchange rates for this construct.

      (15) The schematics shown in Figure 4D, and elsewhere, are really quite difficult to understand. They may pose additional challenges to colourblind readers. Please consider ways that this could be clarified.

      We simplified the colour scheme in the model to make the colours easier to see and to highlight SxIP and non-SxIP regions. We believe that this improved the clarity of the figure.

      (16) Figures S1D/E - the x-axes are unclear and units are missing from the y-axes.

      We re-labelled the axes to clarify the scale and units. Thank you for pointing this out.

      Reviewer #2 (Public Review):

      The C-terminal tail of EB1, which is adjacent to EBH and is not analyzed in this study, is highly acidic and plays an important role in protein interactions. If the authors discuss the C-terminus of EB1, they should analyze the whole C-terminus of EB1, which would strengthen the conclusion they have made.

      Honnappa et al., Cell, 2009, reported chemical shift perturbations (CSPs) upon peptide binding for the full EB1c fragment, which includes the negatively charged C-terminus. Similar to our study, they observed significant CSPs in the FVIP region but negligible CSPs at the negatively charged EEY end. They concluded that the final eight EB1c residues did not contribute to binding and used a truncated EB1c construct for their structural analysis. Building on that study, we used the same EEY-truncated construct to analyse the contribution of the C-terminus in more detail. We believe that conducting additional experiments with the full C-terminus with respect to SxIP binding would be superfluous, as it would merely replicate the findings of Honnappa et al. We have added the rationale for selecting the truncated EB1c construct to the text, referencing Honnappa et al.

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 2C: The authors can analyze the 11MACF peptide as well, to provide more assurance to their argument. It would be easier to distinguish the sequences of "SKIP" and "FVIP" by changing their colors.

      Our relaxation analysis (Fig. 2C) focuses on the dynamics of the unstructured C-terminal region in both the free and complex forms. Further relaxation analysis of the peptide would not provide additional information on this, and would be complicated by the presence of free peptide in solution.

      (2) Figure 3B: Acidic residues in EBH should be labeled. Page 6, line 11: If the authors insist that the acidic patch will influence the interactions between EB1 and the peptide, the data of the analysis using the entire EB1 C-terminus should be included, given that the C-terminal tail of EB1 is highly acidic.

      To test the contribution of charge to binding, we conducted an ITC experiment at increasing salt concentrations. We observed a significant increase in Kd values when the concentration of NaCl increased from 50 to 150 mM, which supports our conclusion regarding the significant electrostatic contribution. This conclusion is independent of the presence or absence of the C-terminus.

      As we explained earlier, Honnappa et al., Cell 2009, conducted an NMR experiment on the full EB1c and observed no CSPs in the EEY region, indicating a negligible contribution from the EEY region to SxIP binding. Therefore, we think that additional experiments involving the entire C-terminus are unnecessary, as they would simply replicate the results of Honnappa et al. We have added the rationale for selecting the truncated EB1c to the text, referencing Honnappa et al.

      It would be very difficult to label the acidic residues without enlarging 3B considerably. However, we do not think this is necessary as we are not discussing any specific residues. The current figure shows the distribution of the surface charge, which is sufficient for our purposes.

      (3) Figure 2B (Page 4, line 27): The side chain of S5477 should be drawn. The authors should include a figure of the crystal structure of EBH and SxIP as a comparison (Honnappa et al., Cell, 2009). In their paper, Honnappa et al. performed chemical shift perturbation titrations by NMR. From their analysis, I imagine that the EB1 tail may not be critical for the EB1 C-terminus:SxIP interactions, since the signals in the tail are not significantly perturbed. The authors should cite this paper.

      We are grateful to the reviewer for highlighting this. The CSP analysis of Honnappa et al. revealed significant changes in the FVIP region, which we also observed. They also reported negligible CSPs at the EEY end, demonstrating that this part of the tail is non-critical and can be removed. We have added text to the manuscript to highlight the similarity between our CSPs and those observed by Honnappa et al. Figure 2B shows the side chains for the residues with the strongest detected contacts. These do not include S5477.

      (4) Figure 3C (ITC data): The stoichiometric ratios in the ITC data look strange. EBH vs KPSKIPVLLRKRK, is it 1:1?

      We repeated the ITC experiments using a new stock of the peptide and a new batch of the protein, checking the concentrations using UV spectroscopy. The new experiments produced a stoichiometry close to 1, as shown in the table.

      (5) Page 10, line 27: "The TPQ sequence of 11MACF is not optimal...": What is the meaning of "optimal"? The transient interaction between EB1 and its binding partner is responsible for the dynamics of the microtubule cytoskeleton. In a sense, the relatively weak interaction is "optimal" for the system. The authors should rephrase the word.

      We agree that weak interactions are optimal from a functional perspective, as they have been selected through evolution. In our case, 'optimal' refers to the hydrophobic interaction with the C-terminus. We replaced 'optimal' with 'ideal' to draw more attention to the second part of the sentence, which clarifies the context.

      (6) Page 11, line 2: "small number of comets enriched in the peptide that were too faint for the quantitative analysis, comparable to the reported previously (Honnappa, Gouveia et al. 2009)." Honnappa et al. used EGFP-fusion constructs in their study: EGFP forms a weak dimer, which presumably gave different results from the authors' mTFP-constructs. The authors can note this point in the text.

      We are grateful to the reviewer for highlighting this. This aligns well with our conclusion that dimerisation is important for localisation to comets. We have added this point to the text.

      (7) Page 10, line 21: The authors calculate the free energy of complex formation between EBH and MACF peptide and explain in the text, but it is hard to follow.

      We simplified and clarified the description of the energy contributions by focusing on the SxIP and non-SxIP regions of the peptide, as well as the EBH C-terminus.

      Minor points:

      Page 2, line 9: IP motifs are not usually located in the C-terminus. For example, SxIP in Tastin is located in the N-terminal region, and SxIPs in CLASP are in the middle.

      We corrected this statement, removing C-terminal.

      Page 3, line 4: The authors should note the residue numbers of SKIP.

      We think that in this context the residue numbers of the SxIP region are not important and would be distracting.

      Figure 3D and Figure S3F: Make the colors and the order the same between the two figures.

      We changed the colour scheme and the order of ITC parameters in S3F to match the main figure.

      Figure 1A, 2B, Figure S5: Change the color of SKIP from other residues in the same chain, otherwise the readers cannot distinguish. Likewise, change the color of FVIP in Figure 2B.

      We think that changing the colours would complicate the figures unnecessarily. The corresponding residues are clearly labelled in the figures.

      Figure 3, Figure S5, S6, S7: Box the letters of SKIP for clarity.

      We boxed the SxIP region in S5 (new S6) and underlined in S6 (new S7). In S7 (new S8) the location of SxIP is very clear from the homology.

      Figure 3B; Figure S2: Hard to recognize the peptide (MACF in green).

      We increased the size of 3D and S2, making it easier to see the peptide.

      Figure 1C and D: Make the residual numbers of the x-axes the same between the two graphs.

      We made new plots with a linear scale for the residue numbers.

      Figure 2A: The structures shown are not EB1. It should be described as EBH or EB1(191-260 a.a.).

      Corrected.

      Page 5, line 17: "the S2 values of the C-terminus" should be "the S2 values of the C-terminal loop in EBH", otherwise it is confusing.

      Corrected.

      Page 6, line 27; Figure S3C and S6: Please indicate the assignments of the resonances from "253FVI255" in the Figures.

      We labelled the peaks corresponding to the 253FVI255 region in figure S6 (new S7). Figure S3 shows EBH-ΔC that does not include this region.

      Page 7, line 25: Figure S7 should be S8.

      Corrected.

      Page 12, line 6: "sulfatrahsferases" must be a typo.

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Jiang et al. present a measure of phenological lag by quantifying the effects of abiotic constraints on the differences between observed and expected phenological changes, using a combination of previously published phenology change data for 980 species, and associated climate data for study sites. They found that, across all samples, observed phenological responses to climate warming were smaller than expected responses for both leafing and flowering spring events. They also show that data from experimental studies included in their analysis exhibited increased phenological lag compared to observational studies, possibly as a result of reduced sensitivity to climatic changes. Furthermore, the authors present compelling evidence that spatial trends in phenological responses to warming may differ from what would be expected from phenological sensitivity, due to the seasonal timing of when warming occurs. Thus, climate change may not result in geographic convergences of phenological responses. This study presents an interesting way to separate the individual effects of climate change and other abiotic changes on the phenological responses across sites and species.

      Greater phenological lag in experimental studies results in reduced sensitivity to climatic changes, not the other way around.

      Strengths:

      A clearly defined and straightforward mathematical definition of phenological lag allows for this method to be applied in different scientific contexts. Where data exists, other researchers can partition the effects of various abiotic forcings on phenological responses that differ from those expected from warming sensitivity alone.

      Sensitivity does not tell the magnitude of phenological changes, nor does it provide indications of mechanisms responsible for changes in spring phenology. Because of uneven warming, the same average temperature change (annual or spring temperatures) can have greater (greater warming prior to budburst) or smaller (smaller warming prior to budburst) phenological change than that with even warming. When average temperature change is close to zero, uneven warming can lead to infinite sensitivity values, either advanced (warmer temperatures prior to budburst) or delayed (cooler temperatures prior to budburst) spring phenology.

      It is not clear why sensitivity is so popularly used in phenological research.
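
      The arithmetic behind this point can be sketched as follows. The numbers are hypothetical, and the sketch assumes for illustration that the phenological advance depends only on warming before budburst, while the average temperature change weights the periods before and after budburst equally.

```python
def sensitivity(response_days, mean_temp_change):
    """Days of phenological change per °C of average temperature change."""
    return response_days / mean_temp_change

# Hypothetical scenarios: (warming before budburst, warming after budburst), in °C.
scenarios = {
    "even warming": (1.0, 1.0),
    "warming concentrated before budburst": (2.0, 0.0),
    "warming before, cooling after": (2.0, -1.9),
}

# Hypothetical advance (days) per °C of warming before budburst.
DAYS_PER_DEGREE_BEFORE_BUDBURST = 4.0

for name, (before, after) in scenarios.items():
    mean_change = (before + after) / 2.0
    response = DAYS_PER_DEGREE_BEFORE_BUDBURST * before
    print(f"{name}: advance = {response:.0f} d, "
          f"mean change = {mean_change:.2f} °C, "
          f"sensitivity = {sensitivity(response, mean_change):.0f} d/°C")
```

      In this toy setting the same average warming gives different advances, and as the average change approaches zero the ratio grows without bound even though the advance itself stays finite.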

      Identifying phenological lag and associated contributing factors provides a method by which more nuanced predictions of phenological responses to climate change can be made. Thus, this study could improve ecological forecasting models.

      Weaknesses:

      The authors include very few data visualizations, and instead report results and model statistics in tables. This is difficult to interpret and may obscure underlying patterns in the data. Including visual representations of variable distributions and between-variable relationships, in addition to model statistics, provides stronger evidence than model statistics alone.

      The use of stepwise, automated regression may be less suitable than a hypothesis-driven approach to model selection, combined with expanded data visualization. The use of stepwise regression may produce inappropriate models based on factors of the sample data that may preclude or require different variable selection.

      We used two statistical methods, variance analysis to examine differential phenological responses (Figure 2) and regression analysis to determine the relative importance of forcing change, budburst temperature, and physiological lag, the drivers of changes in spring phenology (Table 2). Our objective was to understand why plants show differential responses by research approach, species origin, climatic region, and growth form identified in previous research. Variable selection may affect minor (altitude, latitude, MAT, and average spring temperature change) or insignificant (photoperiod and long-term precipitation) variables, but not those related to drivers of spring phenology. We are not sure how a hypothesis-driven approach would help with our objective.

      Reviewer #2 (Public review):

      Summary:

      This is a meta-analysis of the relative contributions of spring forcing temperature, winter chilling, photoperiod and environmental variables in explaining plant flowering and leafing phenology. The authors develop a new summary variable called phenology lag to describe why species might have different responses than predicted by spring temperature.

      Strengths:

      The summary statistic is used to make a variety of comparisons, such as between observational studies and experimental studies.

      Weaknesses:

      By combining winter chilling effects, photoperiod effects, and environmental stresses that might affect phenology, the authors create a new variable that is hard to interpret. The authors do not provide information in the abstract about new insights that this variable provides.

      Phenological lag contains the effects of all constraints, which may include chilling effects, photoperiod effects, and environmental stresses, and is, indeed, hard to interpret without investigation of individual constraints. In our synthesis, spring phenology (or photoperiod effect) is not significant across all studies compiled. It is also unlikely that a lack of winter chilling causes the systemic differences in phenological lag between observational and experimental studies or between native and exotic species (see discussion at lines 335-339). At the individual study level, the contribution of different constraints to the overall lag effect can be specifically determined if moisture stresses, species chilling and photoperiod effects, or cold hardiness are known from on-site monitoring or previous research.

      The meaning of phenological lag is described at lines 34-38 in the abstract.

      Comments:

      It would be useful to have a map showing the sites of the studies.

      A map showing the sites of the studies was added as supplementary Figure S1.

      The authors should provide a section in which the strengths and weaknesses of the approach are discussed. Is it possible that mixing different types of data, studies, sample sizes, number of years, experimental set-ups, and growth habits results in artifacts that influence the results?

      Both strengths and weaknesses are discussed at various places throughout the paper. The weakness of our method, as indicated by the reviewer, is the inclusion of different constraints in the phenological lag, which has been described at lines 34-38 in the abstract and lines 80-86 in the introduction of the concept. We have also expanded the Conclusion section to discuss possible caveats at lines 369-393.

      As in all data analyses, the results can change with addition of more/different data, especially when sample size is relatively small. Ideally, comparisons are made among levels of fixed effects while controlling variations of other conditions. In phenological studies, however, climatic, phenological, and biological conditions all vary. For example, observational and experimental studies differ not only in the nature of warming (natural climate change vs artificial warming), but also in levels of warming (greater warming with experimental studies) and climatic, phenological, and biological conditions (Table 1). All phenological syntheses (or meta-analyses) have to make do with this uncontrolled nature of phenological data.

      Now that the authors have created this new variable, phenological lag, which of the components that contribute to it has the most influence on it? Or which components are most influential in which circumstances? For example, what are some examples where photoperiod causes a phenological lag?

      Any of the phenological constraints identified can contribute alone or in combination with others to the overall effect of phenological lag. Across all studies in this synthesis, the lack of significance of spring phenology rules out a photoperiod effect, while the association of longer phenological lags with longer accumulation of winter chilling does not suggest a general chilling shortage with the current extent of climate change.

      Although spring phenology is not significant across all studies, the photoperiod effect can be influential in individual studies where changes in spring phenology are large. However, photoperiod effects reported in the literature are mostly confounded with temperature, i.e., longer photoperiods are associated with longer hours of high daytime temperatures (see Chu et al., 2021). Other than European beech under an unlikely scenario of climate change (growth resumes at the beginning of winter), there has been no clear evidence showing the effect of photoperiod in constraining spring phenology.

      Another effect confounded with photoperiod is the extra heating from artificial light sources in warming experiments. Some early studies have shown that leaf temperature can be several degrees above the ambient air, due to long-wave radiation from artificial light sources. It is hard to believe in a constraining effect of photoperiod on spring phenology if phenological changes are within inter-annual variations (which can be a few weeks), although the photoperiod effect has been increasingly discussed recently.

      Recommendations for the authors:

      Reviewing Editor:

      A key methodological concern is the inconsistent definition of growth temperature across observations. It is calculated over the interval between the baseline phenological date and the expected date under warming - a window that varies by species, site, and treatment. This variability limits comparability across observations and may introduce circularity, as growth temperature is derived from the same modelled expectation (i.e., the expected phenological advance) that it is later used to explain.

      The term “growth temperature” has been replaced with “budburst temperature” to indicate temperatures at species events. Budburst temperature is the average temperature within the window of expected response with the warmer climate and, as indicated by the editor, varies by species, sites, and treatments. This species-specific temperature provides an opportunity to compare among species, sites, and treatments and helps explain differences in observed responses, as demonstrated in the discussion of results in this synthesis.

      Forcing change, budburst temperature, and expected response are related. High budburst temperatures are associated with smaller expected responses, which helps explain the smaller observed responses in late-season species and in areas of warm climate that have often been attributed to chilling or photoperiod effects.

      Additionally, the use of degree days above 0 °C as a universal metric for spring forcing oversimplifies species' temperature responses. This approach assumes not only a fixed base temperature but also a linear response to temperature accumulation, which overlooks well-established nonlinear or species-specific thermal response curves. To improve the robustness and interpretability of the phenological lag framework, we encourage the authors to consider these limitations and explore ways to test or justify these modelling assumptions more explicitly.

      The use of a 0 °C base temperature may not be the best choice for some species. Except for some early work, there has been little experimental research on the physiological aspects of chilling and forcing processes. A popular alternative is modelling using assumed temperature response models. As variables influencing chilling and forcing processes are not controlled, the determined base temperatures and temperature response models may be adequate for the species studied under particular conditions but would be inappropriate for applications beyond them. It is hard to believe that the species in a study all have different base temperatures for the accumulation of spring forcing and different optimum temperatures for winter chilling. Apparently, this is the result of model fitting, not the actual dynamics of chilling and forcing processes.

      Two base temperatures are commonly used, 0 and 5 °C, although the choice is not generally justified. It has long been known that temperatures above 0 °C contribute to spring forcing. My personal experience at a tree nursery suggests that seedlings will flush after winter cold storage, even at forcing temperatures ≤ 5 °C in the dark. The use of 5 °C is a choice of tradition (5 °C is commonly used to define the growing season) rather than of scientific justification. The use of high base temperatures may not make much difference at high temperatures due to short forcing duration but will underestimate forcing at low temperatures due to long forcing duration and the large proportion of forcing between 0 °C and the base temperature. We are not aware of any experimental studies that demonstrate non-zero base temperatures.
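
      The effect of the base temperature can be shown with a minimal degree-day sketch. It assumes a simple linear accumulation of daily mean temperature above the base; the function names and the constant temperatures are hypothetical and serve only as an illustration.

```python
def degree_days(daily_mean_temps, base):
    """Accumulated forcing: sum of daily mean temperature above the base (°C·day)."""
    return sum(max(t - base, 0.0) for t in daily_mean_temps)

def fraction_missed(temp, base=5.0):
    """Share of the 0 °C-based forcing at a constant temperature that a higher base omits."""
    full = degree_days([temp], base=0.0)
    return (full - degree_days([temp], base=base)) / full

# Hypothetical constant forcing temperatures, in °C.
for temp in (6.0, 10.0, 20.0):
    print(f"{temp:>4.0f} °C: a 5 °C base omits {fraction_missed(temp):.0%} of the forcing")
```

      In this sketch a 5 °C base omits most of the forcing counted with a 0 °C base at 6 °C but only a quarter of it at 20 °C, in line with the argument that higher base temperatures matter most at low forcing temperatures.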

      Within the dominant range of spring temperatures (e.g., between 5 and 25 °C), the forcing responses to temperatures can be approximated with linear models. Again, we are not aware of any non-linear forcing models that can be safely applied beyond the species studied under particular conditions.

      Regardless, the uses of different base temperatures or forcing models would not affect the partitioning of phenological changes, simply because temperature response models reflect physiological aspects of chilling and forcing processes and would not change with climate warming.

      The authors introduce a new metric, phenological lag, to assess how phenological constraints influence spring phenology, offering new insights into phenological research. However, there are several concerns. First, the research question and the study's aim are not clearly presented. The authors primarily analyzed phenological lag and simply compared it across different groups, but additional analyses are needed to adequately address the research question. In addition, the broader importance of this study is not clearly explained - why this research is necessary and what it contributes to the field should be explicitly stated.

      The research question is outlined at lines 92-108. We added “Our objective was to determine how phenological responses differ among different groups and how differential responses are related to drivers of spring phenology, i.e., forcing change, budburst temperature, and phenological lag” at lines 106-108.

      (1) Abstract: The methodological improvements and more key results should be included.

      Growth temperature has been replaced with “budburst temperature” to indicate temperatures at time of budburst. More results are added at lines 40-48.

      (2) Line 32: Terms such as "sensitivity analysis" and "phenological lag" need clearer definitions.

      We added at lines 32-33 to define sensitivity analysis “that is based on rates of phenological changes, not on drivers of spring phenology”. Phenological lag is defined at lines 34-38.

      (3) Lines 38-47: Further results and the urgency or importance of the study should be conveyed.

      More results are added at lines 40-48. The importance of this study is described at lines 48-50.

      (4) Line 57-58: This sentence is unclear - please clarify.

      The sentence is modified to “difficult using sensitivity analysis that is based on rates of phenological changes, not on drivers of spring phenology".

      (5) Line 60: break "endodormancy".

      In this context, breaking dormancy means breaking endodormancy.

      (6) Line 67: What does "growth temperature" refer to?

      Growth temperature has been replaced with “budburst temperature” to indicate temperatures at time of budburst. It is calculated as the average temperature within the window of expected response with the warmer climate.

      (7) Lines 87-94: The specific purpose of the study is vague. Why is this method needed, and how will it serve future research?

      We have modified the paragraph at lines 92-108 to provide justification and objective of the study.

      (8) Lines 163-164: The rationale for exploring differences in observed responses and phenological lag needs to be better justified.

      We added explanations at lines 179-182 why observed responses and phenological lag were chosen in the analysis.

      (9) Lines 178-183: Tables and figures should be properly cited within the text.

      Table S3 was added at line 197.

      (10) Lines 195-198: Clarify whether variables were scaled before model analysis.

      We clarified at line 192 “variables were not standardized prior to regression analysis”.

      (11) Line 206-207: The observed response is presented as the number of advanced days, while temperature sensitivity refers to the response of spring phenology to temperature - these are different variables and should not be conflated.

      The two variables are related but show different aspects of phenological change. Observed response divided by average temperature change gives temperature sensitivity. Observed response is the total change in number of days observed, while temperature sensitivity is the change in number of days per unit change in average temperature (°C). Sensitivity may reflect rates of phenological change with temperature (see responses to reviewer 1).
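
      The relationship stated above (temperature sensitivity = observed response / average temperature change) can be illustrated with a minimal sketch. The function name and the numbers in the usage line are hypothetical and are not taken from the study's data.

```python
def temperature_sensitivity(observed_response_days: float,
                            mean_temp_change_c: float) -> float:
    """Days of phenological change per degree C of average warming.

    observed_response_days: total change in timing, in days (hypothetical input).
    mean_temp_change_c: change in average temperature, in degrees C.
    """
    if mean_temp_change_c == 0:
        raise ValueError("temperature change must be non-zero")
    return observed_response_days / mean_temp_change_c


# Hypothetical example: a 6-day advance under 2 degrees C of warming.
example_sensitivity = temperature_sensitivity(6.0, 2.0)
print(f"{example_sensitivity:.1f} days per degree C")
```

      The two quantities carry different units (days versus days per °C), which is why they describe different aspects of the same change.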

      (12) In the discussion section, the authors compared phenological responses among different groups separately. This section requires substantial improvement to more clearly answer the research question.

      These discussions are related to our objective “how phenological responses differ among different groups identified in previous research (i.e., research approach, species origin, climatic region, and growth form) and how these differential responses are related to drivers of spring phenology, i.e., forcing change, budburst temperature, and phenological lag”.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study examined the effect of blood pressure variability on brain microvascular function and cognitive performance. By implementing a model of blood pressure variability using an intermittent infusion of AngII for 25 days, the authors examined different cardiovascular variables, cerebral blood flow, and cognitive function during midlife (12-15-month-old mice). Key findings from this study demonstrate that blood pressure variability impairs baroreceptor reflex and impairs myogenic tone in brain arterioles, particularly at higher blood pressure. They also provide evidence that blood pressure variability blunts functional hyperemia and impairs cognitive function and activity. Simultaneous monitoring of cardiovascular parameters, in vivo imaging recordings, and the combination of physiological and behavioral studies reflect rigor in addressing the hypothesis. The experiments are well-designed, and the data generated are clear. I list below a number of suggestions to enhance this important work:

      (1) Figure 1B: It is surprising that the BP circadian rhythm is not distinguishable in either group. Figure 2, however, shows differences in circadian rhythm at different timepoints during infusion. Could the authors explain the lack of circadian effect in the 24-h traces?

      The circadian rhythm pattern is apparent in Figure 2 (Active BP higher than Inactive BP), where BP is presented as 12-hour averages. When the BP data are expressed as one-hour averages (rather than minute-to-minute) over 24 hours, now included in the revised manuscript as Supplemental Figure 3C-D, the circadian rhythm becomes noticeable. In addition, we have included one-hour average BP data for all mice in the control and BPV groups, Supplemental Figure 3A-B.

      Notably, the Ang II-induced pulsatile BP pattern remains evident in the one-hour averages for the BPV group, Supplemental Figure 3B. To minimize bias and validate variability, pump administration start times were randomized for both control and BPV groups, Supplemental Figure 3A-B. Despite these adjustments, the circadian rhythm profile of BP is consistently maintained across individual mice and in the collective dataset, Supplemental Figure 3C-D.

      (2) While saline infusion does not result in elevation of BP when compared to Ang II, there is an evident "and huge" BP variability in the saline group, at least 40mmHg within 1 hour. This is a significant physiological effect to take into consideration, and therefore it warrants discussion.

      Thank you for this comment. The large variations in BP in the raw traces during saline infusion reflect transient BP changes induced by movement/activity, which is now included in Figure 1B (maroon trace). The revised manuscript now includes at Line 222: “Note that dynamic activity-driven BP changes were apparent during both saline- and Ang II infusions, Figure 1B”.

      (3) The decrease in DBP in the BPV group is very interesting. It is known that chronic Ang II increases cardiac hypertrophy, are there any changes to heart morphology, mass, and/or function during BPV? Can the decrease in DBP in BPV be attributed to preload dysfunction? This observation should be discussed.

      The lower DBP in the BPV group was already present at baseline, while both groups were still infused with saline, and was a difference beyond our control. However, this is an important and valid consideration, particularly considering the minimal yet significant increase in SBP within the BPV group (Figure 1D). Our goal was to induce significant transient blood pressure responses (BPV) and investigate the impact on cardiovascular and neurovascular outcomes in the absence of hypertension. We did not anticipate any major cardiac remodeling at this early time point (considering the absence of overt hypertension) and thus cardiac remodeling was not assessed and this is now discussed in the revised manuscript (Line 443-453).

      (4) Examining the baroreceptor reflex during the early and late phases of BPV is quite compelling. Figures 3D and 3E clearly delineate the differences between the two phases. For clarity, I would recommend plotting the data as is shown in panels D and E, rather than showing the mathematical ratio. Alternatively, plotting the correlation of ∆HR to ∆SBP and analyzing the slopes might be more digestible to the reader. The impairment in baroreceptor reflex in the BPV during high BP is clear, is there any indication whether this response might be due to loss of sympathetic or gain of parasympathetic response based on the model used?

      We appreciate the reviewer’s suggestion and have accordingly generated new figures displaying scatter plots of SBP vs HR with linear regression analysis (Figure 3D-G). Our goal is to further investigate which branch of the autonomic nervous system is affected in this model. The loss of a bradycardic response suggests either an enhancement of sympathetic activity, a reduction in parasympathetic activity, or a combination of both. This is briefly discussed in the revised manuscript (Line 486-496).

      Heart rate variability (HRV) serves as an index of neurocardiac function and dynamic, non-linear autonomic nervous system processes, as described in Shaffer and Ginsberg[1]. However, given that our data were limited to BP and HR readings collected at one-minute intervals, our primary assessment of autonomic function is limited to the bradycardic response. Further studies will be necessary to fully characterize the autonomic parameters influenced by chronic BPV.

      (5) Figure 3B shows a drop in HR when the pump is ON irrespective of treatment (i.e., independent of BP changes). What is the underlying mechanism?

      We apologize for any lack of clarity. These observed heart rate (HR) changes occurred during Ang II infusion, when blood pressure (BP) was actively increasing. In the control group, the pump solution was switched to Ang II during specific periods (days 3-5 and 21-25 of the treatment protocol) to induce BP elevations and a baroreceptor response, allowing direct comparisons between the control and BPV group.

      To clarify this point, we have revised Line 260-263 of the manuscript: “To compare pressure-induced bradycardic responses between BPV and control mice at both early and later treatment stages, a cohort of control mice received Ang II infusion on days 3-5 (early phase) (Supplemental Figure 4) and days 21-25 (late phase) thereby transiently increasing BP”.

      Additionally, a detailed description has been added to the Methods section (Line 96-101): “Controls receiving Ang II: To facilitate between-group comparisons (control vs BPV), a separate cohort of control mice were subjected to the same pump infusion parameters as BPV mice but for a brief period receiving Ang II infusions on days 3-5 and 21-25 for experiments assessing pressure-evoked responses, including bradycardic reflex, myogenic response, and functional hyperemia at high BP.”

      (6) The correlation of ∆diameter vs MAP during low and high BP is compelling, and the shift in the cerebral autoregulation curve is also a good observation. I would strongly recommend that the authors include a schematic showing the working hypothesis that depicts the shift of the curve during BPV.

      Thank you for this insightful comment. The increase in vessel reactivity to BP elevations in parenchymal arterioles of BPV mice suggests that chronic BPV induces a leftward shift and a potential narrowing of the cerebral autoregulation range (lower BP thresholds for both the upper and lower limits of autoregulation). This has been incorporated (and discussed) into the revised manuscript (see Figure 5N).

      One potential explanation for these changes is that the absence of sustained hypertension, a prominent feature in most rodent models of hypertension, limits adaptive processes that protect the cerebral microcirculation from large BP fluctuations (e.g., vascular remodeling). While this study does not specifically address arteriole remodeling, the lack of such adaptation may reduce pressure buffering by upstream arterioles, thereby rendering the microcirculation more vulnerable to significant BP fluctuations.

      The unique model allows for measurements of parenchymal arteriole reactivity to acute dynamic changes in BP (both an increase and decrease in MAP). Our findings indicate that chronic BPV enhances the reactivity of parenchymal arterioles to BP changes—both during an increase in BP and upon its return to baseline, Supplemental Figure 5C, F. The data suggest an increased myogenic response to pressure elevation, indicative of heightened contractility, a common adaptive process observed in rodent models of hypertension[2-4]. However, our model also reveals a notable tendency for greater dilation when the BP drops, Supplemental Figure 5F. This intriguing observation may suggest ischemia during the vasoconstriction phase (at higher BP), leading to enhanced release of dilatory signals, which subsequently manifest as a greater dilation upon BP reduction. This phenomenon bears similarities to chronic hypoperfusion models[5,6], where vasodilatory mechanisms become more pronounced in response to sustained ischemic conditions. Future studies investigating the effects of BPV on myogenic responses and brain perfusion will be a priority for our ongoing research.

      (7) Functional hyperemia impairment in the BPV group is clear and well-described. Pairing this response with the kinetics of the recovery phase is an interesting observation. I suggest elaborating on why BPV group exerts lower responses and how this links to the rapid decline during recovery.

      Based on the heightened reactivity of BPV parenchymal arterioles to intravascular pressure (Figure 5), we anticipate that the reduction of sensory-evoked dilations results from an increased vasoconstrictive activity and/or a decreased availability of vasodilatory signaling pathways (NO, EETs, COX-derived prostaglandins)[7,8]. Consequently, the magnitude of the FH response is blunted during periods of elevated BP in BPV mice.

      Additionally, upon termination of the stimulus-induced response−when vasodilatory signals would typically dominate−vasoconstrictive mechanisms are rapidly engaged (or unmasked), leading to quicker return to baseline. This shift in the balance between vasodilatory and vasoconstrictive forces favors vasoconstriction, contributing to the altered recovery kinetics observed in BPV mice. This has been included in the Discussion section of the revised manuscript.

      (8) The experimental design for the cognitive/behavioral assessment is clear and it is a reasonable experiment based on previous results. However, the discussion associated with these results falls short. I recommend that the authors describe the rationale to assess recognition memory, short-term spatial memory, and mice activity, and explain why these outcomes are relevant in the BPV context. Are there other studies that support these findings? The authors discussed that no changes in alternation might be due to the age of the mice, which could already exhibit cognitive deficits. In this line of thought, what is the primary contributor to behavioral impairment? I think that this sentence weakens the conclusion on BPV impairing cognitive function and might even imply that age per se might be the factor that modulates the various physiological outcomes observed here. I recommend clarifying this section in the discussion.

      We thank the reviewer for this comment. Clinical studies have demonstrated that patients with elevated BPV exhibit impairments across multiple cognitive domains, including declines in processing speed[9] and episodic memory[10]. To evaluate memory function, we utilized behavioral tests: the novel object recognition (NOR) task to assess episodic memory[11] and the spontaneous Y-maze to evaluate short-term spatial memory[12].

      Previous research indicates that older C57Bl6 mice (14-month-old) exhibit cognitive deficits compared to younger counterparts (4- and 9-month-old)[13]. To ensure rigorous selection for behavioral testing, we conducted a preliminary NOR assessment, evaluating recognition memory at the one-hour delay but observing failures at the four- and 24-hour delays, indicating age-related deficits. Based on these results, animals failing recognition criteria were excluded from subsequent behavioral assessment. However, because no baseline cognitive testing was conducted for the spontaneous Y-maze, it is possible that some mice with age-related deficits were included in this test, which may have influenced data interpretation.

      Additionally, the absence of differences in the Y-maze performance may suggest that short-term spatial memory remains intact following 25 days of BPV, a point that is now discussed in the revised manuscript.

      (9) Why were only male mice used?

      We appreciate this comment and acknowledge the importance of conducting experiments in both male and female mice. Studies involving female mice are currently ongoing, with telemetry data collection approximately halfway completed and two-photon imaging studies on functional hyperemia also partially completed. However, using middle-aged mice for these experiments has proven challenging due to high mortality rates following telemetry surgeries. As a result, we initially limited our first cohort to male mice.

      (10) In the results for Figure 3: "Ang II evoked significant increases in SBP in both control and BPV groups;...". Also, in the figure legend: "B. Five-minute average HR when the pump is OFF or ON (infusing Ang II) for control and BPV groups...." The authors should clarify this as the methods do not state a control group that receives Ang II.

      Please refer to response to comment 5.

      Reviewer #2 (Public review):

      Summary:

      Blood pressure variability has been identified as an important risk factor for dementia. However, there are no established animal models to study the molecular mechanisms of increased blood pressure variability. In this manuscript, the authors present a novel mouse model of elevated BPV produced by pulsatile infusions of high-dose angiotensin II (3.1ug/hour) in middle-aged male mice. Using elegant methodology, including direct blood pressure measurement by telemetry, programmable infusion pumps, in vivo two-photon microscopy, and neurobehavioral tests, the authors show that this BPV model resulted in a blunted bradycardic response and cognitive deficits, enhanced myogenic response in parenchymal arterioles, and a loss of the pressure-evoked increase in functional hyperemia to whisker stimulation.

      Strengths:

      As the presentation of the first model of increased blood pressure variability, this manuscript establishes a method for assessing molecular mechanisms. The state-of-the-art methodology and robust data analysis provide convincing evidence that increased blood pressure variability impacts brain health.

      Weaknesses:

      One major drawback is that there is no comparison with another pressor agent (such as phenylephrine); therefore, it is not possible to conclude whether the observed effects are a result of increased blood pressure variability or caused by direct actions of Ang II.

      We acknowledge this limitation and have attempted to address the concern by introducing an alternative vasopressor, norepinephrine (NE), Figure 4. A subcutaneous dose of 45 µg/kg/min was titrated to match the Ang II-induced transient BP pulses (systolic BP ~150-180 mmHg), Figure 4A. Similar to Ang II-treated mice, NE-treated mice exhibited no significant changes in average mean arterial pressure (MAP) throughout the 20-day treatment period (Figure 4B). Although there was a trend (P=0.08) towards increased average real variability (ARV) (Figure 4C left), it did not reach statistical significance. The coefficient of variation (CV) (Figure 4C right) was significantly increased by day 3-4 of treatment (P=0.02).
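
      For readers less familiar with the two variability indices named above, the sketch below uses their commonly cited definitions: ARV as the mean absolute difference between successive readings, and CV as the standard deviation divided by the mean, expressed as a percentage. The response does not state how these indices were computed here (sampling interval, averaging window, sample versus population standard deviation), so those choices and the readings below are assumptions for illustration only.

```python
from statistics import mean, stdev


def average_real_variability(readings):
    """Mean absolute difference between consecutive BP readings (mmHg)."""
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    successive_diffs = [abs(b - a) for a, b in zip(readings, readings[1:])]
    return mean(successive_diffs)


def coefficient_of_variation(readings):
    """Sample standard deviation as a percentage of the mean (assumed form)."""
    return 100.0 * stdev(readings) / mean(readings)


# Hypothetical systolic readings (mmHg), not data from the study.
hypothetical_sbp = [100, 110, 100, 110]
arv = average_real_variability(hypothetical_sbp)
cv = coefficient_of_variation(hypothetical_sbp)
print(f"ARV = {arv:.1f} mmHg, CV = {cv:.2f}%")
```

      ARV depends on the order of the readings while CV does not, which is one reason the two indices can diverge in significance for the same dataset.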

      Notably, unlike the bradycardic response observed during Ang II-induced BP elevations, NE infusions elicited a tachycardic response (Figure 4A), likely due to β-1 adrenergic receptor activation. However, significant mortality was observed within the NE cohort: three of six mice died prematurely during the second week of treatment, and two additional mice required euthanasia on days 18 and 20 due to lethargy, impaired mobility, and tachypnea.

      While we recognize the importance of comparing results across vasopressors, further investigation using additional vasopressors would require a dedicated study, as each agent may induce distinct off-target effects, potentially generating unique animal models. Alternatively, a mechanical approach−such as implanting a tethered intra-aortic balloon[14] connected to a syringe pump−could be explored to modulate blood pressure variability without pharmacological intervention. However, such an approach falls beyond the scope of the present study.

      Ang II is known to have direct actions on cerebrovascular reactivity, neuronal function, and learning and memory. Given that Ang II is increased in only 15% of human hypertensive patients (and an even lower percentage of non-hypertensive), the clinical relevance is diminished. Nonetheless, this is an important study establishing the first mouse model of increased BPV.

      We agree that high Ang II levels are not a predominant cause of hypertension in humans, which is why it is critical that our pulsatile Ang II dosing did not cause overt hypertension (no increase in 24-hour MAP). Ang II was solely a tool to produce controlled, transient increases in BP to yield a significant increase in BPV.

      Regarding BPV specifically, prior studies indicate that primary hypertensive patients with elevated urinary angiotensinogen-to-creatinine ratio exhibit significantly higher mean 24-hour systolic ARV compared to those with lower ratios[15]. However, the fundamental mechanisms driving these harmful increases in BPV remain poorly defined. A central theme across clinical BPV studies is impaired arterial stiffness, which has been proposed to contribute to BPV through reduced arterial compliance and diminished baroreflex sensitivity. Moreover, increased BPV can exert mechanical stress on arterial walls, leading to arterial remodeling and stiffness−ultimately perpetuating a detrimental feed-forward cycle[16].

      In our model, male BPV mice exhibited a minimal yet significant elevation in SBP without corresponding increases in DBP, potentially reflecting isolated systolic hypertension, which is strongly associated with arterial stiffness[17,18]. Our initial goal was to establish controlled rapid fluctuations in BP, and Ang II was selected as the pressor due to its potent vasoconstrictive properties and short half-life[19].

      We appreciate the reviewer’s insightful comment and acknowledge the necessity of exploring alternative mechanisms underlying BPV that are independent of Ang II. It is our long-term goal to investigate these factors in further studies.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) How was the dose of Ang II determined? It seems that this dose (3.1ug/hr) is quite high.

      The Ang II dose was titrated in a preliminary study to one that induced a significant and transient BP response without increasing 24-hour blood pressure (i.e. no hypertension).

      Ang II was delivered subcutaneously at 3.1 μg/hr, a concentration comparable to high-dose Ang II administration via mini-osmotic pumps (~1700 ng/kg/min)[20], with one-hour pulses occurring every 3-4 hours. With 6 pulses per day, the total daily dose equates to 18.6 µg/day in a ~30 gram mouse.

      For comparison, if the same 18.6 µg/day dose were administered continuously via a mini-osmotic pump (18.6 µg/0.03kg/1440min), the resulting dosage would be approximately 431 ng/kg/min[21,22], aligning with subpressor dose levels. Thus, while the total dose may appear high, it is not delivered in a constant manner but rather intermittently, allowing for controlled, rapid variations in blood pressure.
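
      The dose arithmetic above can be checked with a short sketch. The pump rate, pulse count, pulse duration, and ~30 g body mass are the values stated in this response; the variable and function names are illustrative only.

```python
# Values stated in the response above.
PUMP_RATE_UG_PER_HR = 3.1   # Ang II delivery rate while the pump is ON
PULSES_PER_DAY = 6          # one-hour pulses every 3-4 hours
PULSE_DURATION_HR = 1.0
BODY_MASS_KG = 0.03         # ~30 g mouse
MIN_PER_HR = 60
MIN_PER_DAY = 1440


def ug_to_ng(micrograms: float) -> float:
    """Convert micrograms to nanograms."""
    return micrograms * 1000.0


# Total Ang II delivered per day across all pulses (ug/day).
daily_dose_ug = PUMP_RATE_UG_PER_HR * PULSE_DURATION_HR * PULSES_PER_DAY

# Body-mass-normalized rate while the pump is ON (ng/kg/min).
rate_during_pulse = ug_to_ng(PUMP_RATE_UG_PER_HR) / MIN_PER_HR / BODY_MASS_KG

# Rate if the same daily dose were infused continuously (ng/kg/min).
continuous_equivalent = ug_to_ng(daily_dose_ug) / BODY_MASS_KG / MIN_PER_DAY

print(f"daily dose: {daily_dose_ug:.1f} ug/day")
print(f"rate during a pulse: {rate_during_pulse:.0f} ng/kg/min")
print(f"continuous equivalent: {continuous_equivalent:.0f} ng/kg/min")
```

      The rate during a pulse comes to roughly 1700 ng/kg/min and the continuous equivalent to roughly 431 ng/kg/min, consistent with the figures quoted above.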

      (2) Were behavioral studies performed on the same mice that were individually housed? Individual housing causes significant stress in mice that can affect learning and memory tasks (PMC6709207). It's not a huge issue since the control mice would have been housed the same way, but it is something that could be mentioned in the discussion section.

      Behavioral studies were performed on mice that were individually housed following the telemetry surgery. The study was started once BP levels stabilized, as mice required several days to achieve hemodynamic stability post-surgery. Consequently, all mice were individually housed for several days before undergoing behavioral assessment.

      To account for potential cognitive variability, earlier novel object recognition (NOR) tests were conducted to establish cognitive capacity, and mice that did not meet criteria were excluded from further behavioral testing. However, we acknowledge that individual housing induces stress, which can influence learning and memory, and this is a factor we were unable to fully control. Given that both experimental and control groups experienced the same housing conditions, this stress effect should be comparable across cohorts. A discussion on this limitation is now included in the text.

      (3) It looks like one control mouse that was included in both Figures 1 and 2 (control n=12) but was excluded in Table 1 (control n=11), this isn't mentioned in the text - please include the exclusion criteria in the manuscript.

      We apologize for the typo: 12 control animals were consistently utilized across Figures 1-2, Table 1, Supplemental Table 1, Figure 6C, and Supplemental Figure 2B. Since the initial submission, one additional control mouse was completed and included in the telemetry control cohort. Thus, in the updated manuscript, we have corrected the control sample size to 13 mice across these figures and tables, ensuring consistency.

      Additionally, exclusion criteria have now been explicitly included in the manuscript (Line 173-175). Mice were excluded from the study if they died prematurely (prior to treatment onset) or if they exhibited abnormally elevated pressure while receiving saline, likely due to complications from telemetry surgery.

      (4) Please include a statement on why female mice were not included in this study.

      As discussed in our response to Reviewer #1, our initial intention was to include both male and female mice in this study. However, high mortality rates following telemetry surgeries significantly constrained our ability to advance all aspects of the study. As a result, we limited our first cohort to males to establish the basics of the model. A statement is now included in the manuscript, Line 50-53: “Female mice were not included in the present study due to high post-surgery mortality observed in 12-14-month-old mice following complex procedures. To minimized confounding effects of differential survival and to establish foundational data for this model, we restricted the investigation to male mice.”

      Potential sex differences may be complex and warrant separate studies, which are currently ongoing, to comprehensively assess sex as a biological variable.

      (5) On page 14, "experiments from control vs experimental mice were not equally conducted in the same season raising the possibility for a seasonal effect" - does this mean that control experiments were not conducted at the same time as the Ang II infusions in BPV mice? This has huge implications on whether the effects observed are induced by treatment or just batch seasonal effects.

      We fully acknowledge the reviewer’s concern, and our statement aims to provide transparency regarding the study’s limitations. Several challenges contributed to this outcome, including high mortality rates following surgeries (primarily telemetry implantation) and technical issues related to instrumentation, particularly telemetry functionality.

      Differences between BPV and saline mice emerged primarily due to mortality or telemetry failures: some mice did not survive post-surgery, while others remained healthy but had non-functional telemeters. This issue was particularly pronounced in 14-month-old mice, as their fragile vasculature occasionally prevented proper BP readings.

      Each experiment required a minimum of two and a half months per mouse to complete, with a cost (also per mouse) exceeding $1500 USD ($300 pump, $175 mouse, $900 telemeters, per diem, drugs, reagents etc.). Despite our best effort to ensure comparable seasonal/batch data, these logistical and technical constraints prevented perfect synchronization.

      To evaluate whether seasonal differences influenced our results, we incorporated additional telemetry data into the control cohort. Of the seven included control mice, six underwent the same treatment but were allocated to a separate branch of the study, whose endpoints did not require a chronic cranial window. We found no significant differences in 24-hour average MAP during the baseline period between control mice with or without a cranial window, Supplemental Figure 2A. Additionally, we grouped mice into seasonal categories based on Georgia’s climate: “Spring-Summer” (May-September) and “Fall-Winter” (October-April) but observed no BP differences between these periods, Supplemental Figure 2B.

      Given the absence of seasonal effects on BP and the fact that mice were sourced from two independent suppliers (Jackson Laboratory and NIA), we anticipate that the observed results are driven by treatment rather than seasonal or batch effects.

      (6) Methods, two-photon imaging: did the authors mean "retro-orbital" instead of "intra-orbital" injection of the Texas red dye? Also, is this a Texas red-dextran? If so, what molecular weight?

      Thank you for this comment. The correct terminology is “retro-orbital” rather than “intra-orbital” injection. Additionally, we utilized Texas Red-dextran (70 kDa, 5% [wt/vol] in saline) for the imaging experiments. These details have now been incorporated into the Methods section.

      (1) Shaffer F, Ginsberg JP. An Overview of Heart Rate Variability Metrics and Norms. Front Public Health. 2017;5:258. doi: 10.3389/fpubh.2017.00258

      (2) Pires PW, Jackson WF, Dorrance AM. Regulation of myogenic tone and structure of parenchymal arterioles by hypertension and the mineralocorticoid receptor. Am J Physiol Heart Circ Physiol. 2015;309:H127-136. doi: 10.1152/ajpheart.00168.2015

      (3) Iddings JA, Kim KJ, Zhou Y, Higashimori H, Filosa JA. Enhanced parenchymal arteriole tone and astrocyte signaling protect neurovascular coupling mediated parenchymal arteriole vasodilation in the spontaneously hypertensive rat. J Cereb Blood Flow Metab. 2015;35:1127-1136. doi: 10.1038/jcbfm.2015.31

      (4) Diaz JR, Kim KJ, Brands MW, Filosa JA. Augmented astrocyte microdomain Ca(2+) dynamics and parenchymal arteriole tone in angiotensin II-infused hypertensive mice. Glia. 2019;67:551-565. doi: 10.1002/glia.23564

      (5) Kim KJ, Diaz JR, Presa JL, Muller PR, Brands MW, Khan MB, Hess DC, Althammer F, Stern JE, Filosa JA. Decreased parenchymal arteriolar tone uncouples vessel-to-neuronal communication in a mouse model of vascular cognitive impairment. GeroScience. 2021. doi: 10.1007/s11357-020-00305-x

      (6) Chan SL, Nelson MT, Cipolla MJ. Transient receptor potential vanilloid-4 channels are involved in diminished myogenic tone in brain parenchymal arterioles in response to chronic hypoperfusion in mice. Acta Physiol (Oxf). 2019;225:e13181. doi: 10.1111/apha.13181

      (7) Tarantini S, Hertelendy P, Tucsek Z, Valcarcel-Ares MN, Smith N, Menyhart A, Farkas E, Hodges EL, Towner R, Deak F, et al. Pharmacologically-induced neurovascular uncoupling is associated with cognitive impairment in mice. J Cereb Blood Flow Metab. 2015;35:1871-1881. doi: 10.1038/jcbfm.2015.162

      (8) Ma J, Ayata C, Huang PL, Fishman MC, Moskowitz MA. Regional cerebral blood flow response to vibrissal stimulation in mice lacking type I NOS gene expression. Am J Physiol. 1996;270:H1085-1090. doi: 10.1152/ajpheart.1996.270.3.H1085

      (9) Sible IJ, Nation DA. Blood Pressure Variability and Cognitive Decline: A Post Hoc Analysis of the SPRINT MIND Trial. Am J Hypertens. 2023;36:168-175. doi: 10.1093/ajh/hpac128

      (10) Epstein NU, Lane KA, Farlow MR, Risacher SL, Saykin AJ, Gao S. Cognitive dysfunction and greater visit-to-visit systolic blood pressure variability. Journal of the American Geriatrics Society. 2013;61:2168-2173. doi: 10.1111/jgs.12542

      (11) Antunes M, Biala G. The novel object recognition memory: neurobiology, test procedure, and its modifications. Cognitive processing. 2012;13:93-110. doi: 10.1007/s10339-011-0430-z

      (12) Kraeuter AK, Guest PC, Sarnyai Z. The Y-Maze for Assessment of Spatial Working and Reference Memory in Mice. Methods Mol Biol. 2019;1916:105-111. doi: 10.1007/978-1-4939-8994-2_10

      (13) Singhal G, Morgan J, Jawahar MC, Corrigan F, Jaehne EJ, Toben C, Breen J, Pederson SM, Manavis J, Hannan AJ, et al. Effects of aging on the motor, cognitive and affective behaviors, neuroimmune responses and hippocampal gene expression. Behav Brain Res. 2020;383:112501. doi: 10.1016/j.bbr.2020.112501

      (14) Tediashvili G, Wang D, Reichenspurner H, Deuse T, Schrepfer S. Balloon-based Injury to Induce Myointimal Hyperplasia in the Mouse Abdominal Aorta. J Vis Exp. 2018. doi: 10.3791/56477

      (15) Ozkayar N, Dede F, Akyel F, Yildirim T, Ates I, Turhan T, Altun B. Relationship between blood pressure variability and renal activity of the renin-angiotensin system. J Hum Hypertens. 2016;30:297-302. doi: 10.1038/jhh.2015.71

      (16) Kajikawa M, Higashi Y. Blood pressure variability and arterial stiffness: the chicken or the egg? Hypertens Res. 2024;47:1223-1224. doi: 10.1038/s41440-024-01589-8

      (17) Laurent S, Boutouyrie P. Arterial Stiffness and Hypertension in the Elderly. Front Cardiovasc Med. 2020;7:544302. doi: 10.3389/fcvm.2020.544302

      (18) Wallace SM, Yasmin, McEniery CM, Maki-Petaja KM, Booth AD, Cockcroft JR, Wilkinson IB. Isolated systolic hypertension is characterized by increased aortic stiffness and endothelial dysfunction. Hypertension. 2007;50:228-233. doi: 10.1161/HYPERTENSIONAHA.107.089391

      (19) Al-Merani SA, Brooks DP, Chapman BJ, Munday KA. The half-lives of angiotensin II, angiotensin II-amide, angiotensin III, Sar1-Ala8-angiotensin II and renin in the circulatory system of the rat. J Physiol. 1978;278:471-490. doi: 10.1113/jphysiol.1978.sp012318

      (20) Zimmerman MC, Lazartigues E, Sharma RV, Davisson RL. Hypertension caused by angiotensin II infusion involves increased superoxide production in the central nervous system. Circ Res. 2004;95:210-216. doi: 10.1161/01.RES.0000135483.12297.e4

      (21) Gonzalez-Villalobos RA, Seth DM, Satou R, Horton H, Ohashi N, Miyata K, Katsurada A, Tran DV, Kobori H, Navar LG. Intrarenal angiotensin II and angiotensinogen augmentation in chronic angiotensin II-infused mice. Am J Physiol Renal Physiol. 2008;295:F772-779. doi: 10.1152/ajprenal.00019.2008

      (22) Nakagawa P, Nair AR, Agbor LN, Gomez J, Wu J, Zhang SY, Lu KT, Morgan DA, Rahmouni K, Grobe JL, et al. Increased Susceptibility of Mice Lacking Renin-b to Angiotensin II-Induced Organ Damage. Hypertension. 2020;76:468-477. doi: 10.1161/HYPERTENSIONAHA.120.14972

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) All outcomes are attributed specifically to L6b neurons, but the genetic manipulation is not specific to L6b neurons. The authors acknowledge this as a limitation, but in my view, this global manipulation is more than a limitation - it affects the overall interpretations of the data. The Hoerder-Suabedissen et al., 2018 paper shows sparse, but also dense, expression of Drd1a+ neurons in brain regions outside of the L6b. Given this issue, the results are largely overstated throughout the paper.

      We appreciate the reviewer’s careful reading and concern that some of our statements may have overstated the implications of our data. The Drd1a Cre mouse model used (FK164) has a relatively selective expression of Drd1a Cre in cortex, especially in layer 6b, but indeed some expression is seen in layer 6a and subcortically. We will nuance our claims throughout the paper to ensure that the conclusions are supported by our findings, and further discuss the impact of this limitation on the overall interpretation of our results. Specifically, we will discuss the potential contribution of relevant subcortical areas and layer 6a in the effects we observed.

      (2) It is not clear to me that the "silencing" of Drd1a+ neurons was verified.

      In our previous publications, we showed confirmation of the loss of regulated synaptic vesicle release from the Cre positive neuronal population (Marques-Smith et al., 2016; Hoerder-Suabedissen et al., 2018; Messore et al., 2024), which validates our approach to “silence” cortical neurons. We will discuss this further in the revised manuscript.

      (3) There were various discrepancies (and potentially misattributions) between the stated significant differences in Supplementary Table T1 data and Figure 3a & S2 spectral plots. This issue makes it difficult to effectively evaluate the main text and stated outcomes.

      We thank the reviewer for spotting the inconsistencies in how the statistical comparisons were presented: indeed, in the text we described two-way ANOVAs with posthoc tests but in the figures significance markers were positioned based on multiple t-tests. We have revised Supplementary Table T1, Figure 3a and S2 to ensure that all statistics are presented consistently throughout the manuscript, i.e. with two-way ANOVAs and accompanying posthoc tests.

      Related, the authors stated that post hoc comparisons of EEG spectral frequency bins were not corrected for multiple testing. Instead, significance was only denoted if changes in at least two consecutive frequency bins were significant. However, there are multiple plots in which a single significance marker is placed over an isolated bin (i.e., 4c, 6, S5, S6). Unless each marker is equivalent to 2 consecutive frequency bins, these markers should be removed from the plots. Otherwise, please define the frequency and size of these markers in the main text.

      In line with the previous comment, we have adjusted the markers in Figure 6 and Supplementary Figures S5 and S6 to reflect the results of posthoc tests after two-way ANOVAs.

      We thank the reviewer for pointing out that, in our comparisons of EEG spectra, single isolated frequency bins where the p-value reached the 0.05 threshold were in some cases shown as significantly different. This could indeed have occurred by chance given that, in line with previous literature, we did not correct for multiple comparisons. In the revised manuscript we will use an unbiased approach by plotting the actual p-values for all bins, and we will moderate our conclusions accordingly. This gives readers the opportunity to evaluate the magnitude and extent of the differences directly, rather than relying on an arbitrary threshold for significance.
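
      The criterion discussed above, under which a frequency bin is marked only when it belongs to a run of at least two consecutive significant bins, can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function name, the strict `p < alpha` comparison, and the example p-values are all assumptions made for the sketch.

```python
from typing import List, Sequence


def consecutive_significant(
    p_values: Sequence[float], alpha: float = 0.05, min_run: int = 2
) -> List[bool]:
    """Flag bins that sit inside a run of >= min_run consecutive bins with p < alpha.

    Hypothetical helper: isolated significant bins are not flagged.
    """
    flags = [p < alpha for p in p_values]
    keep = [False] * len(flags)
    run_start = None
    # A trailing False closes any run that extends to the last bin.
    for i, flag in enumerate(flags + [False]):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if i - run_start >= min_run:
                for j in range(run_start, i):
                    keep[j] = True
            run_start = None
    return keep


# Illustrative p-values, one per frequency bin (not real data).
example_p = [0.2, 0.03, 0.5, 0.01, 0.04, 0.6, 0.049]
marked = consecutive_significant(example_p)
```

      In this example the isolated bins at positions 1 and 6 are left unmarked, while the adjacent pair at positions 3 and 4 is marked. Plotting the raw p-values for all bins, as proposed above, avoids the need for any such rule.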

      (4) A rainbow color scale, as in Figure 3, we've now learned, can be misleading and difficult to interpret. The viridis color scale or a different diverging color scale are good alternatives.

      Thank you for pointing this out; we have adjusted the colour scale.

      (5) How much time elapsed between vehicle/orexin A & B infusions?

      There were 2-4 non-infusion days between infusions. We will add this information to the Methods when revising the manuscript.

      (6) For Figure 6, there are statistical discrepancies between the main text and the plots (pg. 10):

      a) The text claims post hoc differences for relative ORXA frontal EEG, but there are no significance markers on the plot.

      b) The text states that there were no post hoc differences for the relative ORXA occipital EEG, but significance markers are on the plot.

      c) The main test for the relative ORXB frontal EEG was not significant, but there are post hoc significance markers on the plot.

      d) For relative ORXB occipital EEG, there are significant markers on the plot outside of the stated range in the text.

      Thank you for your careful observations. These issues reflect the same inconsistency as raised above, where the text describes two-way ANOVAs and the figures refer to results obtained with multiple t-tests. We shall adjust the figures so that markers are shown only when the ANOVA is significant, and so that they show the results of posthoc tests after ANOVAs instead of the results of multiple t-tests.

      (7) Some important details are only available in figure captions, making it difficult to understand the main text. For example, when describing Figure 3c in the main text on page 7, it is not clear what type of transitions are being discussed without reading the figure caption. Likewise, a "decrease," "shift," and "change" are mentioned, but relative to what? Similar comment for the EEG theta activity description on pages 7 - 8. Please add relevant details to the main text.

      We will adjust the wording in the main text to reflect more precisely which comparisons are shown in the figures.

      (8) Statistical comparisons for data in Figure 3e, post hoc analyses for data in Figure S7a-b REM data, and post hoc analyses for Figure S7c (not b) occipital EEG should be included to support differences claims. Please denote these differences on the respective plots.

      We have added the statistical comparisons for Figure 3e to the results section.

      We have added the statistical comparisons for Figure S7a to the results section.

      We have added the statistical comparison for Figure S7b to the results section.

      In Figure S7c, there was an overall genotype difference, but there was not a time x genotype interaction, so we have not performed posthoc tests and did not plot posthoc significance markers for this figure. We have adjusted the wording in the results section to make this clearer.

      We have corrected the reference to Figure S7c, which was incorrect; thank you for your careful attention.

      (9) In the subsection titled "Layer 6b mediates effects of orexin on vigilance states (pg. 8)," there does not seem to be any stated differences between control and L6b silenced mice. A more accurate subtitle is needed.

      We shall change the subtitle to: “The effects of orexin on vigilance states in L6b silenced mice”. The main finding described in this section is that the increase in EEG theta frequency after ORXB infusion is attenuated in L6b silenced mice, so a statement summarizing this finding could be an alternative title. However, then it would not accurately reflect other, less conspicuous, yet potentially important findings described in this section (during NREM sleep, only in L6b silenced animals there is an increase in power in the lower frequency bins in the frontal derivation; in the occipital derivation, levels of relative SWA during NREM sleep after ORXA infusion were lower in L6b silenced than in control animals).

      Reviewer #2 (Public review):

      Weaknesses:

      (1) Although the authors used a highly selective approach to silence layer 6b neurons, the observed changes in EEG oscillations cannot be solely attributed to layer 6b neurons because of the ICV route for orexin administration.

      We completely agree, and did not want to imply that orexin administered through the ICV route reaches cortical Drd1a Cre expressing neurons only. We will re-word the corresponding sentences accordingly throughout the manuscript.

      (2) The rationale for using only male rats is not provided.

      We agree that this is an important limitation and will acknowledge and discuss it further in the revised manuscript. Unfortunately, our experimental protocol precluded accurate monitoring of the oestrous cycle, which is well known to influence sleep-wake architecture, brain oscillations, and orexin signalling and receptor abundance. We therefore decided to use male mice only for the current study, but we plan to use both sexes in our follow-up work.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Joint Public Review:

      Summary:

      For many years, there has been extensive electrophysiological research investigating the relationship between local field potential patterns and individual cell spike patterns in the hippocampus. In this study, using innovative imaging techniques, they examined spike synchrony of hippocampal cells during locomotion and immobility states. The authors demonstrated that hippocampal place cells exhibit prominent synchronous spikes locked to theta oscillations.

      Strengths:

      The single cell voltage imaging used in this study is a highly novel method that may allow recordings that were not previously possible using existing methods.

      We thank the reviewer for recognizing the strengths of our study.

      Weaknesses:

      The strength of evidence remains incomplete because of the main claim that synchronous events are not associated with ripples. As was mentioned in previous rounds of review, ripples emerge locally and independently in the two hemispheres. Thus, obtaining ripple recordings from the contralateral hemisphere does not provide solid evidence for this claim. The papers the authors are citing to make the claim that "Additionally, we implanted electrodes in the contralateral CA1 region to monitor theta and ripple oscillations, which are known to co-occur across hemispheres (29-31)" do not support this claim. For example, reference 29 contains the following statement: "These findings suggest that ripples emerge locally and independently in the two hemispheres".

      In our previous revisions, we took care to limit our claim to what our data directly supported: that synchronous ensembles of CA1 neurons were not associated with ripple oscillations recorded in the contralateral hippocampus. To address reviewer concerns, we changed the Title, modified the Abstract, adjusted relevant text in the Results, and explicitly acknowledged the methodological limitations in the Discussion. 

      In this round, we further revised the manuscript to directly address the editor’s and reviewer’s remaining concerns: 

      (1) We replaced the word “surprisingly” with a more neutral “Moreover” to avoid implying that the observed dissociation was unexpected given the use of contralateral recordings.

      Introduction (line 67-69):

      “Moreover, these synchronous ensembles occurred outside of contralateral ripples (c-ripples) …”

      (2) We removed the clause stating that ripples “co-occur across hemispheres”, along with the associated citation to Buzsaki et al. (2003), to avoid potential misinterpretation. The sentence now simply states that we recorded ripple and theta oscillations in the contralateral CA1.

      Introduction (line 63-64):

      “Additionally, we implanted electrodes in the contralateral CA1 region to monitor theta and ripple oscillations.” (co-occurrence claim removed)

      (3) We carefully replaced all mentions of “ripples” in the manuscript with “c-ripples” (i.e., contralateral ripples) to ensure that the scope of our findings is clearly defined and cannot be misinterpreted.

      (4) We strengthened the acknowledgment of the methodological limitations in the Discussion. 

      Discussion (line 528-533): 

      “While contralateral LFP recordings can capture large-scale hippocampal theta and ripple oscillations, they do not fully reflect ipsilateral-specific dynamics, such as variation in theta phase alignment or locally generated ripple events (Buzsaki et al., 2003; Szabo et al., 2022; Huang et al., 2024). Given that ripple oscillations can emerge locally and independently in each hemisphere, interpretations based on contralateral recordings must be made with caution. Further studies incorporating simultaneous ipsilateral field potential recordings will be essential to more precisely understand local-global network interactions.”

      These revisions ensure that our manuscript now presents a consistent and appropriately limited interpretation across all sections. We hope these clarifications address all remaining concerns and accurately reflect the scope of our findings.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The work from this paper successfully mapped transcriptional landscape and identified EA-responsive cell types (endothelial, microglia). Data suggest EA modulates BBB via immune pathways and cell communication. However, claims of "BBB opening" are not directly proven (no permeability data).

      (1) No in vivo/in vitro assays confirm BBB permeability changes (e.g., Evans blue leakage, TEER).  

      (2) Only male rats were used, ignoring sex-specific BBB differences.

      (3) Pericytes and neurons, critical for the BBB, were not captured, likely due to dissociation artifacts.

      (4) Protein-level validation (Western blot, IHC) absent for key genes (e.g., LY6E, HSP90).

      (5) Fixed stimulation protocol (2/100 Hz, 40 min); no dose-response or temporal analysis.

      (1) We sincerely apologize for the oversight regarding the description of changes in blood-brain barrier permeability. In fact, our team conducted a series of preliminary studies that verified this aspect, but we did not describe them in detail in the introduction section. We will address and improve this in the revised manuscript.

      (2) We are very grateful to the reviewers for pointing out the important and meaningful issue of "sex-specific BBB differences." We will make this a focal point in our future research.

      (3) We acknowledge the importance of pericytes and neurons in the function of the blood-brain barrier. However, neurons are absent because our sample processing method involves dissociation. During the dissociation procedure, neuronal axons, which are relatively long, are filtered out during the frequent cell suspension steps and cannot enter the downstream microfluidic system for analysis, so they are not present in our data. Since this experiment is primarily focused on non-neuronal cells, we did not choose to use nucleus extraction for sample processing. As for pericytes, we believe they were not captured because their proportion in our samples is extremely low. Further research may require single-nucleus transcriptomics or the separate isolation of these two cell types for study. In our current mechanistic studies, we are also fully considering the important roles these two cell types play in BBB function.

      (4) In addition, for verification at the protein level, we have recently conducted some experiments and will include these results in the revised version.

      (5) Lastly, regarding our electroacupuncture intervention model, we actually conducted a series of parameter optimization experiments during the preliminary exploration phase. This part is indeed lacking in our current introduction, and we will add it to the research background and introduction.

      Reviewer #2 (Public review):

      Summary:

      This study uses single-cell RNA sequencing to explore how electroacupuncture (EA) stimulation alters the brain's cellular and molecular landscape after blood-brain barrier (BBB) opening. The authors aim to identify changes in gene expression and signaling pathways across brain cell types in response to EA stimulation using single-cell RNA sequencing. This direction holds promise for understanding the consequences of noninvasive methods of BBB opening for therapeutic drug delivery across the BBB.

      (1) The work falls short in its current form. The experimental design lacks a clear justification, and readers are not provided with sufficient background information on the extent, timing, or regional specificity of BBB opening in this EA model. These details, established in prior work, are critical to understanding the rationale behind the current transcriptomic analyses.

      (2) Further, the results are often presented with minimal context or interpretation. There is no model of intercellular or molecular coordination to explain the BBB-opening process, despite the stated goal of identifying such mechanisms. The statement that EA induces a "unique frontal cortex-specific transcriptome signature" is not supported, as no data from other brain regions are presented. Biological interpretation is at times unclear or inaccurate - for instance, attributing astrocyte migration effects to endothelial cell clusters or suggesting microglial tight junction changes without connecting them meaningfully to endothelial function.

      (3) The study does include analyses of receptor-ligand signaling and cell-cell communication, which could be among its most biologically rich outputs. However, these are relegated to supplementary material and not shown in the leading figures. This choice limits the utility of the manuscript as a hypothesis-generating resource.

      (4) Overall, while the dataset may be of interest to BBB researchers and those developing technologies for drug delivery across the BBB, the manuscript in its current form does not yet fulfill its interpretive goals. A more integrated and biologically grounded analysis would be beneficial.

      (1) It was indeed our mistake that we did not pay attention to the importance of research background factors such as the degree, timing, or regional specificity of BBB opening for the rationale and purpose of this experimental design. In our revision, we will thoroughly elaborate on the relevant previous studies.

      (2) Our current study is actually based on previous findings that electroacupuncture can open the BBB, with a more pronounced effect observed in the frontal lobe (this aspect should be further described in the research background). Building on this foundation, our aim is to delineate the potential biological mechanisms involved. Therefore, we selected frontal lobe tissue as our primary choice for sequencing and have not yet investigated differences across other brain regions, although this may become a focus of future research. Additionally, we recognize that the mechanism underlying BBB opening is complex, and at present, we cannot determine whether it is driven by a single direct factor or by coordinated actions between cells or molecules. As such, our results are presented only briefly for now, and we will carefully consider whether to supplement our findings by incorporating insights from other studies.

      (3) Thank you very much for bringing this to our attention. We will include the key results of the receptor-ligand signaling and cell-cell communication analysis in the main manuscript.

      (4) Indeed, our current dataset and analysis tend to present objective data results. We are also conducting a series of validations that may be related to the biology of the blood-brain barrier, and we look forward to sharing and discussing any future research findings with you and everyone.

    1. Author response:

      We thank the reviewers for their thoughtful comments, and we plan to implement many of their suggestions to improve the paper. We agree that the paper can benefit from clearer links between the two neural signatures (memory traces and uniform shifts) themselves, and between the neural signatures and behavioral phenomena. We will address these limitations in multiple ways. First, as the reviewers noted, RNN models have the potential to probe these relationships, so we plan to perform further analyses and modeling experiments to uncover any causal relationships. Second, we will also establish clearer definitions of the neural signatures and explore how these signatures can be unified using our models. Finally, we will compare the experimental paradigms between Losey et al and Sun, O’Shea et al, and discuss how differences between the paradigms may have impacted our observations, particularly in the context of other experimental and modeling papers.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Bacterial species that frequently undergo horizontal gene transfer events tend to have genomes that approach linkage equilibrium, making it challenging to analyze population structure and establish the relationships between isolates. To overcome this problem, researchers have established several effective schemes for analyzing N. gonorrhoeae isolates, including MLST and NG-STAR. This report shows that Life Identification Number (LIN) Codes provide for a robust and improved discrimination between different N. gonorrhoeae isolates.

      Strengths:

      The description of the system is clear, the analysis is convincing, and the comparisons to other methods show the improvements offered by LIN Codes.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

      We thank the reviewer for their assessment of our paper.

      Reviewer #2 (Public review):

      Summary:

      This paper describes a new approach for analyzing genome sequences.

      Strengths:

      The work was performed with great rigor and provides much greater insights than earlier classification systems.

      Weaknesses:

      A minor weakness is that the clinical application of LIN coding could be articulated in a more in-depth way. The LIN coding system is very impressive and is certainly superior to other protocols. My recommendation, although not necessary for this paper, is that the authors expand their analysis to noncoding sequences, especially those upstream of open reading frames. In this respect, important cis-acting regulatory mutations that might help to further distinguish strains could be identified.

      We thank the reviewer for their comments. LIN code could be applied clinically, for example in the analysis of antibiotic resistant isolates, or to investigate outbreaks associated with a particular lineage. We will update the text to describe this more thoroughly.

      With regard to non-coding sequences: unfortunately, intergenic regions are generally unsuitable for use in typing systems as (i) they are subject to phase variation, which can occlude relationships based on descent; and (ii) they are inherently difficult to assemble and can therefore introduce variation due to the sequencing procedure rather than biology. For the type of variant typing that LIN code represents, which aims to replicate phylogenetic clustering, protein-encoding sequences are the best choice for convenience, stability, and accuracy. This is not to say that basing a nomenclature on intergenic regions is not a valid objective; such a nomenclature might be especially suitable for predicting some phenotypic characters, but it would still be subject to problem (ii), depending on the sequencing technology used. Such a nomenclature system should stand beside, rather than be combined with or used in place of, phylogenetic typing. However, we could certainly investigate the relationship between an isolate's LIN code and regulatory mutations in the future.

      Reviewer #3 (Public review):

      Summary:

      In this well-written manuscript, Unitt and colleagues propose a new, hierarchical nomenclature system for the pathogen Neisseria gonorrhoeae. The proposed nomenclature addresses a longstanding problem in N. gonorrhoeae genomics, namely that the highly recombinant population complicates typing schemes based on only a few loci and that previous typing systems, even those based on the core genome, group strains at only one level of genomic divergence without a system for clustering sequence types together. In this work, the authors have revised the core genome MLST scheme for N. gonorrhoeae and devised life identification numbers (LIN) codes to describe the N. gonorrhoeae population structure.

      Strengths:

      The LIN codes proposed in this manuscript are congruent with previous typing methods for Neisseria gonorrhoeae, like cgMLST groups, NG-STAR, and NG-MAST. Importantly, they improve upon many of these methods as the LIN codes are also congruent with the phylogeny and represent monophyletic lineages/sublineages.

      The LIN code assignment has been implemented in PubMLST, allowing other researchers to assign LIN codes to new assemblies and put genomes of interest in context with global datasets.

      Weaknesses:

      The authors correctly highlight that cgMLST-based clusters can be fused due to "intermediate isolates" generated through processes like horizontal gene transfer. However, the LIN codes proposed here are also based on single linkage clustering of cgMLST at multiple levels. It is unclear whether future recombination or sequencing of previously unsampled diversity within N. gonorrhoeae could merge higher-level clusters, and if so, how this would impact the stability of the nomenclature.

      The authors have defined higher resolution thresholds for the LIN code scheme. However, they do not investigate how these levels correspond to previously identified transmission clusters from genomic epidemiology studies. It would be useful for future users of the scheme to know the relevant LIN code thresholds for these investigations.

      We thank the reviewer for their insightful comments. LIN codes do use multi-level single linkage clustering to define the cluster number of isolates. However, unlike previous applications of simple single linkage clustering such as N. gonorrhoeae core genome groups (Harrison et al., 2020), once assigned in LIN code, these cluster numbers are fixed within an unchanging barcode assigned to each isolate. Therefore, the nomenclature is stable, as the addition of new isolates cannot change previously established LIN codes.
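
      The stability property described above can be illustrated with a simplified sketch of barcode assignment. This is not the PubMLST implementation: the threshold list, the allelic profiles, and the function names are hypothetical and chosen only for illustration. Each new isolate inherits the code prefix of its nearest already-coded isolate up to the finest threshold that their allelic distance satisfies, then receives a new number at the next position. Codes assigned earlier are never modified.

```python
from typing import Dict, Sequence, Tuple

Profile = Tuple[int, ...]
Code = Tuple[int, ...]

# Illustrative allelic-mismatch thresholds, coarsest first (assumed values).
THRESHOLDS = [5, 2, 0]


def mismatches(a: Profile, b: Profile) -> int:
    """Number of loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_code(
    profile: Profile,
    coded: Dict[str, Tuple[Profile, Code]],
    thresholds: Sequence[int] = THRESHOLDS,
) -> Code:
    """Return a code for a new isolate without altering any existing code."""
    n = len(thresholds)
    if not coded:
        return (0,) * n
    # Nearest already-coded isolate by allelic mismatches.
    near_profile, near_code = min(
        coded.values(), key=lambda pc: mismatches(profile, pc[0])
    )
    d = mismatches(profile, near_profile)
    # Thresholds are descending, so this count is the shared prefix length.
    shared = sum(d <= t for t in thresholds)
    if shared == n:
        return near_code
    prefix = near_code[:shared]
    used = [c[shared] for _, c in coded.values() if c[:shared] == prefix]
    return prefix + (max(used) + 1,) + (0,) * (n - shared - 1)


# Hypothetical eight-locus profiles, added one at a time.
isolates = {
    "A": (1, 1, 1, 1, 1, 1, 1, 1),
    "B": (1, 1, 1, 1, 1, 1, 1, 2),
    "C": (1, 1, 1, 1, 2, 2, 2, 3),
    "D": (9, 9, 9, 9, 9, 9, 9, 9),
}
coded: Dict[str, Tuple[Profile, Code]] = {}
for name, prof in isolates.items():
    coded[name] = (prof, assign_code(prof, coded))
codes = {name: code for name, (_, code) in coded.items()}
```

      Because the function only reads the existing codes and appends a new one, adding isolate D (or any later isolate) cannot change the codes already given to A, B, and C, in contrast to re-running single linkage clustering over the whole collection.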

      Cluster stability was considered during the selection of allelic mismatch thresholds. By choosing thresholds based on natural breaks in population structure (Figure 3), applying clustering statistics such as the silhouette score, and by assessing where cluster stability has been maintained within the previous core genome groups nomenclature, we can have confidence that the thresholds which we have selected will form stable clusters. For example, with core genome groups there has been significant group fusion with clusters formed at a threshold of 400 allelic differences, while clustering at a threshold of 300 allelic differences has remained cohesive over time (supported by a high silhouette score) and so was selected as an important threshold in the gonococcal LIN code. LIN codes have now been applied to >27000 isolates in PubMLST, and the nomenclature has remained effective despite the continual addition of new isolates to this collection. The manuscript will be revised to emphasise these points.

      Work is in progress to explore what LIN code thresholds are generally associated with transmission chains. These will likely be the last 7 thresholds (25, 10, 7, 5, 3, 1, 0) as previous work has suggested that isolates linked by transmission within one year are associated with <14 single nucleotide polymorphism differences (De Silva et al., 2016). The results of this analysis will be described in a future article, currently in preparation.

      Harrison, O.B., et al. Neisseria gonorrhoeae Population Genomics: Use of the Gonococcal Core Genome to Improve Surveillance of Antimicrobial Resistance. The Journal of Infectious Diseases 2020.

      De Silva, D., et al. Whole-genome sequencing to determine transmission of Neisseria gonorrhoeae: an observational study. The Lancet Infectious Diseases 2016;16(11):1295-1303.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The authors investigate how the population of microtubules (LSPMB) that originates from sporozoite subpellicular microtubules (SSPM) is remodelled during liver-stage development of malaria parasites. These bundles shrink over time and help form structures needed for cell division. The authors have used expansion microscopy, live-cell imaging, genetically engineered mutants, and pharmacological perturbation to study parasite development within liver cells.

      A major strength of the manuscript is the live cell imaging and expansion microscopy to study this challenging liver stage of parasite development. It gives important knowledge that PTMs of α-tubulin, such as polyglutamylation and tyrosination/detyrosination, are crucial for microtubule stability. Mutations in α-tubulin reduce the parasite's ability to move and proliferate in the liver cells. The drug oryzalin, which targets microtubules, also blocks parasite development, showing how important dynamic microtubules are at this stage.

      The major problem in the manuscript was the way it flows, as the authors keep shifting from the liver stage to the sporogony stages and then back to the liver stages. It was very confusing at times to know what the real focus of the study is, whether sporozoite development or liver stage development. The flow of the manuscript could be improved. Some of the findings reported here substantiate the previous electron microscopy.

      Overall, the study represents an important contribution towards understanding cytoskeletal remodelling during liver stage infection. The study suggests that tubulin modifications are key for the parasite's survival in the liver and could be targets for new malaria treatments. This is also the stage that has been used for vaccine development, so any knowledge of how parasites proliferate in the liver cells will be beneficial towards intervention approaches.

      We would like to express our sincere gratitude to Reviewer #1 for the positive and encouraging feedback on our manuscript. We are delighted that the reviewer found our experimental design and methodologies appropriate and that our study represents an important contribution to understanding cytoskeletal remodelling during liver stage infection, a critical phase for vaccine development. We are also grateful to the reviewer for highlighting the issue with the manuscript's flow. We acknowledge this limitation and will significantly improve the narrative structure and logical progression in the revised manuscript to ensure clarity and avoid any potential confusion. Thank you again for your thoughtful and constructive comments.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated microtubule distribution and their possible post-translational modifications (PTM) in Plasmodium berghei during development of the liver stage, using either hepatocytes or HeLa cells as models. They used conventional immunofluorescence assays and expansion microscopy with various antibodies recognising tubulin and, in the second part of the work, its candidate PTMs, as well as markers of Plasmodium, in addition to live imaging with a fluorescent marker for tubulin. In the third part of the study, they generated 3 mutants deprived of either the last four residues or the last 11 residues, or where a candidate polyglutamylation site was substituted by an alanine residue.

      Strengths:

      In the first part, microtubules are monitored by a combination of two approaches (IFA and live), revealing nicely the evolution of the sporozoite subpellicular microtubules (SSPM, the sporozoite is the developmental stage present in salivary glands of the mosquitoes and that infects hepatocytes) into a different structure termed liver-stage parasite microtubule bundle (LSPMB). The LSPMB shrinks during the course of parasite development and finally disappears while hemi-spindles emerge over time. Contact points between these two structures are observed frequently in live cells and occasionally in fixed cells, suggesting the intriguing possibility that tubulin might be recycled from the LSPMB to contribute to hemi-spindle formation.

      In the second part, antibodies recognising (1) the final tyrosine found at the C-terminal tail and (2) a stretch of 3 glutamate residues in a side chain are used to monitor these candidate PTMs. Signals are positive at the SSPM; the signal remains positive for polyglutamylation but becomes negative for the final tyrosine at the LSPMB, while a positive signal emerges at hemi-spindles at later stages of development.

      In the last part, the three mutants are fed to mosquitoes, where they show reduced development, the one lacking the alpha-tubulin tail even failing to reach the salivary glands. However, the two other mutants infect HeLa cells normally, whereas sporozoites with the C-terminal tail deletion recovered from the haemolymph did not develop in these cells.

      The first part provides convincing evidence that microtubules are extensively remodelled during the infection of hepatocytes and HeLa cells, in agreement with the spectacular Plasmodium morphogenetic changes accompanying massive and rapid proliferation. The third part brings further confirmation that the C-terminal tail of alpha-tubulin is essential for multiple stages of parasite development, in agreement with previous work (50). Since it is the region where several post-translational modifications take place in other organisms (detyrosination, polyglutamylation, glycylation), it makes sense to propose that the essential function is related to these PTMs also in Plasmodium.

      Weaknesses:

      The significance of tubulin PTM relies on two antibodies whose reactivity to Plasmodium tubulins is unclear (see below). The interpretation of the literature on detyrosination and polyglutamylation is confusing in several places, meaning that the statements about the possible role of these PTMs need to be carefully revisited.

      The authors use the term "tyrosination" but the alpha1-tubulin studied here possesses the final tyrosine when it is synthesised, so it is "tyrosinated" by default. It could potentially be removed by a tyrosine carboxypeptidase of the vasoinhibin family (VASH) as reported in other species. After removal, this tyrosine can be added again by a tubulin-tyrosine ligase (TTL) enzyme. It is therefore more appropriate to talk about detyrosination-retyrosination rather than tyrosination (this confusion is unfortunately common in the literature, see Janke & Magiera, 2020).

      The difficulty here is that there is so far no evidence that detyrosination takes place in Plasmodium. Neither VASH nor TTL could be identified in the Plasmodium genome (ref 31, something we can confirm with our unsuccessful BLAST analyses), and mass spectrometry studies of purified tubulin, albeit from blood stages, did not find evidence for detyrosination (reference 43). Western blots using an antibody against detyrosinated tubulin did not produce a positive signal, neither on purified tubulin, nor on whole parasites (43). Of course, the situation could be different in liver stages, but the question of the detyrosinating enzyme is still there. The existence of a unique Plasmodium system for detyrosination cannot be formally ruled out but given the high degree of conservation of these PTMs and their associated enzymes, it sounds difficult to imagine.

      The fact that the anti-tyrosinated antibody still produced a signal in the cell line where the final tyrosine is deleted raises issues about its specificity. A cross-reactivity with beta-tubulin is proposed, but the Plasmodium beta-tubulin does not carry a final tyrosine, further raising concerns about antibody specificity.

      The interpretation of these results should therefore be considered carefully. There also seems to be some confusion in the function of detyrosination cited from the literature. It is said in line 229 that "tyrosination has been associated with stable microtubules" (33, 34, 50, 55). References 33 and 34 actually show that tyrosinated microtubules turn over faster in neurons or in epithelial cells, respectively, while references 50 and 55 do not study de/retyrosination. The general consensus is that tyrosinated microtubules are more dynamic (see reference 24).

      The situation is a bit different for polyglutamylation since several candidate poly- or mono-glutamylases have been identified in the Plasmodium genome, and at least mono-glutamylation of beta-tubulin has been formally proven, still in bloodstream stages (ref 43). The authors propose that the residue E445 is the polyglutamylation site. To our knowledge, this has not been demonstrated for Plasmodium. This residue is indeed the favourite one in several organisms such as humans and trypanosomes (Eddé et al., Science 1990; Schneider et al., JCS, 1997), and it is tempting to propose it would be the same here. However, TTLLs bind the tubulin tails from their C-terminal end like a glove on a finger (Garnham et al., Cell, 2015), and the presence of two extra residues in Plasmodium tubulins would mean that the reactive glutamate might be in position E447 rather than E445. This is worth discussing.

      On the positive side, it is encouraging to see that signals for both anti-tyrosinated tail and poly-glutamylated side chain are going down in the various mutants, but this would need validation with a comparison for alpha-tubulin signal.

      Line 316: polyglutamylation "is commonly associated with dynamic microtubule behavior (78-80)". Actually, references 78 and 79 show the impact of this PTM on interaction with spastin, and reference 80 discusses polyglutamylation as a marker of stable microtubules in the context of cilia and flagella. The consensus is that polyglutamylated microtubules tend to be more stable (ref 24).

      Conclusion:

      The first and the third parts of this manuscript - evolution of microtubules and importance of the C-terminal tails for Plasmodium development - are convincing and well supported by data. However, the presence and role of tubulin PTM should be carefully reconsidered.

      Plasmodium tubulins are more closely related to plant tubulins and are sensitive to inhibitors that do not affect mammalian microtubules. They therefore represent promising drug targets as several well-characterised compounds used as herbicides are available. The work produced here further defines the evolution of the microtubule network in sporozoites and liver stages, which are the initial and essential first steps of the infection. Moreover, Plasmodium has multiple specificities that make it a fascinating organism to study both for cell biology and evolution. The data reported here are elegant and will attract the attention of the community working on parasites but also on the cytoskeleton at large. It will be interesting to have the feedback of other people working on tubulin PTMs to figure out the significance of this part of the work.

      We thank Reviewer #2 for the thoughtful and detailed evaluation of our manuscript. We are pleased that the reviewer found our study elegant and believes it will attract the attention of the broader scientific community, both those working on parasites and those focused on cytoskeleton biology. We also acknowledge the concerns raised regarding the specificity of the antibodies used to detect tubulin post-translational modifications (PTMs), as well as the interpretation of their signals and the current lack of identified detyrosination enzymes in the Plasmodium genome. We agree that these are important limitations, and we will address them thoroughly in the revised manuscript. This includes clarifying our interpretation of tyrosination versus detyrosination, adjusting our claims regarding polyglutamylation sites, and carefully revisiting the literature cited to ensure accurate contextualization of PTM function in microtubule stability.

      We are grateful for the reviewer’s close reading and critical feedback, which will help us substantially improve the clarity, precision, and strength of our manuscript.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Atchou et al. investigates the role of the microtubule cytoskeleton in sporozoites of Plasmodium berghei, including possible functions of microtubule post-translational modifications (tyrosination and polyglutamylation) in the development of sporozoites in the liver. They also assessed the development of sporozoites in the mosquito. Using cell culture models and in vivo infections with parasites that contain tubulin mutants deficient in certain PTMs, they show that many aspects of the life cycle progression are impaired. The main conclusion is that microtubule PTMs play a major role in the differentiation processes of the parasites.

      However, there are a number of major and minor points of criticism that relate to the interpretation of some of the data.

      We thank Reviewer #3 for the overall positive assessment of our study and for recognizing its contribution to advancing our understanding of Plasmodium biology and malaria pathogenesis. We appreciate the reviewer’s constructive feedback, particularly regarding the interpretation of some of our data. These comments have been very helpful in guiding our revisions, and we have worked to improve both the clarity of our presentation and the precision of our interpretations in the revised manuscript.

      Below, we respond in detail to each of the reviewer’s points.

      Comments:

      (1) The first paragraph of "Results" almost suggests that the presence of a subpellicular MT-array in sporozoites is a new discovery. This is not the case, see e.g. the recent publication by Ferreira et al. (Nature Communications, 2023).

      We thank the reviewer for pointing this out and fully agree that the subpellicular microtubule (SPM) array in sporozoites is well established, as documented in earlier work (e.g., Cyrklaff et al., 2007) and more recently by Ferreira et al. (Nat. Commun., 2023). Our intention was not to suggest that the existence of the SSPM is a novel finding. Rather, our study builds on this existing knowledge by demonstrating that these sporozoite-derived microtubules are not disassembled upon hepatocyte entry but are repurposed into a newly described structure, the liver stage parasite microtubule bundle (LSPMB). This reorganization, its persistence into liver stage development, and its dynamic role in microtubule remodeling and nuclear division are, to our knowledge, novel observations. We will revise the manuscript to make this distinction clearer in the introduction and the results section.

      (2) Why were HeLa cells and not hepatocytes (as in Figure 3) used for measuring infection rates of the mutants in Figure 5H and 5L? As I understand, HeLa cells are not natural host cells for invading sporozoites. HeLa cells are epithelial cells derived from a cervical tumour. I am not an expert in Plasmodium biology, but is a HeLa infection an accepted surrogate model for liver stage development?

      We appreciate the opportunity to clarify our experimental model. While HeLa cells are not the natural host cells, they are a well-established and validated in vitro model for studying Plasmodium berghei liver stage development in our lab and others. In this system, the parasite completes its full development and generates infectious merozoites. Numerous studies have successfully used HeLa cells as a liver stage infection model, with key findings subsequently validated in primary hepatocytes or in vivo, confirming its utility as a representative model. We employed this cell line primarily to reduce animal usage in accordance with the 3Rs principles (Replacement, Reduction, Refinement). Importantly, to ensure the biological relevance of our discoveries in HeLa cells, we validated our key findings in primary mouse hepatocytes, as shown in Figure 3. Furthermore, we confirmed the in vivo infectivity of mutant parasite lines that produced typical salivary gland sporozoites through an in vivo infection assay, presented in Figure S4C.

      (3) The tubulin staining in Figures 1A and 1B is confusing and doesn't seem to make sense. Whereas in 1A the antibody nicely stains host and parasite tubulin, in 1B, only parasite tubulin is visible. If the same antibody and the same host cells have been used, HeLa cytoplasmic microtubules should be visible in 1B. In fact, they should be the predominant antigen. The same applies to Figure 2, where host microtubules are also not visible.

      We thank the reviewer for this careful observation regarding the α-tubulin staining in Figures 1A and 1B. The same host cell type (HeLa) and α-tubulin antibody were indeed used in both experiments. Figure 1A shows results from conventional immunofluorescence assays, where both host and parasite microtubules are clearly stained. In contrast, Figure 1B shows the outcome of ultrastructure expansion microscopy (U-ExM), where parasite microtubules appear prominently, while host microtubules are less visible.

      This effect appears to be a technical outcome of the U-ExM protocol, which can differentially preserve or reveal microtubule epitopes. We consistently observed stronger parasite signal across various cell types, including primary hepatocytes (Figure 3A,B). The lack of visible host microtubules in some U-ExM images does not reflect their absence, but rather reduced signal intensity relative to the parasite structures. This is not observed with all antibodies; for example, host microtubules stain strongly with anti-tyrosinated α-tubulin (Figure 3B), likely reflecting their high tyrosination state.

      To overcome this limitation, we employed PS-ExM and combined PS-ExM/U-ExM approaches (as described in reference 56), which allowed simultaneous high-resolution visualization of both host and parasite microtubule networks. These combined methods are now being used in follow-up studies to investigate host–parasite microtubule interactions in more detail.

      We will clarify this point in the revised manuscript to avoid confusion.

      (4) In Figures 2A and B, the host nuclei appear to have very different sizes in the DMSO controls and in the drug-treated cells. For example, in the 20 µM (-) image (bottom right), the nuclei are much larger than in the DMSO (-) control (top left). If this is the case, expansion microscopy hasn't worked reproducibly, and therefore, quantification of fluorescence is problematic. The scalebar is the same for all panels.

      The expansion microscopy methods used in this study have been rigorously validated for both reproducibility and isotropy. However, as the reviewer rightly notes, host cell nuclei can vary in size due to several factors, including cell cycle stage, infection status, and the extent of parasite development, all of which can influence host nuclear morphology and size.

      Importantly, the quantifications relevant to our conclusions were focused specifically on parasite structures. We did not rely on host nuclear size or host fluorescence intensity as a quantitative readout in this context. While we acknowledge the observed variability in host nuclear dimensions, it does not compromise the accuracy or reproducibility of the parasite-specific measurements central to our study.

      We will clarify this point in the revised figure legend and manuscript.

      (5) I don't quite follow the argument that spindles and the LSPMB are dynamic structures (e.g., lines 145, 174). That is a trivial statement for the spindle, as it is always dynamic, but beyond that, it has only been shown that the structure is sensitive to oryzalin. That says little about any "natural" dynamic behaviour. Any microtubule structure can be destroyed by a particular physical or chemical treatment, but that doesn't mean all structures are dynamic. It also depends on the definition of "dynamic" in a particular context, for example, the time scale of dynamic behaviour (changes within seconds, minutes, or hours).

      We agree that sensitivity to chemical depolymerization alone does not necessarily indicate dynamic behavior, particularly in the absence of data on turnover kinetics or temporal changes.

      Our interpretation was based on two observations: first, that the LSPMB, which derives from the highly stable sporozoite subpellicular microtubules (known to be drug-resistant), becomes susceptible to depolymerization during the liver stage; and second, that the LSPMB gradually shrinks over time during parasite development. These features suggested a transition toward a more dynamic state compared to its origin. However, we fully agree that “dynamic” is a context-dependent term and that direct evidence, such as turnover rates or structural changes on short time scales, is required to rigorously define microtubule dynamics.

      We will revise the manuscript to clarify our use of this term and explicitly acknowledge the need for further studies to characterize the timescale and mechanisms underlying LSPMB remodeling.

      (6) I am not sure what part in the story EB1 plays. The data are only shown in the Supplements and don't seem to be of particular relevance. EB1 is a ubiquitous protein associated with microtubule plus ends. The statement (line 192) that it "may play a broader role..." is unsubstantiated and cannot be based merely on the observation that it is expressed in a particular life cycle stage.

      We agree that EB1 is a ubiquitous microtubule plus-end binding protein and that its presence alone does not imply a novel function. Previous studies (e.g., Maurer et al., 2023; Yang et al., 2023; Zeeshan et al., 2023) have focused on its role during Plasmodium sexual stages, while its expression during liver and mosquito stages has not been previously documented.

      Our data extend this knowledge by showing that EB1 is also expressed during liver stage development, particularly during the highly mitotic schizont phase. While we agree that this observation alone does not prove functional involvement, it raises the possibility of a broader role for EB1 in regulating microtubule dynamics beyond sexual stages. To avoid overinterpretation, we have presented these findings in the supplementary material and will revise the manuscript to tone down speculative statements and clearly frame this as a preliminary observation that warrants further investigation.

      (7) Line 196 onwards: The antibody IN105 is better known in the field as polyE. Maybe that should be added in Materials and Methods. Also, the antibody T9028 against tyrosinated tubulin is poorly validated in the literature and rarely used. Usually, researchers in this field use the monoclonal antibody YL1/2. I am not sure why this unusual antibody was chosen in this study. In fact, has its specificity against tyrosinated α-tubulin from Plasmodium berghei ever been shown? The original antigen was human and had the sequence EGEEY. The Plasmodium sequence is YEADY and hence very different. It is stated that the LSPMB is both polyglutamylated and tyrosinated. This is unusual because polyglutamylated microtubules are usually indicative of stable microtubules, whereas tyrosinated microtubules are found on freshly polymerised and dynamic microtubules. However, a co-localisation within the same cell has not been attempted. This is, however, possible since polyE is a rabbit antibody and T9028 is a mouse antibody. I suspect that differences or gradients along the LSPMB would have been noticed. Also, in lines 207/208, it is said that tyrosination disappears after hepatocyte invasion, which is shown in Figure 3. However, in Figure 3A, quite a lot of positive signals for tyrosination are visible in the 54 and 56 hpi panels.

      First, we acknowledge that the IN105 antibody is more widely known as "polyE" in the field. We will update the Materials and Methods section accordingly to reflect this nomenclature.

      Regarding the use of the T9028 antibody against tyrosinated α-tubulin: we agree that this monoclonal antibody is less commonly used than YL1/2, and we appreciate the reviewer drawing attention to this. The original antigen for T9028 is based on the mammalian C-terminal sequence EGEEY, which differs from the Plasmodium α1-tubulin sequence (YEADY). Like many in the field, we face the challenge that most available antibodies are raised against mammalian epitopes, and specificity in Plasmodium can vary. Nonetheless, the literature (e.g., Hirst et al., 2022; Fennell et al., 2008) has demonstrated that tyrosination occurs in Plasmodium α1-tubulin, using anti-tyrosination antibodies including YL1/2.

      Following the reviewer’s excellent suggestion, we are currently repeating the key experiments using the YL1/2 antibody to compare staining patterns directly with those obtained using T9028. We will include these results in the revised manuscript.

      Concerning the potential co-localization of polyglutamylation and tyrosination on the LSPMB: we agree that this is an interesting and testable hypothesis. In the current manuscript, Figures 3A and 3B were generated from independent experiments, and thus co-localization was not assessed. However, as the reviewer correctly notes, polyE and T9028 antibodies are raised in rabbit and mouse, respectively, making co-staining feasible. We will follow up on this experimentally and, if feasible within our revision timeline, include data in the revised version or highlight this as a future direction.

      Finally, with regard to Figure 3 and the observation that tyrosination appears to persist at 54 and 56 hpi (Figure 3B): the reviewer is correct that tyrosination signal is still detectable at these time points. Our statement that tyrosination “disappears after hepatocyte invasion” was intended to refer to an overall decrease in signal intensity during early liver stage development, with a reappearance at later stages (e.g., cytomere formation). We will rephrase this section for greater clarity and ensure that figure annotations and legends unambiguously reflect the dynamics observed.

      (8) In line 229, it is stated that tyrosination "has previously been associated with stable microtubule in motility". This statement is not correct. In fact, none of the cited references that apparently support this statement show that this is the case. On the contrary, stable microtubules, such as flagellar axonemes, are almost completely detyrosinated. Therefore, tyrosination is a marker for dynamic microtubules, whereas detyrosinated microtubules are indicative of stable microtubules. This is an established fact, and it is odd that the authors claim the opposite.

      We fully agree that in canonical eukaryotic systems, tyrosinated microtubules are generally markers of dynamic microtubule populations, whereas detyrosinated microtubules are typically associated with stability, particularly in structures such as flagellar axonemes.

      Our original statement will be corrected. In our study, we observed that tyrosinated microtubules are prevalent in invasive stages (sporozoites and merozoites), while detyrosinated forms become more prominent during intracellular liver stage development. This pattern is consistent with the established link between tyrosination and dynamic microtubules.

      What is particularly intriguing in Plasmodium is the apparent cycling of tyrosination despite the absence of known tubulin tyrosine ligase (TTL) homologs in the genome. This suggests either a highly divergent enzyme or the involvement of host cell factors, a hypothesis supported by the reappearance of tyrosinated microtubules during liver stage schizogony (Figure 3B).

      We will revise the relevant text and the Discussion section to reflect these mechanistic considerations more accurately and to avoid misrepresenting established principles of microtubule biology.

      (9) Line 236 onwards: Concerning the generation of tubulin mutants, I think it is necessary to demonstrate successful replacement of the wild-type allele by the mutant allele. I am sure the authors have done this by amplification and subsequent sequencing of the genomic locus using PCR primers outside the plasmid sequences. I suggest including this information, e.g., by displaying the chromatograph trace in a supplementary figure. Or are the sequences displayed in Figure S3B already derived from sequenced genomic DNA? This is not described in the Legend or in Materials and Methods. The left PCR products obtained for Figure S3 B would be a suitable template for sequencing.

      Indeed, these data are presented in Figure 4B and the corresponding sequence data are shown in Figure S3B. We appreciate the reviewer’s suggestion, which will help improve the transparency and reproducibility of our methodology.

      (10) It is also important to be aware of the fact that glutamylation also occurs on β-tubulin. This signal will also be detected by polyE (IN105). Therefore, it is surprising that IN105 immunofluorescence is negative on the C-term Δ cells (Figure S3 D). Is there anything known about confirmed polyglutamylation sites on both α- and β-tubulins in Plasmodium, e.g., by MS? In Toxoplasma, both α- and β-tubulin have been shown to be polyglutamylated.

      Indeed, polyglutamylation is known to occur not only on α-tubulin but also on β-tubulin in many organisms, including Toxoplasma gondii, and the polyE (IN105) antibody is expected to detect polyglutamylation on both tubulin isoforms.

      The parasites shown in Figure S3D correspond to mutant lines originally generated by Spreng et al. (2019): the IntronΔ mutant (with deletion of introns in the Plasmodium α1-tubulin gene) and the C-termΔ mutant (with deletion of the final three C-terminal residues: ADY). As the reviewer correctly notes, this particular C-terminal deletion does not include the predicted polyglutamylation site (E445 or E447, depending on alignment), and thus should not abolish all polyglutamylation. However, in our experiments, the IN105 signal is substantially reduced in this mutant. This may suggest that structural alterations in the tubulin tail affect accessibility of the polyglutamylation epitope or influence the modification itself, though we cannot exclude other possibilities, including changes in antibody recognition.

      To date, polyglutamylation sites in Plasmodium tubulins have not been definitively confirmed by mass spectrometry. However, a recent MS-based study (reference 43) detected monoglutamylation on β-tubulin in blood stage parasites. Direct MS evidence for polyglutamylation of either α- or β-tubulin in Plasmodium liver stages is still lacking. We will clarify these points in the revised manuscript to avoid potential confusion and to highlight the need for future biochemical validation of PTM sites.

      (11) Figure S3 is very confusing. In the legend, certain intron deletions are mentioned. How does this relate to posttranslational tubulin modifications? The corresponding section in Results (lines 288-292) is also not very helpful in understanding this.

      The parasite lines shown in Figure S3D were originally generated by Spreng et al. (2019) and are not directly part of the main set of PTM-targeted mutants described in our study. Specifically, the IntronΔ line carries deletions in introns of the Plasmodium α1-tubulin gene, while the C-termΔ line lacks the final three C-terminal residues (ADY). These lines were included for comparative purposes to explore whether structural changes in α-tubulin could impact polyglutamylation signal, as detected by the polyE (IN105) antibody.

      We acknowledge that the figure legend and corresponding text (lines 288–292) did not adequately explain the rationale for including these control lines. We will revise both the legend and Results section to more clearly describe the origin, purpose, and relevance of these mutants to the overall study.

      (12) Figure 4E doesn't look like brightfield microscopy but like some sort of fluorescent imaging. In Figure 4C, were the control (NoΔ) cells with an integrated cassette, but no mutations, or non-transgenic cells?

      The reviewer is absolutely correct: Figure 4E shows a fluorescent image acquired using widefield microscopy and not a brightfield image. We will revise the figure legend accordingly to avoid confusion. The “BF” (brightfield) label applies only to the left panel in Figure 4C, which depicts oocysts imaged using transmitted light.

      Regarding the controls labeled "NoΔ" in Figure 4C, we confirm that these parasites contain the integrated selection cassette but do not harbor any mutations in the target gene. They serve as proper integration controls, allowing us to distinguish the effects of the point mutations or deletions introduced in the experimental lines.

      (13) It is difficult to understand why the TyΔ and the CtΔ mutants still show quite a strong signal using the anti-tyrosination antibody. If the mutants have replaced all wild-type alleles, the signal should be completely absent, unless the antibody (see my comment above concerning T9028) cross-reacts with detyrosinated microtubules. Therefore, the quantitation in Figures 5F and 5G is actually indicative of something that shouldn't be like that. The quantitation of 5F is at odds with the microscopy image in 5D. If this image is representative, the anti-Ty staining in TyΔ is as strong as in the control NoΔ.

      We agree that the persistence of anti-tyrosination signal in the TyΔ and CtΔ mutant lines is unexpected, given that all wild-type alleles were replaced. This discrepancy has led us to further investigate the specificity of the T9028 antibody, as raised in the reviewer’s earlier comment. To address this concern, we are currently repeating the key experiments using the well-established YL1/2 monoclonal antibody, which is widely accepted for detecting tyrosinated α-tubulin in other systems.

      We also acknowledge that Figure 5F shows residual tyrosination signal, and the reviewer is correct that this should not occur if the modified residues are the exclusive PTM sites. One possible explanation is that adjacent residues or even alternative tubulin isoforms may serve as substrates. While α1-tubulin is the dominant isoform in Plasmodium, low-level expression of α2-tubulin has been detected in liver stages based on transcriptomic data, and it may contribute to the observed signal.

      Regarding the apparent discrepancy between the quantification in Figure 5F and the representative image in Figure 5D, we will revise the figure legend to clarify that image selection aimed to show detectable signal, not necessarily the average phenotype. We will also reassess and, if needed, repeat the quantification with improved image sets to ensure accuracy and consistency.

      We will revise the manuscript to reflect these points and include a more nuanced interpretation of the residual staining in the mutant lines.

      (14) The statement that the failure of CtΔ mutants to generate viable sporozoites is due to the lack of microtubule PTMs (lines 295-296) is speculative. The lack of the entire C-terminal tail could have a number of consequences, such as impaired microtubule assembly or failure to recruit and bind associated proteins. This is not necessarily linked to PTMs. Also, it has been shown in yeast that, for microtubules to form properly, an exquisite regulation (proteostasis) of the ratio between α- and β-tubulin is essential (Wethekam and Moore, 2023). I am not sure, but according to Materials and Methods (line 423), the gene cassettes for replacing the wild-type tubulin gene with the mutant versions contain a selectable marker gene for pyrimethamine selection. Are there qPCR data that show that expression levels of mutant α-tubulin are more or less the same as the wild-type levels?

      We agree that attributing the developmental failure of the CtΔ mutants solely to the absence of microtubule post-translational modifications (PTMs) is speculative. As the reviewer rightly points out, deletion of the entire C-terminal tail may have multiple effects, including impaired microtubule assembly, altered α/β-tubulin stoichiometry, or disruption of interactions with essential microtubule-associated proteins (MAPs). These consequences may arise independently of PTMs.

      That said, we note that PTMs, particularly polyglutamylation, can modulate MAP binding by altering the surface charge of microtubules (Genova et al., 2023; Mitchell et al., 2010). Therefore, while PTM loss may be a contributing factor, we acknowledge that the phenotype likely results from a combination of mechanisms. We will revise the relevant section of the manuscript to present a more cautious and balanced interpretation.

      Regarding the reviewer’s question on expression levels: although the replacement constructs include a pyrimethamine resistance cassette, we have not yet quantified α-tubulin transcript levels by qPCR. In the interim, the study by Spreng et al. (2019) (reference 50) on related α1-tubulin mutations provides valuable insight. They observed no difference in mRNA levels in day 12 oocysts, yet reported fainter microtubule staining and shorter sporozoites, suggesting a post-transcriptional mechanism affecting protein expression or function in later stages. Furthermore, the phenotypic spectrum across their mutant panel (Suppl. Fig. 3 D and E) implies that robust α-tubulin regulation is highly sensitive to specific sequences.

      We acknowledge this as a current limitation in our study and will address it in the revised manuscript, noting that direct measurement of transcript levels is a key area for future investigation.

      (15) In the Discussion, my impression is that two recent studies, the superb Expansion Microscopy study by Bertiaux et al. (2021) and the cryo-EM study by Ferreira et al. (2023), are not sufficiently recognised (although they are cited elsewhere in the manuscript). The latter study includes a detailed description of the microtubule cytoskeleton in sporozoites. However, the present study clearly expands the knowledge about the structure of the cytoskeleton in liver stage parasites and is one of the few studies addressing the distribution and function of microtubule post-translational modifications in Plasmodium.

      Indeed, our work builds upon the established knowledge from Bertiaux et al. (2021) and the cryo-EM study by Ferreira et al. (2023), as rightly mentioned by the reviewer. We agree that these foundational studies, combined with our findings, will significantly expand the understanding of Plasmodium biology and cytoskeleton dynamics across its life cycle and will open the door for further investigations. We are grateful for this suggestion and will ensure these key studies are appropriately acknowledged in the revised manuscript.

      (16) I somewhat disagree with the statement of a co-occurrence of polyglutamylated and tyrosinated microtubules. I think the resolution is too low to reach that conclusion. As this is a bold claim, and would be contrary to what is known from other organisms, it would require a more rigorous validation. Given the apparent problems with the anti-Ty antibody (signal in the TyΔ mutant), one should be very cautious with this claim.

      This is a very important point to clarify. As mentioned previously, the initial experiments for these modifications were performed independently. It is established that sporozoite subpellicular microtubules exhibit both tyrosination and polyglutamylation. We will revise the manuscript to temper this statement and clearly indicate that the co-occurrence of these PTMs remains a hypothesis that requires more rigorous validation. As suggested, we are now conducting additional co-staining experiments using the better validated YL1/2 antibody to re-express and directly compare the distribution of both PTMs within the same cell. These follow-up experiments will help clarify whether both modifications occur simultaneously on the same microtubule structures in Plasmodium liver stages.

      (17) In the Discussion (lines 311 and 377), it is again claimed that tyrosinated microtubules are "a well-known marker of stable microtubules". This statement is completely incorrect, and I am surprised by this serious mistake. A few lines later, the authors say that polyglutamylation is "commonly associated with dynamic microtubule behaviour". Again, this is completely incorrect and is the opposite of what is firmly established in the literature. Polyglutamylation and detyrosination are markers of stable microtubules.

      Indeed, in canonical eukaryotic systems, tyrosinated microtubules are generally considered markers of dynamic microtubule populations, whereas detyrosinated and polyglutamylated microtubules are more commonly associated with stability.

      We acknowledge this mistake and will revise the Discussion to correct these statements accordingly. In the context of Plasmodium, our observations suggest an unusual regulation of microtubule dynamics, which may reflect parasite-specific adaptations. For example, we observed tyrosinated α-tubulin in the subpellicular microtubules of sporozoites, structures typically known for their exceptional stability. This atypical association implies either non-canonical roles for tyrosination or parasite-specific mechanisms for modulating microtubule properties. Additionally, the presence of both PTMs at different stages of development and on different microtubule populations suggests tightly regulated spatial and temporal modulation of microtubule function.

      We will carefully revise the relevant sections of the manuscript to remove incorrect generalizations and ensure accurate representation of the current consensus in the field, while emphasizing the possibility of Plasmodium-specific adaptations that merit further study.

      (18) In line 339, the authors interpret the residual antibody staining after the introduction of the mutant tubulin as a compensatory mechanism. There is no evidence for this. More likely explanations are firstly the quality of the anti-Ty-antibody used (see comment above), and the fact that also β-tubulin carries C-terminal polyglutamylation sites, which haven't been investigated in this study. PTMs on β-tubulin are not compensatory, but normal PTMs, at least in all other organisms where microtubule PTMs have been investigated.

      As mentioned above, we are currently repeating the key experiments with the YL1/2 antibody, as suggested. Furthermore, we fully agree with the reviewer's point regarding polyglutamylation on β-tubulin. The C-terminal tail of β-tubulin does indeed contain polyglutamylation sites. As we noted in the manuscript (Lines 340-352), this aspect has not been investigated in the present study, and we acknowledge it as a valuable direction for future research. We will revise the text accordingly to avoid overinterpretation and to more accurately reflect the limitations of our current data.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The authors define the principles that, based on first principles, should guide the optimisation of transcription factors with intrinsically disordered regions (IDRs). The first part of the study defines the following principles to optimise the binding affinities to the genome in the receiving region that is called the “antenna”: (i) reduce the target to IDR-binding distance on the genome; (ii) optimise the distance between the DNA binding domain and the binding sites on the IDR to be as close as possible to the distance between their binding sites on the genome; (iii) keep the same number of binding sites and their targets and modulate this number with binding strength, reducing them with increased strength; (iv) modulate the binding strength to be above a threshold that depends on the proportion of IDR binding sites in the antenna. The second part defines the scaling of the search time as a function of key parameters such as the volume of the nucleus and the size of the antenna, derived as a combination of 3D search of the antenna and 1D “octopusing” on the antenna. The third part focuses on validation, where the current results are compared to binding probability data from a single experiment, and new experiments are proposed to further validate the model as well as to test designed transcription factors.

      Strengths:

      The strength of this work is that it provides simple, interpretable and testable theoretical conclusions. This will allow the derived design principles to be understood, evaluated and improved in the future. The theoretical derivations are rigorous. The authors provide a comparison to experiments and also propose new experiments to be performed in the future; this is of great value to the paper, since it will set the stage and inspire new experimental techniques. Further, the field needs inspiration and motivation to develop these techniques, since they are required to benchmark the transcription factors designed with the methods presented in this paper, as well as to develop novel data-based or in vivo methods that would greatly benefit the field. As such, this paper is a fundamental contribution to the field.

      Weaknesses:

      The model assumption that the interaction between the transcription factor and the DNA outside of the antenna region is negligible is probably too strong for many/most transcription factors, particularly in organisms with a longer genome than yeasts. The model presents many first principles to drive the design of transcription factors, but arguably, other principles and mechanisms might also play a role by being beneficial to the search and binding process. Specifically: (i) a role of the IDR in complex formation and cooperativity between multiple transcription factors, (ii) ability of the IDR to do parallel searching based on multiple DNA binding sites spaced by disordered regions, (iii) affinity of the IDR to specific compartmentalisations in the nucleus reducing the search time, etc. The paper would be improved by a discussion of alternative mechanisms.

      We thank the reviewer for highlighting that our work delivers simple, interpretable and rigorously derived conclusions, backed by experimental comparison and concrete proposals for future studies.

      Regarding interactions outside the antenna region, Supplementary S10 shows that the non-specific IDR–DNA interactions (on the order of 1 kBT) only slightly alter the 3D diffusion coefficient and thus do not affect our conclusions regarding the optimal search process.

      We have also added sentences in the discussion section regarding the alternative mechanism.

      Reviewer #2 (Public review):

      Summary:

      This is an interesting theoretical exploration of how a flexible protein domain, which has multiple DNA-binding sites along it, affects the stability of the protein-DNA complex. It proposes a mechanism (“octopusing”) for a protein performing a random walk while bound to DNA, which simultaneously enables exploration of the DNA strand and stability of the bound state.

      Strengths:

      Stability of the protein-DNA bound state and the ability of the protein to perform 1d diffusion along the DNA are two properties of a transcription factor that are usually seen as being in opposition of each other. The octopusing mechanism is an elegant resolution of the puzzle of how both could be accommodated. This mechanism has interesting biological implications for the functional role of intrinsically disordered domains in transcription factor (TF) proteins. They show theoretically how these domains, if flexible and able to make multiple weak contacts with the DNA, can enhance the ability of the TF to efficiently find their binding site on the DNA from which they exert control over the transcription of their target gene. The paper concludes with a comparison of model predictions with experimental data which gives further support to the proposed model. Overall, this is an interesting and well executed theoretical paper that proposes an interesting idea about the functional role for IDR domains in TFs.

      Weaknesses:

      IDR domains are assumed flexible which I believe is not always the case. Also, I’m not sure how ubiquitous are the assumed binding sites on the DNA for multiple subdomains along the IDR. These assumptions though seem like interesting points of departure for further experiments.

      We thank the reviewer for their careful and insightful evaluation of our work. In particular, we appreciate your emphasis on the inherent trade-off between binding stability and one-dimensional diffusion, and your recognition of how the octopusing mechanism elegantly reconciles these conflicting requirements.

      To address the flexibility of TFs with IDRs, we incorporated the spring’s rest length—effectively introducing tunable rigidity—in Supplementary Section S1, and we show that our design principles for binding probability remain robust. Indeed, this is a highly interesting point; a comprehensive study will require more detailed modeling alongside experimental validation.

      We acknowledge that the current evidence for IDR-directed DNA binding is primarily derived from a limited number of well-studied cases, particularly Msn2 in yeast, and the ubiquity of this mechanism across diverse transcription factors remains to be established.

      Reviewer #1 (Recommendations for the authors):

      The paper jumps too fast to the results; a longer introduction might improve the paper. Further, at line 50, I don’t think that the figure is properly referenced. Formula 2 is confusing since the target volume V1 is not explained in the context of the formula; please expand the explanations.

      We appreciate the reviewer’s valuable recommendations. We have expanded the Introduction, clarified V<sub>1</sub>, and updated line 50.

      Reviewer #2 (Recommendations for the authors):

      I have some mostly minor suggestions to the authors for improving the manuscript:

      In the abstract and introduction on at least two occasions the authors talk about IDRs as though they’re necessarily flexible. My understanding is that, while this is a very reasonable assumption, I don’t think this is something we know with any certainty for most IDRs. If the authors agree with my assessment I think they should reflect this uncertainty in the writing.

      Thank you for the recommendations. We revised the wording to reflect the uncertainty, changing it to: “... commonly assumed to behave as a long, flexible...” and “...can be assumed as flexible....”.

      It took me a bit of time to figure out what’s going on in Figure 1b. To help the reader I would suggest labeling the DBD targets (yellow square) and the IDR targets (gray squares) as such. The figure also left me guessing whether the DBD domain can bind to the IDR targets non-specifically? (I presume not.) This also brought a slightly bigger question into focus for me: wouldn’t the presence of the IDR binding “sites” (since these “sites” are on the protein, I think the term “domains” would be better than “sites”) increase the time the protein is bound non-specifically somewhere far from the target, thereby increasing the search time? Or is the ability of the protein to bind specifically to DNA away from the DBD target ignored?

      We have labeled the DBD targets and IDR targets in the figure. ‘Domains’ usually refers to structured parts; we keep using ‘sites’ and clarify that they correspond to short linear motifs.

      The reviewer is correct. Our model omits any non-specific binding between the DBD and IDR-binding targets, as well as between the TF and other DNA regions. If such interactions were to substantially lengthen the search time, they would effectively revert our mechanism to the classical bacterial facilitated-diffusion model, which is generally considered inappropriate for IDR-mediated TF search in eukaryotic cells. However, Supplementary Figure S10 demonstrates that non-specific IDR–DNA interactions induce only marginal changes in the effective three-dimensional diffusion coefficient within complex chromatin environments, and therefore do not alter our conclusions regarding the optimal search process.

      In Equation 2 and the text that follows I was left wondering what is the target volume V1. Also, I think it would be helpful to the reader to give them a sense of scale for the dimension full quantities appearing in Equation 2. This is done later when comparing the theory to experimental data, but I think it would be helpful to give a sense of size earlier in the manuscript.

      V<sub>1</sub> denotes the volume of the IDR–binding target region, which is on the order of bp<sup>3</sup>. f(d,l<sub>0</sub>) has units of inverse volume. We have included the units and specified the order of magnitude of V<sub>1</sub> after Equation 2.

      The binding energy EB is discussed a number of times but it wasn’t clear to me that this quantity referred to the energy per IDR site on the DNA or the total energy when the IDR is bound to DNA. In Figure 1 it would seem that the model allows only one IDR domain bound at a given time but I think the model allows for multiple IDR domains to be bound to the IDR target sites simultaneously. Right? Maybe make this clear in the Figure and the text.

      E<sub>B</sub> denotes the binding energy per binding site, where each site corresponds to a short linear motif. Yes, we allow for multiple IDR domains to be bound to the IDR target sites simultaneously. We have clarified the definition of E<sub>B</sub> and adjusted the figure slightly to avoid any misunderstanding.

      After Eq 4 the discussion suggests that for ϕ << 1 the threshold energy is much greater than kBT, but that’s hard to imagine given the logarithmic dependence of the latter on the former. Also, in Figure 2d it seems that the threshold energy is about 8 kBT. Clearly this is not a big deal; I just thought the authors might want to revise the language.

      Thank you. We now clarify the sentence using the representative values of ϕ and E<sub>th</sub> after Equation 4.
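
      As a rough illustration of the reviewer's point about logarithmic dependence, the short Python sketch below evaluates a purely hypothetical form, E_th / kBT = ln(1/ϕ). This form is an assumption chosen for illustration only and is not Equation 4 of the manuscript; the function name `threshold_energy_kbt` is likewise hypothetical. It shows only how slowly a logarithm grows as ϕ becomes small.

```python
import math

def threshold_energy_kbt(phi):
    """Illustrative threshold energy in units of kBT.

    ASSUMPTION: E_th / kBT = ln(1 / phi). This is a stand-in with the
    logarithmic dependence the reviewer mentions, not the manuscript's
    Equation 4.
    """
    if not 0.0 < phi <= 1.0:
        raise ValueError("phi must lie in (0, 1]")
    return math.log(1.0 / phi)

# Scan phi over two orders of magnitude and report the assumed threshold.
for phi in (1e-2, 1e-3, 1e-4):
    print(f"phi = {phi:.0e}  ->  E_th ~ {threshold_energy_kbt(phi):.2f} kBT")
```

      Under this assumed form, lowering ϕ by a factor of 100 raises the threshold by only ln(100), about 4.6 kBT, so the threshold stays within roughly an order of magnitude of kBT even for small ϕ.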

      Right after Figure 2 there is a discussion of the different parameters that the authors vary. I suggest having a figure that illustrates these parameters (possibly in Figure 1b) to make it easier to follow the discussion.

      We have added explanations of the relevant parameters in Figure 1 for clarity.

      When discussing the dynamics of search, the result stated is that the search time is minimum for a specific value of R. I think it would be useful to translate this into a TF concentration. Also, if R represents the radius of the cell’s nucleus, 1/6 μm is almost an order of magnitude smaller than the size of a typical nucleus. Is this a worry? Either way, some clarification of this number would be helpful.

      Thank you for the suggestion. As noted later in this section, we have translated R into an equivalent TF concentration, and we clarify that we assume the scaling of the minimum search time remains unchanged when extrapolated to the size of a typical nucleus.
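
      The translation of R into a concentration can be sketched with a few lines of arithmetic. The Python example below assumes, purely for illustration, that a single TF molecule occupies a spherical volume of radius R; the function name `concentration_nM` and the 1 μm comparison radius are hypothetical and are not taken from the manuscript.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole

def concentration_nM(radius_um, n_molecules=1):
    """Molar concentration (nM) of n_molecules inside a sphere of given radius.

    ASSUMPTION: the search volume is a sphere of radius R that contains
    n_molecules TF molecules (one by default).
    """
    radius_dm = radius_um * 1e-5  # 1 um = 1e-5 dm, so the volume is in litres
    volume_l = 4.0 / 3.0 * math.pi * radius_dm ** 3
    molar = n_molecules / (AVOGADRO * volume_l)
    return molar * 1e9  # convert M to nM

c_small = concentration_nM(1.0 / 6.0)  # R = 1/6 um, as quoted by the reviewer
c_typical = concentration_nM(1.0)      # hypothetical 1 um radius for comparison

print(f"R = 1/6 um -> {c_small:.1f} nM")
print(f"R = 1 um   -> {c_typical:.3f} nM")
```

      With one molecule per sphere, R = 1/6 μm corresponds to a concentration of roughly 86 nM, and the concentration falls as 1/R³, so a sixfold larger radius lowers it by a factor of 216.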

      There is a comment regarding the role of the DNA persistence length and how it was not accounted for. It would be helpful if the authors could add a sentence or two explaining how a folded DNA conformation, as is the case in the nucleus, would affect their calculation. (So that the reader gets an idea without having to get into the details described in the Supplement).

      Thank you. We have revised the sentence to: “We have verified that reducing the DNA persistence length, which promotes increased DNA coiling, results in only a modest increase in mean search time. Even under extreme coiling conditions, the increase remains below 30% of the baseline value, as detailed in Supplementary S9.”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this work, the authors apply TDCS to awake and anesthetized macaques to determine the effect of this modality on dynamic connectivity measured by fMRI. The question is to understand the extent to which TDCS can influence conscious or unconscious states. Their target was the PFC. During the conscious states, the animals were executing a fixation task. Unconsciousness was achieved by administering a constant infusion of propofol and a continuous infusion of the muscle relaxant cisatracurium. They observed the animals while awake receiving anodal or cathodal hd-TDCS applied to the PFC. During the cathodal stimulation, they found disruption of functional connectivity patterns, enhanced structure-function correlations, a decrease in Shannon entropy, and a transition towards patterns that were more commonly anatomically based. In contrast under propofol anesthesia anodal hd-TDCS stimulation appreciably altered the brain connectivity patterns and decreased the correlation between structure and function. The PFC stimulations altered patterns associated with consciousness as well as those associated with unconsciousness.

      Strengths: 

      The authors carefully executed a set of very challenging experiments that involved applying tDCS in awake and anesthetized non-human primates while conducting functional imaging.

      We thank the Reviewer for summarising our study and for his appreciation of the highly challenging experiments we performed.

      Weaknesses:

      The authors show that tDCS can alter functional connectivity measured by fMRI but they do not make clear what their studies teach the reader about the effects of tDCS on the brain during different states of consciousness. No important finding is stated contrary to what is stated in the abstract. It is also not clear what the work teaches us about how tDCS works nor is it clear what are the "clinical implications for disorders of consciousness." The deep anesthesia is akin to being in a state of coma. This was not discussed.  

      While the authors have executed a set of technically challenging experiments, it is not clear what they teach us about how tDCS works, normal brain neurophysiology, or brain pathological states such as disorders of consciousness.

      We thank the reviewer for his comments. We agree that we could better highlight the value and implications of our work, and we take this opportunity to improve our manuscript according to the suggestions.

      Actions in the text: We have added several new paragraphs in the Discussion section, considering these comments and other related remarks from the Reviewing Editor (see below our answer to the first comment of the Reviewing Editor: REC#1).

      Reviewer #2 (Public review): 

      General comments: 

      The authors investigated the effects of tDCS on brain dynamics in awake and anesthetized monkeys using functional MRI. They claim that cathodal tDCS disrupts the functional connectivity pattern in awake monkeys while anodal tDCS alters brain patterns in anesthetized monkeys. This study offers valuable insight into how brain states can influence the outcomes of noninvasive brain stimulation. However, there are several aspects of the methods and results sections that should be improved to clarify the findings.

      We thank the Reviewer for the summary and appreciation of our study.  

      Major comments 

      For the anesthetized monkeys, the anode location differs between subjects, with the electrode positioned to stimulate the left DLPFC in monkey R and the right DLPFC in monkey N. The authors mention that this discrepancy does not result in significant differences in the electric field due to the monkeys' small head size. However, this is incorrect, as placing the anode on the left hemisphere would result in a much lower EF in the right DLPFC than placing the anode on the right side. Running an electric field simulation would confirm this. Additionally, the small electrode size suggested by the Easy cap configuration for NHP appears sufficient to stimulate the targeted regions focally. If this interpretation is correct, the authors should provide additional evidence to support their claim, such as a computational simulation of the EF distribution.

      We thank the Reviewer for the comments. First, regarding the reviewer’s statement that placing the anode on the left hemisphere would result in a much lower EF in the right DLPFC than placing the anode on the right side, we would like to clarify that we did not use a typical 4 x 1 concentric ring high-definition setup (which consists of a small centre electrode surrounded by four return electrodes), but a two-electrode montage, with one electrode over the left or right PFC and the other one over the contralateral occipital cortex. According to EF modelling papers, a 4 x 1 high-definition setup would produce an EF that is focused and limited to the cortical area circumscribed by the ring of the return electrodes (Datta et al. 2009; Alam et al. 2016). Therefore, targeting the left or right DLPFC with a 4 x 1 setup would produce an EF confined to the targeted hemisphere of the PFC. In contrast, we expect the brain current flow generated with our 2-electrode setup to be broader, despite the small size of the electrodes,  because there is no constraint from return electrodes. Thus, with our setup, the current is expected to flow between the PFC and the occipital cortex (see also our responses to comments R3.3., R.E.C.#2.1. and R.E.C.#2.2.). 

      Second, we would like to point out that in awake experiments, in which we stimulated the right PFC of both monkeys, there was no gross evidence of left or right asymmetry in the computed functional connectivity patterns (Figure 3A, Figure 3 - figure supplement 2A; Figure 5A). These results, showing that our stimulation montages did not induce asymmetric dynamic FC changes in NHPs, support the idea that our setups did not generate EFs that were spatially focused enough to alter brain activity in one hemisphere substantially more than the other.

      Third, it is also worth noting that current evidence suggests that human brains are significantly more lateralized than those of macaques. Macaque monkeys have been found to have some degree of lateralized networks, but these are of lower complexity, and the lateralization is less pronounced and less functionally organized than in humans (Whey et al., 2014; Mantini et al., 2013). This suggests that, even if the stimulation were focal enough to stimulate the left or the right part of the PFC only, the behavioural effects would likely be similar.

      We strongly agree with the reviewer that conducting an EF simulation would be valuable to confirm our expectations and to gain a comprehensive view of the characteristics of the EFs generated with our different setups in NHPs. However, the challenge is that EF computational models have been developed for humans, and their use in NHPs is not straightforward due to significant anatomical differences. For example, macaque monkeys are distinct from humans in terms of brain size, shape and cortical organisation, skull thickness, and the presence of muscles, as well as different tissue conductivities (Lee et al. 2015; Datta et al. 2016; Mantell et al. 2023). We plan to address this in future work.

      Actions in the text: In the Materials and Methods section, we have modified the sentence: “Because of the small size of the monkey's head and because we did not use return electrodes to restrict the current flow (as is achieved with typical high-definition montages (Datta et al. 2009; Alam et al. 2016)), we expected that tDCS stimulation with the two symmetrical montages would result in nearly equivalent electric fields across the monkey’s head and produce roughly similar effects on brain activity.” 

      We also added a new sentence about EF simulation: 

      “This would need to be confirmed by running an electric field simulation. However, computational electric field models have been developed for humans, and their use in NHPs is not straightforward due to anatomical specificities. Indeed, monkeys differ from humans in terms of brain size, shape and cortical organization, skull thickness, tissue conductivities and the presence of muscles (Lee et al. 2015; Datta et al. 2016; Mantell et al. 2023). Modelling of EFs generated with the specific tDCS montages employed in this study will be performed in future work.”

      For the anesthetized monkeys, the authors applied 1 mA tDCS first, followed by 2 mA tDCS. A 20-minute stimulation duration of 1 mA tDCS is strong enough to produce after-effects that could influence the brain state during the 2 mA tDCS. This raises some concerns. Previous studies have shown that 1 mA tDCS can generate EF of over 1 V/m in the brain, and the effects of stimulation are sensitive to brain state (e.g., eye closed vs. eye open). How do the authors ensure that there are no after-effects from the 1 mA tDCS? This issue makes it challenging to directly compare the effects of 1 mA and 2 mA stimulation.

      We agree with the reviewer's comment that 1 mA tDCS may induce aftereffects, as has been observed in several human studies (e.g., Jamil et al. 2017, 2020). Although the differences between the 1 mA post-stimulation and baseline conditions were not significant in our analyses, it is still possible that the stimulation produced some effects below the threshold of significance that may contribute, albeit weakly, to the changes observed during and after 2 mA stimulation. We have, therefore, amended the paper in line with the reviewer's comments.

      Actions in the text: We have added the following text in the Result section: 

      “While several human studies have reported that 1 mA transcranial stimulation induces aftereffects (e.g., Jamil et al. 2017, 2020; Monte-Silva et al. 2010), the differences between the 1 mA post-stimulation and baseline conditions were not significant in our analyses. However, it is still possible that the 1 mA stimulation produced some effects below the threshold of significance that may contribute to the changes observed during and after the 2 mA stimulation.”

      The occurrence rate of a specific structural-functional coupling pattern among random brain regions shows significant effects of tDCS. However, these results seem counterintuitive. It is generally understood that noninvasive brain stimulation tends to modulate functional connectivity rather than structural or structural-functional connectivity. How does the occurrence rate of structural-functional coupling patterns provide a more suitable measure of the effectiveness of tDCS than functional connectivity alone? I would recommend that the authors present the results based on functional connectivity itself. If there is no change in functional connectivity, the relevance of changes in structural-functional coupling might not translate into a meaningful alteration in brain function, making it unclear how significant this finding is without corresponding functional evidence.

      First of all, we would like to make it clear that the occurrence rate of patterns as a function of their SFC is not intended to be used or seen as a ‘better’ measure of the efficacy of tDCS. Instead, it is one aspect of the effects of tDCS on whole-brain functional cortical dynamics, obtained from refined measures (phase-coherences), that specifically addresses the coupling between structure and function. This type of analysis is further motivated by its increasing use in the literature due to its suspected relationship to wakefulness (e.g., Barttfeld et al. 2015; Demertzi et al. 2019; Castro et al. 2023). Also, in our analysis, the structure is kept constant: the connectivity matrix used to correlate the functional brain states is always the same (CoCoMac82). Thus, the influence of tDCS on the structure-function side can only be explained by modulating the functional aspects, as suggested by intuition and previous results.
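      As an illustration of the point about a fixed structural matrix, the following minimal sketch computes a structure-function correlation for two toy functional patterns. It assumes that the coupling is a Pearson correlation over the upper triangles of the two matrices; the function names and the 4-region matrices are hypothetical and are not the CoCoMac82 data or the analysis code used in the study.

```python
import math

def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix (list of lists)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def structure_function_correlation(functional_pattern, structural_matrix):
    """Correlate a functional pattern with a fixed structural matrix (assumed definition)."""
    return pearson(upper_triangle(functional_pattern),
                   upper_triangle(structural_matrix))

# Hypothetical structural connectivity between 4 regions (kept constant).
structural = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Hypothetical functional pattern that mirrors the anatomy.
pattern_close_to_anatomy = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.7, 0.1],
    [0.1, 0.7, 1.0, 0.9],
    [0.0, 0.1, 0.9, 1.0],
]

# Hypothetical functional pattern that departs from the anatomy.
pattern_far_from_anatomy = [
    [1.0, -0.2, 0.6, 0.5],
    [-0.2, 1.0, 0.1, 0.7],
    [0.6, 0.1, 1.0, -0.3],
    [0.5, 0.7, -0.3, 1.0],
]

sfc_close = structure_function_correlation(pattern_close_to_anatomy, structural)
sfc_far = structure_function_correlation(pattern_far_from_anatomy, structural)
print(f"SFC (close to anatomy): {sfc_close:.2f}")
print(f"SFC (far from anatomy): {sfc_far:.2f}")
```

      Because the structural matrix is the same in both calls, any difference between the two values can only come from the functional pattern.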

      Second, we agree with the reviewer that studying the functional changes induced by tDCS alone could be valuable. However, the metrics commonly used in FC analysis are static. FC states are either computed by averaging spatial correlations over time and then analyzed, for instance, through graph-theoretical properties (or by directly computing element-wise differences), or obtained by computing spatial correlations over a sliding time window, after which the properties of the visited FC states are analyzed in the same way. Because these metrics are static, they would show no difference if the states visited are essentially the same (which is expected from non-invasive neuromodulation that has not yet demonstrated a strong or characteristic impact) while the dynamical process of visiting those states changes. Moreover, in resting-state fMRI, differences in FC are hard to interpret because between-session, within-condition differences are usually found, with some degree of variance for each condition. Interpreting between-condition differences is therefore difficult when the modulation of the system's activity is subtle. On the other hand, more subtle differences can be captured by more detailed analyses: using phase-based methods, as we did; incorporating a statistical learning component that accounts for the dynamics of the system (for instance, supervised learning followed by temporal and transition-based analyses, as we did); and adding dimensions along which the results can be interpreted. In our case, we were interested in characterizing resting-state differences between stimulation conditions, which have nuanced and subtle interactions with the biological system. 

      As such, classical measures of differences between FC states are unlikely to be refined and precise enough. In fact, we provide additional files investigating these classically used measures, such as differences in average FC matrices or changes in functional graph properties (modularity, efficiency and density) of the visited FC states. These figures show that, for the first measure, comparing region-to-region FCs yields very few statistically significant results. For the second, we show that virtually no differences are observed in the properties of the functional states visited. 

      These results suggest, as expected, that the brain states visited across the different stimulation conditions are topologically quite similar, and that only a few region-specific pairwise functional connectivities are modulated by specific tDCS montages. By contrast, the dynamical process governing how brain activity passes from one state to another is influenced in a more apparent and meaningful way, as shown by the dynamical analysis presented in the main figures: the effect depends on the montage, is somewhat consistent across the post-stimulation conditions, and can be interpreted in light of the theoretical effects of near-anodal versus near-cathodal neuromodulation.
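      The distinction between static and dynamic measures can be made concrete with a toy example. The sketch below is only an illustration under simplifying assumptions: brain states are reduced to two hypothetical labels, "A" and "B", and the sequences are invented. Both sequences visit the same states with the same occupancy, so an occupancy-based summary cannot tell them apart, whereas the transition probabilities differ clearly.

```python
import math
from collections import Counter

def occupancy(sequence):
    """Fraction of time points spent in each state."""
    counts = Counter(sequence)
    total = len(sequence)
    return {state: counts[state] / total for state in sorted(counts)}

def shannon_entropy(probabilities):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def transition_probabilities(sequence):
    """Estimate P(next state | current state) from consecutive pairs."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    origin_counts = Counter(sequence[:-1])
    return {(a, b): c / origin_counts[a] for (a, b), c in pair_counts.items()}

# Two hypothetical state sequences with identical occupancy.
slow_switching = list("AAAABBBBAAAABBBB")
fast_switching = list("ABABABABABABABAB")

occ_slow = occupancy(slow_switching)
occ_fast = occupancy(fast_switching)
trans_slow = transition_probabilities(slow_switching)
trans_fast = transition_probabilities(fast_switching)

print("Occupancy identical:", occ_slow == occ_fast)
print("Occupancy entropy (bits):", shannon_entropy(occ_slow.values()))
print("P(A -> A), slow switching:", trans_slow.get(("A", "A"), 0.0))
print("P(A -> A), fast switching:", trans_fast.get(("A", "A"), 0.0))
```

      In this toy case, only the transition-based summary separates the two sequences, which mirrors the argument that dynamic metrics can capture effects that static comparisons of the visited states miss.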

      Actions in the text: We have added new supplementary files showing the effects of the stimulations on FC matrices and on classical functional graph properties in awake and anesthesia datasets (Supplementary Files 3 & 4).

      We have added new sentences about these new analyses on the effects of the stimulations on FC matrices and on classical functional graph properties in the Results section:

      “In addition, we performed the main analyses separately for the two monkeys, explored the inter-condition variability (Supplementary File 2), and computed classical measures of functional connectivity such as average FC matrices and functional graph properties (modularity, efficiency and density) of the visited FC states (Supplementary File 3).... In contrast, classical FC metrics did not show significant differences across stimulation conditions, highlighting the value of dynamic FC metrics to capture the neuromodulatory effects of tDCS.”

      “Analyses of the two monkeys separately showed that the changes in slope and Shannon entropy were bigger in one of the two monkeys but went in the same direction (Supplementary File 2), while classical FC metrics did not capture any statistical differences between the different stimulation conditions (Supplementary File 3).”

      The authors recorded data from only two monkeys, which may limit the investigation of the group effects of tDCS. As the number of scans for the second monkey in each consciousness condition is lower than that in the first monkey, there is a concern that the main effects might primarily reflect the data from a single monkey. I suggest that the authors should analyze the data for each monkey individually to determine if similar trends are observed in both subjects.

      We agree that the small number of subjects is a limitation of our study. However, we have already addressed these aspects by reporting statistical analyses that consider them, using linear models of such variables, and running them through ANOVA tests. In addition, we experimentally ensured that we recorded a relatively high number of sessions over a period of several years. Regardless, we agree that our study would benefit from further investigation into this matter. We have therefore prepared complementary figures showing the main analysis performed separately for the two monkeys, as proposed, as well as further investigations showing that the inter-condition variability outmatches the inter-individual variability, which is itself outmatched by intra-individual changes. 

      Actions in the text: We have added a supplementary file showing the main analyses performed separately for the two monkeys (Supplementary File 2) and further investigations into the inter-condition variability (Supplementary Files 3 & 4).

      We have added new sentences about these analyses performed separately for the two monkeys in the Results section:

      “In addition, we performed the main analyses separately for the two monkeys, explored the inter-condition variability (Supplementary File 2), and computed classical measures of functional connectivity such as average FC matrices and functional graph properties (modularity, efficiency and density) of the visited FC states (Supplementary File 3). The separate analyses showed that the changes in slope and Shannon entropy were substantially more pronounced in one of the two monkeys, corroborating some of the effects captured in the ANOVA tests.”

      “Analyses of the two monkeys separately showed that the changes in slope and Shannon entropy were bigger in one of the two monkeys but went in the same direction (Supplementary File 2)”.

      Anodal tDCS was only applied to anesthetized monkeys, which limits the conclusion that the authors are aiming for. It raises questions about the conclusion regarding brain state dependency. To address this, it would be better to include the cathodal tDCS session for anesthetized monkeys. If cathodal tDCS changes the connectivity during anesthesia, it becomes difficult to argue that the effects of cathodal tDCS vary depending on the state of consciousness as discussed in this paper. On the other hand, if cathodal tDCS would not produce any changes, the conclusion would then focus on the relationship between the polarity of tDCS and consciousness. In that case, the authors could maintain their conclusion but might need to refine it to reflect this specific relationship more accurately. 

      We agree with the reviewer that it would have been interesting to investigate the effects of cathodal tDCS in anesthetized monkeys. However, due to the challenging nature of the experimental procedures under anesthesia, we had to limit the investigations to only one stimulation modality. We chose to deliver anodal stimulation because, from a translational point of view, we aimed to provide new information on the effects of tDCS under anesthesia as a model for disorders of consciousness. It also made much more sense to increase the cortical excitability of the prefrontal cortex in an attempt to wake up the sedated monkeys rather than doing the opposite.

      Actions in the text: We have added a new sentence in the Results section:

      “Due to the challenging nature of the experimental procedures under anesthesia, we limited the investigations to only one stimulation modality. We chose to deliver anodal stimulation to provide new information on the effects of tDCS under anesthesia as a model for disorders of consciousness and to increase the cortical excitability of the PFC in an attempt to wake up the sedated monkeys.”

      Reviewer #3 (Public review): 

      Summary: 

      This study used transcranial direct current stimulation administered using small 'high-definition' electrodes to modulate neural activity within the non-human primate prefrontal cortex during both wakefulness and anaesthesia. Functional magnetic resonance imaging (fMRI) was used to assess the neuromodulatory effects of stimulation. The authors report on the modification of brain dynamics during and following anodal and cathodal stimulation during wakefulness and following anodal stimulation at two intensities (1 mA, 2 mA) during anaesthesia. This study provides some possible support that prefrontal direct current stimulation can alter neural activity patterns across wakefulness and sedation in monkeys. However, the reported findings need to be considered carefully against several important methodological limitations. 

      Strengths: 

      A key strength of this work is the use of fMRI-based methods to track changes in brain activity with good spatial precision. Another strength is the exploration of stimulation effects across wakefulness and sedation, which has the potential to provide novel information on the impact of electrical stimulation across states of consciousness.

      We thank the Reviewer for the summary and for highlighting the strengths of our study. 

      Weaknesses: 

      The lack of a sham stimulation condition is a significant limitation, for instance, how can the authors be sure that results were not affected by drowsiness or fatigue as a result of the experimental procedure?

      We agree with the reviewer that adding control conditions could have strengthened our study. Control conditions usually consist of a sham condition or active control conditions. However, as mentioned in response to one of Reviewer 2's comments (R.2.5), we had to make choices because the demanding nature of these experiments, especially under anesthesia, limited the number we could perform. 

      In the awake state, we acquired data with two experimental conditions; the monkeys were exposed to either anodal (F4/O1) or cathodal (O1/F4) PFC tDCS. As anodal tDCS of the PFC induced only minor changes in brain dynamics, it could be considered as an active control condition for the cathodal condition, which had striking effects on the cortical dynamics. It is also worth noting that doubts have been raised about the neurobiological inertia of certain sham protocols. Indeed, different sham protocols have been employed in the literature, some of which may produce unintended effects (Fonteneau et al. 2019). Therefore, active control conditions, such as reversing the polarity of the stimulation or targeting a different brain region, have been proposed to provide better control (Fonteneau et al. 2019). Furthermore, in the context of experiments performed under anesthesia, the relevance of a sham control condition typically used to achieve adequate blinding is questionable. 

      With regard to drowsiness and fatigue as a result of the experimental procedure, we agree with the reviewer that this is a common problem in functional imaging due to the length of the recording sessions. We assumed, as was done in previous work (Uhrig, Dehaene, and Jarraya 2014; Wang et al. 2015), that the monkeys' performance on the fixation task during acquisition would capture these periods of fatigue. Therefore, only sessions with fixation rates above 85% were included in our analysis. 

      Actions in the text: We have now specified, in the Materials and Methods section, the fact that only runs with a high fixation rate (> 85%) were included in the study: 

      “To ensure that the results were not biased by fatigue or drowsiness due to the lengthy

      In the anaesthesia condition, the authors investigated the effects of two intensities of stimulation (1 mA and 2 mA). However, a potential confound here relates to the possibility that the initial 1 mA stimulation block might have caused plasticity-related changes in neural activity that could have interfered with the following 2 mA block due to the lack of a sufficient wash-out period. Hence, I am not sure any findings from the 2 mA block can really be interpreted as completely separate from the initial 1 mA stimulation period, given that they were administered consecutively. Several previous studies have shown that same-day repeated tDCS stimulation blocks can influence the effects of neuromodulation (e.g., Bastani and Jaberzadeh, 2014, Clin Neurophysiol; Monte-Silva et al., J. Neurophysiology). 

      We agree with the reviewer’s comment that the initial 1 mA stimulation block might have induced changes in neural activity and that the 20-minute post 1 mA block would not be long enough to wash out these changes. This comment is very similar to the second comment made by Reviewer 2 (R.2.2). Although our experimental data do not support this possibility (as the differences between the 1 mA post-stimulation and baseline conditions were not significant), it is still conceivable that the stimulation produced some effects below the threshold of significance and that these might weakly contribute to the changes observed during and after the 2 mA stimulation. 

      Actions in the text: We have modified the paper according to the reviewers' comments (please see our answer and actions in the text to R.2.2.).

      The different electrode placement for the two anaesthetised monkeys (i.e., Monkey R: F3/O2 montage, Monkey N: F4/O1 montage) is problematic, as it is likely to have resulted in stimulation over different brain regions. The authors state that "Because of the small size of the monkey's head, we expected that tDCS stimulation with these two symmetrical montages would result in nearly equivalent electric fields across the monkey's head and produce roughly similar effects on brain activity"; however, I am not totally convinced of this, and it really would need E-field models to confirm. It is also more likely that there would in fact be notable differences in the brain regions stimulated as the authors used HD-tDCS electrodes, which are generally more focal.

      We thank the Reviewer for the remark, which is very similar to the first comment from Reviewer 2. Please see our answer to that comment (R.2.1).

      Actions in the text: We have modified the paper according to the reviewers' comments (please see the actions taken in response to R.2.1.).

      Given the very small sample size, I think it is also important to consider the possibility that some results might also be impacted by individual differences in response to stimulation. For instance, in the discussion (page 9, paragraph 2) the authors contrast findings observed in awake animals versus anaesthetised animals. However, different monkeys were examined for these two conditions, and there were only two monkeys in each group (monkeys J and Y for awake experiments [both male], and monkeys R and N [male and female] for the anaesthesia condition). From the human literature, it is well known that there is a considerable amount of inter-individual variability in response to stimulation (e.g., Lopez-Alonso et al., 2014, Brain Stimulation; Chew et al., 2015, Brain Stimulation), therefore I wonder if some of these differences could also possibly result from differences in responsiveness to stimulation between the different monkeys? At the end of the paragraph, the authors also state "Our findings also support the use of tDCS to promote rapid recovery from general anesthesia in humans...and suggest that a single anodal prefrontal stimulation at the end of the anesthesia protocol may be effective." However, I'm not sure if this statement is really backed-up by the results, which failed to report "any behavioural signs of awakening in the animals" (page 7)?

      We thank the Reviewer for this comment. Because working with non-human primates is expensive and labor intensive, the sample sizes in classical macaque experiments are generally small (typically 2-4 subjects per experiment). Our sample size (i.e. 2 rhesus macaques in awake experiments and 2 macaques under sedation, 11 +/- 9 scan sessions per animal, 288 and 136 runs in the awake and anesthesia state, respectively) is comparable to other previous work in non-human primates using fMRI (Milham et al. 2018; Yacoub et al. 2020; Uchimura, Kumano, and Kitazawa 2024). In addition, we would like to point out that the baseline cortical dynamics we found before stimulation, whether in the awake or sedated state, are comparable to previous studies (Barttfeld et al. 2015; Uhrig et al. 2018; Tasserie et al. 2022). This suggests our results are reproducible across datasets, despite the small sample size.

      That being said, we agree with the reviewer that inter-individual variability in response to stimulation can be considerable, as shown by a large body of literature in the field. It seems possible that the two monkeys studied in each condition responded differently to the stimulation. But even if that’s the case, our results suggest that at least in one of the two monkeys, cathodal PFC stimulation in the awake state and anodal PFC stimulation under propofol anesthesia induced striking changes in brain dynamics, which we believe is a significant contribution to the field. 

      In fact, the supplementary analysis proposed by Reviewer 2 (cf. R.2.4), which investigates how the different measures we used were affected by tDCS, shows that the effects are indeed more apparent and significant in monkey Y than in monkey J. Still, the effects observed in monkey J are congruent with those observed in monkey Y and at the population level, though less pronounced. We also show that this inter-individual variability is outmatched by the inter-condition variability (as indicated by our initially strong statistical results at the population level). Thus, even though responses differ between subjects, the effects observed at the population level cannot be accounted for solely by subject-specific differences.

      Lastly, the Reviewer questioned whether our results support that a single anodal prefrontal stimulation at the end of the anesthesia protocol could effectively promote rapid recovery from general anesthesia, because the stimulation did not wake the animals in our experiments. It should be emphasized that in our case, the monkeys were stimulated while they were still receiving continuous propofol perfusion. In contrast, during the recovery process from anesthesia, the delivery of the anesthetic drug is stopped. It is therefore conceivable that anodal PFC tDCS, which successfully enriched brain dynamics in sedated monkeys in our experiments, may accelerate the recovery from anesthesia when the drug is no longer administered. 

      Actions in the text: We have added a line in the Materials and Methods to compare to other studies:

      “Our sample size is comparable to previous work in NHP using fMRI (Milham et al. 2018; Yacoub et al. 2020; Uchimura, Kumano, and Kitazawa 2024).”

      Reviewing Editor Comments: 

      In some cases, authors opt to submit a revised manuscript. Should you choose to do so, please be aware that the reviewers have indicated that their appraisal is unlikely to change unless some of the suggested field modelling is incorporated into the work. This may change the evaluation of the strength of evidence, but the final wording will be subject to reviewer discretion. Details for responding to the reviews are provided at the bottom of this email.

      Reviewer #1 (Recommendations for the authors): 

      The work should discuss the implications of their experiments for using tDCS to arouse a patient from a coma. The anesthetized animal is effectively in a drug-induced coma. While they observed connectivity changes, these changes did not map nicely onto behavioral changes. 

      I would suggest that the authors spell out more clearly what they view as the clinical implications of their work in terms of new insights into how tDCS may be used to either understand and or treat disorders of consciousness.

      We thank the Reviewer for his thoughtful comments. We appreciate the opportunity to clarify and expand on the key findings and implications of our work, particularly regarding the new insights into how tDCS can be used to understand and treat disorders of consciousness. We therefore provide a broader perspective on the clinical implications of our experiments regarding coma and disorders of consciousness. We also agree with the Reviewer that the absence of behavioral changes but the presence of functional differences should be more clearly addressed. 

      Actions in the text: We have added a few lines about the relevance of anesthesia as a model for disorders of consciousness in the Introduction section:

      “Anesthesia provides a unique model for studying consciousness, which, similarly to DOC, is characterized by the disruption or even the loss of consciousness (Luppi 2024). Additionally, anesthesia mechanisms involve several subcortical nuclei that are key components of the brain's sleep and arousal circuits (Kelz and Mashour 2019).”

      In the Discussion section, we have modified and expanded a paragraph about the effects of tDCS in DOC patients and how this technique could be further used to study consciousness:

      “From another clinical perspective, our results demonstrating that 2 mA anodal PFC tDCS decreased the structure-function correlation and modified the dynamic repertoire of brain patterns during anesthesia (Figures 6 and 7) are consistent with the beneficial effects of such stimulation in DOC patients (Thibaut et al., 2014; Angelakis et al., 2014; Thibaut et al., 2017; Zhang et al., 2017; Martens et al., 2018; Cavinato et al., 2019; Wu et al., 2019; Hermann et al., 2020; Peng et al., 2022; Thibaut et al., 2023). Although some clinical trials investigated the effects of stimulating other brain regions, such as the motor cortex (Martens et al., 2019; Straudi et al., 2019) or the parietal cortex (Huang et al., 2017; Guo et al., 2019; Zhang et al., 2022; Wan et al., 2023; Wang et al., 2020), the DLPFC appears to be the most effective target for patients with a minimally conscious state (Liu et al., 2023). In terms of neuromodulatory effects in DOC patients, DLPFC tDCS has been reported to increase global excitability (Bai et al., 2017), increase the P300 amplitude (Zhang et al., 2017; Hermann et al., 2020), improve the fronto-parietal coherence in the theta band (Bai et al., 2018), enhance the putative EEG markers of consciousness (Bai et al., 2018; Hermann et al., 2020) and reduce the incidence of slow-waves in the resting state (Mensen et al., 2020). Our findings further support the PFC as a relevant target for modulating consciousness level and align with growing evidence showing that the PFC plays a key role in conscious access networks (Mashour, Pal, and Brown 2022; Panagiotaropoulos 2024). Nevertheless, we hypothesize that other brain targets for tDCS may be of interest for consciousness restoration, potentially using multi-channel tDCS (Havlík et al., 2023). 
Among transcranial electrical stimulation techniques, tDCS has the great advantage of facilitating either excitation or inhibition of brain regions, depending on the polarity of the stimulation. Sdoia et al. (2019) exploited this advantage to investigate the causal involvement of the DLPFC in conscious access to a visual stimulus during an attentional blink paradigm. While conscious access was enhanced by anodal stimulation of the left DLPFC compared to sham stimulation, opposite effects were found with cathodal stimulation compared to sham over the same locus. Finally, this literature and our findings suggest that tDCS constitutes a non-invasive, reversible, and powerful tool for studying consciousness.”

      We have added a new paragraph about patients with cognitive-motor dissociation and dissociation between consciousness and behavioral responsiveness:

      “Changes in the state of consciousness are generally closely associated with changes in behavioural responsiveness, although some rare cases of dissociation have been described. Cognitive-motor dissociation (CMD) is a condition observed in patients with severe brain injury, characterized by behavior consistent with unresponsive wakefulness syndrome or a minimally conscious state minus (Thibaut et al., 2019). However, in these patients, specific cortical brain areas activate in response to mental imagery tasks (e.g., imagining playing tennis or returning home) in a manner indistinguishable from that of healthy controls, as shown through fMRI or EEG (Thibaut et al., 2019; Owen et al., 2006; Monti et al., 2010; Bodien et al., 2024). Thus, although CMD patients are behaviorally unresponsive, they demonstrate cognitive awareness that is not outwardly apparent. It is worth noting that both the structure-function correlation and the rate of the pattern closest to the anatomy were shown to be significantly reduced in unresponsive patients showing command following during mental imagery tasks compared to those who do not show command following (Demertzi et al., 2019). These observations would be compatible with our findings in anesthetized macaques exposed to 2 mA anodal PFC tDCS. The richness of the brain dynamics would be recovered (at least partially, in our experiments), but not the behaviour. This hypothesis also fits with a recent longitudinal fMRI study on patients recovering from coma (Crone et al., 2020). The researchers examined two groups of patients: one group consisted of individuals who were unconscious at the acute scanning session but regained consciousness and improved behavioral responsiveness a few months later, and the second group consisted of patients who were already conscious from the start and only improved behavioral responsiveness at follow-up. 
By comparing these two groups, the authors could distinguish between the recovery of consciousness and the recovery of behavioral responsiveness. They demonstrated that only initially conscious patients exhibited rich brain dynamics at baseline. In contrast, patients who were unconscious in the acute phase and later regained consciousness had poor baseline dynamics, which became more complex at follow-up. Complete recovery of both consciousness and responsiveness under general anesthesia is possible through electrical stimulation of the central thalamus (Redinbaugh et al., 2020; Tasserie et al., 2022).”

      Reviewer #2 (Recommendations for the authors): 

      Method 

      (1) The authors mentioned that they used HD-tDCS in their experiments; however, they used 1 x 1 tDCS, which is not HD-tDCS but rather single-channel tDCS.

      We thank the Reviewing Editor for pointing out this ambiguous wording. We understand that "HD-tDCS", which we used in our paper to refer to high-density 1x1 tDCS (because we used small carbon electrodes instead of the large sponge electrodes employed in conventional tDCS), may cause some confusion with high-definition tDCS, which uses compact ring electrodes and most commonly refers to a 4x1 montage (1 active central electrode over the target area and 4 return electrodes placed around the central electrode).

      Therefore, to avoid any confusion, we will use the term "tDCS" rather than “HD-tDCS” to describe the technique used in this paper and remove mentions of high-density or high-definition tDCS.

      Actions in the text: We have replaced the abbreviation “HD-tDCS” with “tDCS” throughout the paper. We have also removed the sentence about high-definition tDCS in the Introduction (“While conventional tDCS relies on the use of relatively large rectangular pad electrodes, high-density tDCS (HD-tDCS) utilizes more compact ring electrodes, allowing for increased focality, stronger electric fields, and presumably, greater neurophysiological changes (Datta et al. 2009; Dmochowski et al. 2011)”) and the two related citations in the References section.

      (2) Please provide the characteristics of electrodes, including their size, shape, and thickness.

      We thank the Reviewing Editor for this recommendation. We now provide the complete characteristics of the tDCS electrodes used in the paper.

      Actions in the text: We have added a sentence describing the characteristics of the tDCS electrodes in the Materials and Methods section:

      “We used a 1x1 electrode montage with two carbon rubber electrodes (dimensions: 1.4 cm x 1.85 cm, 0.93 cm thick) inserted into Soterix HD-tES MRI electrode holders (base diameter: 25 mm; height: 10.5 mm), which are in contact with the scalp. These electrodes (2.59 cm<sup>2</sup>) are smaller than conventional tDCS sponge electrodes (typically 25 to 35 cm<sup>2</sup>).”
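      The electrode figures above can be checked with a short calculation. The sketch below reproduces the stated area from the electrode dimensions and derives a nominal current density for the 1 mA and 2 mA intensities used under anesthesia. The current densities are not reported in the paper: they assume that the current is spread uniformly over the electrode surface and ignore the holder and its contact area with the scalp, so they are rough indications only. The function names are hypothetical.

```python
def electrode_area_cm2(width_cm, length_cm):
    """Area of a rectangular electrode in cm^2."""
    return width_cm * length_cm

def current_density_ma_per_cm2(current_ma, area_cm2):
    """Nominal current density, assuming uniform current over the electrode."""
    return current_ma / area_cm2

# Carbon rubber electrode dimensions quoted in the text.
small_area = electrode_area_cm2(1.4, 1.85)
print(f"Electrode area: {small_area:.2f} cm^2")

# Compare with the conventional sponge electrode areas quoted in the text.
for current_ma in (1.0, 2.0):
    small = current_density_ma_per_cm2(current_ma, small_area)
    sponge_25 = current_density_ma_per_cm2(current_ma, 25.0)
    sponge_35 = current_density_ma_per_cm2(current_ma, 35.0)
    print(f"{current_ma:.0f} mA: {small:.3f} mA/cm^2 (small electrode), "
          f"{sponge_35:.3f} to {sponge_25:.3f} mA/cm^2 (25 to 35 cm^2 sponge)")
```

      Under these assumptions, the same current delivered through the smaller electrode corresponds to a nominal density roughly ten times higher than through a conventional sponge electrode.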

      (3) Could the authors clarify why they chose to stimulate the right DLPFC? Is there a specific rationale for this choice? Additionally, could the authors explain how they ensured that the stimulation targeted the DLPFC, given that the monkey cap might differ from human configurations? In many NHP studies, structural MRI is used to accurately determine electrode placement. Considering that a single channel F4 - O2 montage was used, even a small displacement of the frontal electrode laterally could result in the electric field not adequately covering the DLPFC. Could the authors provide structural MRI images and details of electrode positioning to help readers better understand targeting accuracy?

      We thank the Reviewing Editor for the thoughtful comments and recommendations. We appreciate the opportunity to further clarify our rationale for stimulating the right DLPFC and also the suggestion to provide structural MRI images and details of electrode positioning, which we think will improve the quality of the paper by showing targeting accuracy.

      First, we would like to clarify that our initial decision to stimulate the right PFC in most animals was driven by experimental constraints. Indeed, we had limited access to the left PFC in three of the four macaques, either due to the presence of cement (spreading asymmetrically from the centre of the head) used to fix the head post in awake animals or due to a scar in one of the two animals studied under anesthesia. 

      Second, we agree with the Reviewing Editor on the importance of showing details of electrode positioning and evidence of targeting accuracy across MRI sessions. Therefore, we now provide structural images showing the positions of anodal and cathodal electrodes in almost all acquired sessions: 10 sessions (out of 10) under anesthesia and 30 sessions in the awake state (out of 34 sessions, because we could not acquire structural images in four sessions). These images show that, in anesthesia experiments, the anodal electrode was positioned over the dorsal prefrontal cortex and the cathodal electrode was placed over the contralateral occipital cortex (at the level of the parieto–occipital junction) in both monkeys. In the awake state, the montage still targeted the prefrontal cortex and the occipital cortex, but with a slightly different placement. One of the electrodes was placed over the prefrontal cortex, closer to the premotor cortex than in anesthesia experiments, while the other one was placed over the occipital cortex (V1), slightly more posterior than in anesthesia experiments. These images therefore show that the placement was relatively accurate across sessions and reproducible between monkeys in each of the two arousal conditions.

      Actions in the text: We have added a supplementary file showing electrode positioning in 40 of the 44 acquired MRI sessions (Supplementary File 1). We have also added a new supplement figure (Figure 1 - figure supplement 1) showing electrode positioning in representative MRI sessions of the awake and anesthetized experiments in the main manuscript. 

      We added a few sentences referring to these figures in the Results section: 

      “Representative structural images showing electrode placements on the head of the two awake monkeys are shown in Figure 1 - figure supplement 1A. Supplementary File 1 displays the complete set of structural images, showing that the two electrodes were accurately placed over the prefrontal cortex and the occipital cortex in a reproducible manner across awake sessions.”

      Figure 1 - figure supplement 1. Structural images displaying electrode placements on the head of monkeys. A) Awake experiments. Representative sagittal, coronal and transverse MRI sections, and the corresponding skin reconstruction images showing the position of the prefrontal and occipital electrodes on the head of monkeys J. and Y. B) Anesthesia experiments. Representative sagittal, coronal and transverse MRI sections, and the corresponding skin reconstruction images showing the position of the prefrontal and occipital electrodes on the head of monkeys R. and N.

      Supplementary File 1 (see attached file). Structural images showing the position of the tDCS electrodes on the monkey's head across sessions. Sagittal, coronal and transverse MRI sections, and corresponding skin reconstruction images showing the position of the prefrontal and occipital electrodes on the monkey's head for each MRI session (except for 4 sessions in which no anatomical scan was acquired). The two electrodes were accurately placed over the prefrontal cortex and the occipital cortex in a reproducible manner across sessions and between the two monkeys studied in each arousal state. In anesthesia experiments, the anodal electrode was placed over the dorsal prefrontal cortex, while the cathodal electrode was positioned over the parieto-occipital junction. In awake experiments, the prefrontal electrode was positioned over the dorsal prefrontal cortex/premotor cortex, while the occipital electrode was placed over visual area 1 (V1). The position of the two electrodes differed slightly between the anesthetized and awake experiments because of the different body positions (the prone position of the sedated monkeys prevented a more posterior position of the occipital electrode) and because of the headpost on the head of the two monkeys in awake experiments (the monkeys used in anesthesia experiments did not have a headpost).

      (4) If the authors did not analyze the data for the passive event-related auditory response, it may be helpful to remove the related sentence to avoid potential confusion for readers.

      We thank the Reviewing Editor for the comment. Although we understand the Reviewing Editor’s point of view, we decided to keep this information in the paper to inform the reader that the macaques were passively engaged in an auditory task, as this could have some influence on the brain state. In the Materials and Methods section, we already mentioned that the analysis of the cerebral responses to the auditory paradigm is not part of the paper. We have modified the sentence to make it clearer and to avoid potential confusion for readers.

      Actions in the text: We have modified the sentence referring to the passive event-related auditory response in the Materials and Methods section:

      “All fMRI data were acquired while the monkeys were engaged in a passive event-related auditory task, the local-global paradigm, which is based on local and global deviations from temporal regularities (Bekinschtein et al. 2009; Uhrig, Dehaene, and Jarraya 2014). The present paper does not address how tDCS perturbs cerebral responses to local and global deviants, which will be the subject of future work.”

      (5) Could the authors clarify what x(t) represents in the equation? Additionally, it would be better to number the equations.

      We apologize for the confusion: x(t) represents the evolution of the BOLD signal over time. We have numbered the equations as suggested. 

      Actions in the text: We have added explanations of the notation and numbered the equations.

      (6) It would be much better to provide schematic illustrations to explain what the authors did for analyzing fMRI data.

      We thank the Reviewing Editor for the suggestion and now provide a new figure as suggested.  

      Actions in the text: We have added a new figure (Figure 2) graphically showing the overall analysis performed. We have added a sentence about the new Figure 2 in the Results section:  “A graphical overview of the overall analysis is shown in Figure 2.” We have renumbered Figure 2 - supplement figures accordingly.

      Figure 2. fMRI Phase Coherence analysis. A) Left) Animals were scanned before, during and after PFC tDCS stimulation in the awake state (two macaques) or under deep propofol anesthesia (two macaques). Right) Example of z-scored filtered BOLD time series for one macaque, 111 time points with a TR of 2.4 s. B) Hilbert transform of the z-scored BOLD signal of one ROI into its time-varying amplitude A(t) (red) and the real part of its phase term, cos(φ) (green). In blue, we recover the original z-scored BOLD signal as A(t)cos(φ). C) Example of the phase of the Hilbert transform for each brain region at one TR. D) Symmetric matrix of cosines of the phase differences between all pairs of brain regions. E) We concatenated the vectorized upper triangles of the phase difference matrices for all TRs, all participants and all conditions, separately for each dataset, and used the K-means algorithm to obtain the brain patterns whose statistics are then analyzed in the different conditions.
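      The steps in panels B to E of this legend can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline: the toy signals, the function names (`analytic_signal`, `phase_coherence`, `upper_triangle`) and the plain DFT used in place of a library Hilbert transform are all assumptions made for illustration, and the K-means clustering step is left out.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a plain O(n^2) DFT (stdlib only)."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            weight = 1.0  # DC and Nyquist bins are kept once
        elif k < (n + 1) // 2:
            weight = 2.0  # positive frequencies are doubled
        else:
            weight = 0.0  # negative frequencies are removed
        spectrum[k] *= weight
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def phase_coherence(phases):
    """Matrix of cos(phi_i - phi_j) for all pairs of regions at one time point."""
    n = len(phases)
    return [[math.cos(phases[i] - phases[j]) for j in range(n)] for i in range(n)]


def upper_triangle(matrix):
    """Vectorize the strict upper triangle (the matrix is symmetric)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


# Hypothetical toy "BOLD" signals for three regions: a cosine, the same
# oscillation shifted by a quarter cycle, and its sign-flipped copy.
n_points = 16
theta = [2 * math.pi * 2 * t / n_points for t in range(n_points)]
bold = [
    [math.cos(a) for a in theta],
    [math.sin(a) for a in theta],
    [-math.cos(a) for a in theta],
]

analytic = [analytic_signal(roi) for roi in bold]
t = 3  # one time point (one TR)
phases_at_t = [cmath.phase(z[t]) for z in analytic]
coherence = phase_coherence(phases_at_t)
features = upper_triangle(coherence)  # one row of the matrix passed to clustering

# A(t) * cos(phi) recovers the original signal, as stated for panel B.
amplitude = abs(analytic[0][t])
reconstructed = amplitude * math.cos(phases_at_t[0])
```

      In this toy case the quarter-cycle pair is incoherent (cosine of the phase difference near 0) and the sign-flipped pair is anti-synchronized (near -1). In the analysis described in the legend, one such feature vector per TR would be stacked across sessions before clustering.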

      Results 

      (1) In Figures 3A, 5A, and 6A showing brain connectivity, it is difficult to relate the connectivity variability among the brain regions. Instead of displaying connection lines for nodes, it would be more effective if the authors highlighted significant, strong connectivity within specific brain regions using additional methods, such as bootstrapping.

      We thank the Reviewing Editor for the comment and suggestion. The connection lines represent all synchronizations above 0.5 and all anti-synchronizations below -0.5 between pairs of brain regions. As the Reviewing Editor points out, we had not addressed the heterogeneity in coherence across individual brain regions. We therefore provide additional supplementary figures showing, for every centroid in the main figures, the variance of each brain region's distribution of phase coherence with the rest of the brain. A high value indicates that a region's coherence with the rest of the brain spans a wide range of values, while a low value indicates that its coherences with the other regions are similar to one another. Thus, a brain with uniform color indicates high homogeneity in coherence among brain regions, while sharp changes in color reveal that certain regions have higher variance in their coherence distributions. We expect these new figures to expose the connectivity variability among brain regions more clearly.
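      As a rough illustration of the two quantities discussed in this reply (the ±0.5 display thresholds and the per-region variance of coherence), the sketch below operates on a made-up 4-region centroid. The matrix values and function names are hypothetical, and the use of the population variance is an assumption, since the response does not state which estimator was used.

```python
import statistics


def region_coherence_variance(centroid):
    """Variance of each region's coherence with all other regions (diagonal excluded)."""
    n = len(centroid)
    return [
        statistics.pvariance([centroid[i][j] for j in range(n) if j != i])
        for i in range(n)
    ]


def thresholded_edges(centroid, threshold=0.5):
    """Region pairs drawn as synchronized (> threshold) or anti-synchronized (< -threshold)."""
    n = len(centroid)
    synchronized, anti_synchronized = [], []
    for i in range(n):
        for j in range(i + 1, n):
            if centroid[i][j] > threshold:
                synchronized.append((i, j))
            elif centroid[i][j] < -threshold:
                anti_synchronized.append((i, j))
    return synchronized, anti_synchronized


# Hypothetical symmetric centroid (brain pattern) for four regions.
centroid = [
    [1.0, 0.8, 0.8, 0.8],
    [0.8, 1.0, 0.6, -0.7],
    [0.8, 0.6, 1.0, 0.1],
    [0.8, -0.7, 0.1, 1.0],
]

variances = region_coherence_variance(centroid)
synchronized, anti_synchronized = thresholded_edges(centroid)
```

      Here region 0 is equally coherent with every other region, so its variance is zero and it would appear uniform on the map, whereas region 1 mixes strong synchronization and anti-synchronization and therefore has the highest variance.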

      Actions in the text: We have added new figures showing, for all centroids mentioned in the main figures, the variances in phase-based connectivity of the distributions of coherence (Figure 3 - figure supplement 3; Figure 5 - figure supplement 2; Figure 6 - figure supplement 3; Figure 7 - figure supplement 2). One of them is shown below for the awake-only analysis (Figure 3 - figure supplement 3).

      Figure 3 - figure supplement 3. Variance in inter-region phase coherences of brain patterns. Low values (red and light red) indicate that the distribution of synchronizations between a brain region and the rest of the brain has relatively low variance, while high values (blue and light blue) indicate relatively high variance. Both supra- (top) and subdorsal (bottom) views are displayed for each brain pattern from the main figure, in the same order as in the main figure: from left (1) to right (6) as their respective SFC increases. 

      We added a few sentences about variances in phase-based connectivity of the distributions of coherence in the Results section: 

      “Further investigation of the variances in inter-region phase coherences of brain patterns, presented in Figure 3 - figure supplement 3, revealed two main findings. First, all the patterns exhibited some degree of lateral symmetry. Second, except for the pattern with the highest SFC, most patterns displayed high heterogeneity in their coherence variances and striking inter-pattern differences. These observations reflect both the segmentation of distinct functional networks across patterns and a topological organization within the patterns themselves: some regions showed a broader spectrum of synchrony with the rest of the brain, while others exhibited narrower distributions of coherence variances. For instance, unlike other brain patterns, pattern 5 was characterized by a high coherence variance in the frontal premotor areas and low variance in the occipital cortex, whereas pattern 3 had a high variance in the frontal and orbitofrontal regions. In addition, we performed the main analyses separately for the two monkeys, explored the inter-condition variability (Supplementary File 2), and computed classical measures of functional connectivity such as average FC matrices and functional graph properties (modularity, efficiency and density) of the visited FC states (Supplementary File 3).”

      “The variance in inter-regional phase coherence across brain patterns showed notably that pattern 4, in contrast to most other patterns, was characterized by a high variance in frontal premotor areas and a low variance in the occipital cortex (Figure 5 - figure supplement 2).” 

      “The variance in inter-region phase coherences of the brain patterns is displayed in Figure 6 - figure supplement 3 and showed a striking heterogeneity between the patterns. For example, pattern 5 had a low overall variance (except in the frontal cortex), while pattern 1 was the only pattern with a high variance in the occipital cortex.”

      “The variance in inter-region phase coherences of brain patterns is displayed in Figure 7 - figure supplement 2.”

      (2) For both conditions, only 2 to 3 out of 6 patterns showed significant effects of tDCS on the occurrence rate. Is it sufficient to claim the authors' conclusion?

      We thank the Reviewing Editor for the comment. We would like to point out that similar differences in the occurrence rates of specific brain patterns (particularly patterns at the extremities of the SFC scale) have been reported previously. Prior work in patients suffering from disorders of consciousness, in healthy humans and in non-human primates has shown, using a similar method of analysis, that not all brain states are equally disturbed by loss of consciousness, even across different routes to unconsciousness (Luppi et al. 2021; Z. Huang et al. 2020; Demertzi et al. 2019; Castro et al. 2023; Golkowski et al. 2019; Barttfeld et al. 2015). We therefore believe that our conclusions are supported by the results.

      (3) If the authors want to assert that the brain state significantly influences the effects of tDCS as discussed in the manuscript, further analysis is necessary. First, it would be great to show the difference in connectivity between two consciousness conditions during the baseline (resting state) to see how resting state connectivity or structural connectivity varies. Second, demonstrating the difference in connectivity between the awake and anesthetized conditions (e.g., awake during cathodal vs. anesthetized cathodal) to show how the connectivity among the brain regions was changed by the brain state during tDCS. This would strengthen the authors' conclusion.

      We thank the Reviewing Editor for this comment. Firstly, we would like to clarify that structural connectivity does not change from one session to another in the same animal, and differs only minimally between subjects. Secondly, we agree with the Reviewing Editor that it is informative to show the differences between the baselines, and this is what we have done. The results are shown in Figures 5 and 7. Regarding the comparison of the stimulation conditions across arousal levels, the only contrast that we could make is to compare 2 mA anodal awake with 2 mA anodal anesthetized (during and post-stimulation). However, as 2 mA anodal stimulation in the awake state did not affect the connectivity much (compared to the awake baseline), the results would be very similar to the comparison of the awake baseline with 2 mA anodal anesthetized, which is shown in Figure 7. Therefore, we believe that this analysis would add little information and would be largely redundant. 

      Reviewer #3 (Recommendations for the authors): 

      Introduction, par 2: HD-tDCS does not necessarily produce stronger electric fields (E-fields) in the brain. The E-field is largely montage-dependent, and some configurations such as the 4x1 configuration can actually have weaker E-fields compared to conventional tDCS designs (i.e., with two sponge electrodes) as electrodes are often closer together resulting in more current being shunted by skull, scalp, and CSF. I would consider re-phrasing this section.

      We agree with the Reviewer that high-definition tDCS does not necessarily produce stronger electric fields in the brain and apologize for the confusion caused by our use of HD-tDCS to refer to high-density tDCS. To avoid any confusion, we have removed the sentence mentioning that HD-tDCS produces stronger electric fields. 

      Actions in the text: We have removed the sentence about high-definition tDCS in the Introduction (“While conventional tDCS relies on the use of relatively large rectangular pad electrodes, high-density tDCS (HD-tDCS) utilizes more compact ring electrodes, allowing for increased focality, stronger electric fields, and presumably, greater neurophysiological changes (Datta et al. 2009; Dmochowski et al. 2011)”) and the two related citations in the References section.

    1. Author response:

      General Statements:

      The formation of three-dimensional tubes is a fundamental process in the development of organs, and aberrant tube size leads to common diseases and congenital disorders, such as polycystic kidney disease, asthma, and lung hypoplasia. The apical (luminal) extracellular matrix (ECM) plays a critical role in epithelial tube morphogenesis during organ formation, but its composition and organization remain poorly understood. Using the Drosophila embryonic salivary gland as a model, we reveal a critical role in tube lumen expansion for PAPS Synthetase (Papss), an enzyme that synthesizes the universal sulfate donor PAPS. Additionally, we identify two zona pellucida (ZP) domain proteins, Piopio (Pio) and Dumpy (Dpy), as key apical ECM components that provide mechanical support to maintain a uniform tube diameter.

      The apical ECM has a distinct composition compared to the basal ECM, featuring a diverse array of components. Many studies of the apical ECM have focused on the role of chitin and its modification, but the composition of the non-chitinous apical ECM and its role, and how modification of the apical ECM affects organogenesis remain elusive. The main findings of this manuscript are listed below.

      (1) Through a deficiency screen targeting ECM-modifying enzymes, we identify Papss as a key enzyme regulating luminal expansion during salivary gland morphogenesis. 

      (2) Our confocal and transmission electron microscopy analyses reveal that Papss mutants exhibit a disorganized apical membrane and condensed aECM, which are at least partially linked to disruptions in Golgi structures and intracellular trafficking. Papss is also essential for cell survival and basal ECM integrity, highlighting the role of sulfation in regulating both apical and basal ECM.

      (3) Salivary gland-specific overexpression of wild-type Papss rescues all defects in Papss mutants, but the catalytically inactive mutant form does not, suggesting that defects in sulfation are the underlying cause of the phenotypes.

      (4) We identify two ZP domain proteins, Piopio (Pio) and Dumpy (Dpy), as key components of the salivary gland aECM. In the absence of Papss, Pio is progressively lost from the aECM, while the Dpy-positive aECM structure is condensed and detaches from the apical membrane, resulting in a narrowed lumen. 

      (5) Mutations in pio or dpy, or in Notopleural (Np), which encodes a matriptase that cleaves Pio, cause the salivary gland lumen to develop alternating bulges and constrictions. Additionally, loss of pio results in loss of Dpy in the salivary gland lumen, suggesting that the Dpy-containing filamentous structures of the aECM are critical for maintaining luminal diameter, with Pio playing an essential role in organizing this structure.

      (6) We further reveal that the cleavage of the ZP domain of Pio by Np is critical for the role of Pio in organizing the aECM structure.

      Overall, our findings underscore the essential role of sulfation in organizing the aECM during tubular organ formation and highlight the mechanical support provided by ZP domain proteins in maintaining tube diameter. Mammals have two isoforms of Papss, Papss1 and Papss2. Papss1 shows ubiquitous expression, with higher levels in glandular cells and salivary duct cells, suggesting a high requirement for sulfation in these cell types. Papss2 shows a more restricted expression, such as in cartilage, and mutations in Papss2 have been associated with skeletal dysplasia in humans. Our analysis of the Drosophila Papss gene, a single ortholog of human Papss1 and Papss2, reveals its multiple roles during salivary gland development. We expect that these findings will provide valuable insights into the function of these enzymes in normal development and disease in humans. Our findings on the key role of two ZP proteins, Pio and Dpy, as major components of the salivary gland aECM also provide valuable information on the organization of the non-chitinous aECM during organ formation.

      We believe that our results will be of broad interest to many cell and developmental biologists studying organogenesis and the ECM, as well as those investigating the mechanisms underlying human diseases associated with conserved mutations.

      Point-by-point description of the revisions:

      We are delighted that all three reviewers were enthusiastic about the work. Their comments and suggestions have improved the paper. The details of the changes we have made in response to each reviewer’s comments are included in italicized text below.

      Reviewer #1 (Evidence, reproducibility and clarity):

      PAPS is required for all sulfotransferase reactions in which a sulfate group is covalently attached to amino acid residues of proteins or to side chains of proteoglycans. This sulfation is crucial for properly organizing the apical extracellular matrix (aECM) and expanding the lumen in the Drosophila salivary gland. Loss of Papss potentially leads to decreased sulfation, disorganizing the aECM, and defects in lumen formation. In addition, Papss loss destabilizes the Golgi structures.

      In Papss mutants, several changes occur in the salivary gland lumen of Drosophila. The tube lumen is very thin and shows irregular apical protrusions. There is a disorganization of the apical membrane and a compaction of the apical extracellular matrix (aECM). The Golgi structures and intracellular transport are disturbed. In addition, the ZP domain proteins Piopio (Pio) and Dumpy (Dpy) lose their normal distribution in the lumen, which leads to condensation and dissociation of the Dpy-positive aECM structure from the apical membrane. This results in a thin and irregularly dilated lumen.

      (1) The authors describe various changes in the lumen in mutants, from thin lumen to irregular expansion. I would like to know the correct lumen diameter, and length, besides the total area, by which one can recognize thin and irregular.

      We have included quantification of the length and diameter of the salivary gland lumen in the stage 16 salivary glands of control, Papss mutant, and salivary gland-specific rescue embryos (Figure 1J, K). As described, Papss mutant embryos have two distinct phenotypes, one group with a thin lumen along the entire lumen and the other group with irregular lumen shapes. Therefore, we separated the two groups for quantification of lumen diameter. Additionally, we have analyzed the degree of variability for the lumen diameter to better capture the range of phenotypes observed (Figure 1K’). These quantifications enable a more precise assessment of lumen morphology, allowing readers to distinguish between thin and irregular lumen phenotypes.
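      The response does not say which measure of diameter variability is plotted in Figure 1K’. One common choice, assumed here purely for illustration, is the coefficient of variation of diameters sampled along a single lumen; the measurements below are invented and are not data from the study.

```python
import statistics


def coefficient_of_variation(diameters):
    """Sample standard deviation divided by the mean of diameters along one lumen."""
    return statistics.stdev(diameters) / statistics.mean(diameters)


# Hypothetical diameters (arbitrary units) sampled at five positions along a lumen.
thin_lumen = [1.0, 1.1, 0.9, 1.0, 1.0]        # uniformly narrow
irregular_lumen = [1.0, 4.0, 1.2, 3.8, 0.9]   # alternating narrow and expanded

cv_thin = coefficient_of_variation(thin_lumen)
cv_irregular = coefficient_of_variation(irregular_lumen)
```

      Under this assumption, a uniformly thin lumen has a small mean diameter and a low coefficient of variation, while an irregular lumen has a high one, which is why reporting mean diameter and variability together can separate the two phenotype groups.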

      (2) The rescue is about 30%, which is not as good as expected. Maybe the wrong isoform was taken. Is it possible to find out which isoform is expressed in the salivary glands, e.g., by RNA in situ Hyb? This could then be used to analyze a more focused rescue beyond the paper.

      Thank you for this point, but we do not agree that the rescue is about 30%. In Papss mutants, about 50% of the embryos show the thin lumen phenotype whereas the other 50% show irregular lumen shapes. In the rescue embryos with a WT Papss, few embryos showed thin lumen phenotypes. About 40% of the rescue embryos showed “normal, fully expanded” lumen shapes, and the remaining 60% showed either irregular (thin+expanded) or slightly overexpanded lumen. It is not uncommon that rescue with the Gal4/UAS system results in a partial rescue because it is often not easy to achieve the balance of the proper amount of the protein with the overexpression system. 

      To address the possibility that the wrong isoform was used, we performed in situ hybridization to examine the expression of different Papss splice forms in the salivary gland. We used probes that detect subsets of splice forms: A/B/C/F/G, D/H, and E/F/H, and found that all probes showed expression in the salivary gland, with varying intensities. The original probe, which detects all splice forms, showed the strongest signals in the salivary gland compared to the new probes, which detect only a subset. However, the difference in signal intensity may be due to the longer length of the original probe (>800 bp) compared to the other probes, which were made from much smaller regions (~200 bp). Digoxigenin in the DIG labeling kit for mRNA detection labels the uridine nucleotide in the transcript, and the probes with weaker signals contain fewer uridines (all: 147; ABCFG, 29; D, 36; EFH, 66). We also used the Papss-PD isoform for a salivary gland-specific rescue experiment and obtained results similar to those with Papss-PE (Figure 1I-L, Figure 4D and E). 

      Furthermore, we performed additional experiments to validate our findings. We performed a rescue experiment with a mutant form of Papss that has mutations in the critical residues of the catalytic domains of the enzyme, which failed to rescue any phenotypes, including the thin lumen phenotype (Figure 1H, J-L), the number and intensity of WGA puncta (Figure 3I, I’), and cell death (Figure 4D, E). These results provide strong evidence that the defects observed in Papss mutants are due to the lack of sulfation.  

      (3) Crb is a transmembrane protein on the apicolateral side of the membrane. Accordingly, the apicolateral distribution can be seen in the control and the mutant. I believe there are no apparent differences here, not even in the amount of expression. However, the view of the cells (frame) shows possible differences. To be sure, a more in-depth analysis of the images is required. Confocal Z-stack images, with 3D visualization and orthogonal projections to analyze the membranes showing Crb staining together with a suitable membrane marker (e.g. SAS or Uif). This is the only way to show whether Crb is incorrectly distributed. Statistics of several Papss mutants would also be desirable and not just a single representative image. When do the observed changes in Crb distribution occur in the development of the tubes, only during stage 16? Is Papss only involved in the maintenance of the apical membrane? This is particularly important when considering the SJ and AJ, because the latter show no change in the mutants.

      We appreciate your suggestion to analyze Crb distribution more thoroughly. We adapted a method from a previous study (Olivares-Castiñeira and Llimargas, 2017) to quantify Crb signals in the subapical region and apical free region of salivary gland cells. Using E-Cad signals as a reference, we marked the apical cell boundaries of individual cells and calculated the intensity of Crb signals in the subapical region (along the cell membrane) and in the apical free region. We focused on the expanded region of the SG lumen in Papss mutants for quantification, as the thin lumen region was challenging to analyze. This quantification is included in Figure 2D. Statistical analysis shows that Crb signals were more dispersed in SG cells in Papss mutants compared to WT.

      (4) A change in the ECM is only inferred based on the WGA localization. This is too few to make a clear statement. WGA is only an indirect marker of the cell surface and glycosylated proteins, but it does not indicate whether the ECM is altered in its composition and expression. Other important factors are missing here. In addition, only a single observation is shown, and statistics are missing.

      We understand your concern that WGA localization alone may not be sufficient to conclude changes in the ECM. However, we observed that luminal WGA signals colocalize with Dpy-YFP in the WT SG (Figure 5-figure supplement 2C), suggesting that WGA detects the aECM structure containing Dpy. The similar behavior of WGA and Dpy-YFP signals in multiple genotypes further supports this idea. In Papss mutants with a thin lumen phenotype, both WGA and Dpy-YFP signals are condensed (Figure 5E-H), and in pio mutants, both are absent from the lumen (Figure 6B, D). We analyzed WGA signals in over 25 samples of WT and Papss mutants, observing consistent phenotypes. We have included the number of samples in the text. While we acknowledge that WGA is an indirect marker, our data suggest that it is a reliable indicator of the aECM structure containing Dpy. 

      (5) Reduced WGA staining is seen in papss mutants, but this could be due to other circumstances. To be sure, a statistic with the number of dots must be shown, as well as an intensity blot on several independent samples. The images are from single confocal sections. It could be that the dots appear in a different Z-plane. Therefore, a 3D visualization of the voxels must be shown to identify and, at best, quantify the dots in the organ.

      We have quantified cytoplasmic punctate WGA signals. Using spinning disk microscopy with super-resolution technology (Olympus SpinSR10 Sora), we obtained high-resolution images of cytoplasmic punctate signals of WGA in WT, Papss mutant, and rescue SGs with the WT and mutant forms of Papss-PD. We then generated 3D reconstructed images of these signals using Imaris software (Figure 3E-H) and quantified the number and intensity of puncta. Statistical analysis of these data confirms the reduction of the number and intensity of WGA puncta in Papss mutants (Figure 3I, I’). The number of WGA puncta was restored by expressing WT Papss but not the mutant form. By using 3D visualization and quantification, we have ensured that our results are not limited to a single confocal section and account for potential variations in Z-plane localization of the dots.

      (6) A colocalization analysis (statistics) should be shown for the overlap of WGA with ManII-GFP.

      Since WGA labels multiple structures, including the nuclear envelope and ECM structures, we focused on assessing the colocalization of the cytoplasmic WGA punctate signals and ManII-GFP signals. Standard colocalization analysis methods, such as Pearson’s correlation coefficient or Manders’ overlap coefficient, would be confounded by WGA signals in other tissues. Therefore, we used a fluorescent intensity line profile to examine the spatial relationship between WGA and ManII-GFP signals in WT and Papss mutants (Figure 3L, L’). 

      (7) I do not understand how the authors describe "statistics of secretory vesicles" as an axis in Figure 3p. The TEM images do not show labeled secretory vesicles but empty structures that could be vesicles.

      Previous studies have analyzed “filled” electron-dense secretory vesicles in TEM images of SG cells (Myat and Andrew, 2002, Cell; Fox et al., 2010, J Cell Biol; Chung and Andrew, 2014, Development). Consistent with these studies, our WT TEM images show these vesicles. In contrast, Papss mutants show a mix of filled and empty structures. For quantification, we specifically counted the filled electron-dense vesicles (now Figure 3W). A clear description of our analysis is provided in the figure legend.

      (8) The quality of the presented TEM images is too low to judge any difference between control and mutants. Therefore, the supplement must present them in better detail (higher pixel number?).

      We disagree that the quality of the presented TEM images is too low. Our TEM images have sufficient resolution to reveal details of many subcellular structures, such as mitochondrial cristae. The PDF file of the original submission may not have been high resolution. To address this concern, we have provided several original high-quality TEM images of both WT and Papss mutants at various magnifications in Figure 2-figure supplement 2. Additionally, we have included low-magnification TEM images of WT and Papss mutants in Figure 2H and I to provide a clearer view of the overall SG lumen morphology. 

      (9) Line 266: the conclusion that apical trafficking is "significantly impaired" does not hold. This implies that Papss is essential for apical trafficking, but the analyzed ECM proteins (Pio, Dumpy) are found apically enriched in the mutants, and Dumpy is even secreted. Moreover, they analyze only one marker, Sec15, and don't provide data about the quantification of the secretion of proteins.

      We agree and have revised our statement to “defective sulfation affects Golgi structures and multiple routes of intracellular trafficking”. 

      (10) DCP-1 was used to detect apoptosis in the glands to analyze acellular regions. However, the authors compare ST16 control with ST15 mutant salivary glands, which is problematic. Further, it is not commented on how many embryos were analyzed and how often they detect the dying cells in control and mutant embryos. This part must be improved.

      Thank you for the comment. We agree and have included quantification. We used stage 16 samples from WT and Papss mutants to quantify acellular regions. Since DCP-1 signals are only present at a specific stage of apoptosis, some acellular regions do not show DCP-1 signals. Therefore, we counted acellular regions regardless of DCP-1 signals. We also quantified this in rescue embryos with WT and mutant forms of Papss, which show complete rescue with WT and no rescue with the mutant form, respectively. The graph with a statistical analysis is included (Figure 4D, E).

      (11) WGA and Dumpy show similar condensed patterns within the tube lumen. The authors show that dumpy is enriched from stage 14 onwards. How is it with WGA? Does it show the same pattern from stage 14 to 16? Papss mutants can suffer from a developmental delay in organizing the ECM or lack of internalization of luminal proteins during/after tube expansion, which is the case in the trachea.

      Dpy-YFP and WGA show overlapping signals in the SG lumen throughout morphogenesis. Dpy-YFP is enriched in the SG lumen from stage 11, not stage 14 (Figure 5-figure supplement 2). WGA is also detected in the lumen throughout SG morphogenesis, similar to Dpy. In the original supplemental figure, only a stage 16 SG image was shown for co-localization of Dpy-YFP and WGA signals in the SG lumen. We have now included images from stage 14 and 15 in Figure 5-figure supplement 2C.

      Given that luminal Pio signals are lost at stage 16 only and that Dpy signals appear as condensed structures in the lumen of Papss mutants, it suggests that the internalization of luminal proteins is not impaired in Papss mutants. Rather, these proteins are secreted but fail to organize properly. 

      (12) Line 366. Luminal morphology is characterized by bulging and constrictions. In the trachea, bulges indicate the deformation of the apical membrane and the detachment from the aECM. I can see constrictions and the collapsed tube lumen in Fig. 6C, but I don't find the bulges of the apical membrane in pio and Np mutants. Maybe showing it more clearly and with better quality will be helpful.

      Since the bulging phenotype appears to vary from sample to sample, we have revised the description of the phenotype to “constrictions” to more accurately reflect the consistent observations. We quantified the number of constrictions along the entire lumen in pio and Np mutants and included the graph in Figure 6F.

      (13) The authors state that Papss controls luminal secretion of Pio and Dumpy, as they observe reduced luminal staining of both in papss mutants. However, the mCh-Pio and Dumpy-YFP are secreted towards the lumen. Does papss overexpression change Pio and Dumpy secretion towards the lumen, and could this be another explanation for the multiple phenotypes? 

      Thank you for the comment. To clarify, we did not observe reduced luminal staining of Pio and Dpy in Papss mutants, nor did we state that Papss controls luminal secretion of Pio and Dpy. In Papss mutants, Pio luminal signals are absent specifically at stage 16 (Figure 5H), whereas strong luminal Pio signals are present until stage 15 (Figure 5G). For Dpy-YFP, the signals are not reduced but condensed in Papss mutants from stages 14-16 (Figure 5D, H). 

      It remains unclear whether the apparent loss of Pio signals is due to a loss of Pio protein in the lumen or due to epitope masking resulting from protein aggregation or condensation. As noted in our response to Comment 11, internalization of luminal proteins seems unaffected in Papss mutants; proteins like Pio and Dpy are secreted into the lumen but fail to properly organize. Therefore, we have not tested whether Papss overexpression alters the secretion of Pio or Dpy.

      In our original submission, we incorrectly stated that uniform luminal mCh-Pio signals were unchanged in Papss mutants. Upon closer examination, we found these signals are absent in the expanded luminal region in stage 16 SG (where Dpy-YFP is also absent), and weak mCh-Pio signals colocalize with the condensed Dpy-YFP signals (Figure 5C, D). We have revised the text accordingly. 

      Regulation of luminal ZP protein level is essential to modulate the tube expansion; therefore, Np releases Pio and Dumpy in a controlled manner during st15/16. Thus, the analysis of Pio and Dumpy in NP overexpression embryos will be critical to this manuscript to understand more about the control of luminal ZP matrix proteins.

      Thanks for the insightful suggestion. We overexpressed both the WT and mutant form of Np using UAS-Np.WT and UAS-Np.S990A lines (Drees et al., 2019) and analyzed mCh-Pio, Pio antibody, and Dpy-YFP signals. It is important to note that these overexpression experiments were done in the presence of the endogenous WT Np. 

      Overexpression of Np.WT led to increased levels of mCh-Pio, Pio, and Dpy-YFP signals in the lumen and at the apical membrane. In contrast, overexpression of Np.S990A resulted in a near complete loss of luminal mCh-Pio signals. Pio antibody signals remained strong at the apical membrane but were weaker in the luminal filamentous structures compared to WT.

      Due to the GFP tag present in the UAS-Np.S990A line, we could not reliably analyze Dpy-YFP signals because of overlapping fluorescent signals in the same channel. However, the filamentous Pio signals in the lumen co-localized with GFP signals, suggesting that these structures might also include Dpy-YFP, although this cannot be confirmed definitively. 

      These results suggest that overexpressed Np.S990A may act in a dominant-negative manner, competing with endogenous Np and impairing proper cleavage of Pio (and mCh-Pio). Nevertheless, some level of cleavage by endogenous Np still appears to occur, as indicated by the residual luminal filamentous Pio signals. These new findings have been incorporated into the revised manuscript and are shown in Figure 6H and 6I.

      (14) Minor:

      Fig. 5 C': mChe-Pio and Dumpy-YFP are mixed up at the top of the images.

      Thanks for catching this error.  It has been corrected.

      Sup. Fig7. A shows Pio in purple but B in green. Please indicate it correctly.

      It has been corrected.

      Reviewer #1 (Significance):

      In 2023, the functions of Pio, Dumpy, and Np in the tracheal tubes of Drosophila were published. The study here shows similar results, with the difference that the salivary glands do not possess chitin, but the two ZP proteins Pio and Dumpy take over its function. It is, therefore, a significant and exciting extension of the known function of the three proteins to another tube system. In addition, the authors identify papss as a new protein and show its essential function in forming the luminal matrix in the salivary glands. Considering the high degree of conservation of these proteins in other species, the results presented are crucial for future analyses and will have further implications for tubular development, including humans.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary:

      There is growing appreciation for the importance of luminal (apical) ECM in tube development, but such matrices are much less well understood than basal ECMs. Here the authors provide insights into the aECM that shapes the Drosophila salivary gland (SG) tube and the importance of PAPSS-dependent sulfation in its organization and function.

      The first part of the paper focuses on careful phenotypic characterization of papss mutants, using multiple markers and TEM. This revealed reduced markers of sulfation (Alcian Blue staining) and defects in both apical and basal ECM organization, Golgi (but not ER) morphology, number and localization of other endosomal compartments, plus increased cell death. The authors focus on the fact that papss mutants have an irregular SG lumen diameter, with both narrowed regions and bulged regions. They address the pleiotropy, showing that preventing the cell death and resultant gaps in the tube did not rescue the SG luminal shape defects and discussing similarities and differences between the papss mutant phenotype and those caused by more general trafficking defects. The analysis uses a papss nonsense mutant from an EMS screen - I appreciate the rigorous approach the authors took to analyze transheterozygotes (as well as homozygotes) plus rescued animals in order to rule out effects of linked mutations.

      The 2nd part of the paper focuses on the SG aECM, showing that Dpy and Pio ZP protein fusions localize abnormally in papss mutants and that these ZP mutants (and Np protease mutants) have similar SG lumen shaping defects to the papss mutants. A key conclusion is that SG lumen defects correlate with loss of a Pio+Dpy-dependent filamentous structure in the lumen. These data suggest that ZP protein misregulation could explain this part of the papss phenotype.

      Overall, the text is very well written and clear. Figures are clearly labeled. The methods involve rigorous genetic approaches, microscopy, and quantifications/statistics and are documented appropriately. The findings are convincing, with just a few things about the fusions needing clarification.

      Minor comments

      (1) Although the Dpy and Qsm fusions are published reagents, it would still be helpful to mention whether the tags are C-terminal as suggested by the nomenclature, and whether Westerns have been performed, since (as discussed for Pio) cleavage could also affect the appearance of these fusions.

      Thanks for the comment. Dpy-YFP is a knock-in line in which YFP is inserted into the middle of the dpy locus (Lye et al., 2014; the insertion site is available on Flybase). mCh-Qsm is also a knock-in line, with mCh inserted near the N-terminus of the qsm gene by phi-mediated recombination in the qsm<sup>MI07716</sup> line (Chu and Hayashi, 2021; insertion site available on Flybase). Based on this, we have updated the nomenclature from Qsm-mCh to mCh-Qsm throughout the manuscript to accurately reflect the tag position. To our knowledge, no western blot has been performed on Dpy-YFP or mCh-Qsm lines. We have mentioned this explicitly in the Discussion.

      (2) The Dpy-YFP reagent is a non-functional fusion and therefore may not be a wholly reliable reporter of Dpy localization. There is no antibody confirmation. As other reagents are not available to my knowledge, this issue can be addressed with text acknowledgement of possible caveats.

      Thanks for raising this important point. We have added a caveat in the Discussion noting this limitation and the need for additional tools, such as an antibody or a functional fusion protein, to confirm the localization of Dpy.

      (3) TEM was done by standard chemical fixation, which is fine for viewing intracellular organelles, but high pressure freezing probably would do a better job of preserving aECM structure, which looks fairly bad in Fig. 2G WT, without evidence of the filamentous structures seen by light microscopy. Nevertheless, the images are sufficient for showing the extreme disorganization of aECM in papss mutants.

      We agree that HPF is a better method and intend to use the HPF system in future studies. We acknowledge that chemical fixation contributes to the appearance of a gap between the apical membrane and the aECM, which we did not observe in the HPF/FS method (Chung and Andrew, 2014). Despite this, the TEM images still clearly reveal that Papss mutants show a much thinner and more electron-dense aECM compared to WT (Figure 2H, I), consistent with the condensed WGA, Dpy, and Pio signals in our confocal analyses. As the reviewer mentioned, we believe that the current TEM data are sufficient to support the conclusion of severe aECM disorganization and Golgi defects in Papss mutants.

      (4) The authors may consider citing some of the work that has been done on sulfation in nematodes, e.g. as reviewed here: https://pubmed.ncbi.nlm.nih.gov/35223994/ Sulfation has been tied to multiple aspects of nematode aECM organization, though not specifically to ZP proteins.

      Thank you for the suggestion. Pioneering studies in C. elegans have highlighted the key role of sulfation in diverse developmental processes, including neuronal organization, reproductive tissue development, and phenotypic plasticity. We have now cited several works.  

      Reviewer #2 (Significance):

      This study will be of interest to researchers studying developmental morphogenesis in general and specifically tube biology or the aECM. It should be particularly of interest to those studying sulfation or ZP proteins (which are broadly present in aECMs across organisms, including humans).

      This study adds to the literature demonstrating the importance of luminal matrix in shaping tubular organs and greatly advances understanding of the luminal matrix in the Drosophila salivary gland, an important model of tubular organ development and one that has key matrix differences (such as no chitin) compared to other highly studied Drosophila tubes like the trachea.

      The detailed description of the defects resulting from papss loss suggests that there are multiple different sulfated targets, with a subset specifically relevant to aECM biology. A limitation is that specific sulfated substrates are not identified here (e.g. are these the ZP proteins themselves or other matrix glycoproteins or lipids?); therefore it's not clear how direct or indirect the effects of papss are on ZP proteins. However, this is clearly a direction for future work and does not detract from the excellent beginning made here.

      My expertise: I am a developmental geneticist with interests in apical ECM

      Reviewer #3 (Evidence, reproducibility and clarity):

      In this work Woodward et al focus on the apical extracellular matrix (aECM) in the tubular salivary gland (SG) of Drosophila. They provide new insights into the composition of this aECM, formed by ZP proteins, in particular Pio and Dumpy. They also describe the functional requirements of PAPSS, a critical enzyme involved in sulfation, in regulating the expansion of the lumen of the SG. A detailed cellular analysis of Papss mutants indicates defects in the apical membrane, the aECM and in Golgi organization. They also find that Papss controls the proper organization of the Pio-Dpy matrix in the lumen. The work is well presented and the results are consistent.

      Main comments

      - This work provides a detailed description of the defects produced by the absence of Papss. In addition, it provides many interesting observations at the cellular and tissular level. However, this work lacks a clear connection between these observations and the role of sulfation. Thus, the mechanisms underlying the phenotypes observed are elusive. Efforts directed to strengthen this connection (ideally experimentally) would greatly increase the interest and relevance of this work.

      Thank you for this thoughtful comment. To directly test whether the phenotypes observed in Papss mutants are due to the loss of sulfation activity, we generated transgenic lines expressing catalytically inactive forms of Papss, UAS-PapssK193A, F593P, in which key residues in the APS kinase and ATP sulfurylase domains are mutated. Unlike WT UAS-Papss (either the Papss-PD or the Papss-PE isoform), the catalytically inactive UAS-Papssmut failed to rescue any of the phenotypes, including the thin lumen phenotype (Figure 1I-L), altered WGA signals (Figure I, I’) and the cell death phenotype (Figure 4D, E). These findings strongly support the conclusion that the enzymatic sulfation activity of Papss is essential for the developmental processes described in this study.

      - A main issue that arises from this work is the role of Papss at the cellular level. The results presented convincingly indicate defects in Golgi organization in Papss mutants. Therefore, the defects observed could stem from general defects in the secretion pathway rather than from specific defects on sulfation. This could even underlie general/catastrophic cellular defects and lead to cell death (as observed).

      This observation has different implications. Is this effect observed in SGs also observed in other cells in the embryo? If Papss has a general role in Golgi organization this would be expected, as Papss encodes the only PAPS synthetase in Drosophila.

      Can the authors test any other mutant that specifically affect Golgi organization and investigate whether this produces a similar phenotype to that of Papss?

      Thank you for the comment. To address whether the defects observed in Papss mutants stem from general disruption of the secretory pathway due to Golgi disorganization, we examined mutants of two key Golgi components: Grasp65 and GM130. 

      In Grasp65 mutants, we observed significant defects in SG lumen morphology, including highly irregular SG lumen shape and multiple constrictions (100%; n=10/10). However, the lumen was not uniformly thin as in Papss mutants. In contrast, GM130 mutants, although this line was very sick and difficult to grow, showed relatively normal salivary gland morphology in the few embryos that survived to stage 16 (n=5/5). It is possible that only embryos with mild phenotypes progressed to this stage, limiting interpretation. These data have now been included in Figure 3-figure supplement 2. Overall, while Golgi disruption can affect SG morphology, the specific phenotypes seen in Papss mutants are not fully recapitulated by Grasp65 or GM130 loss.

      - A model that conveys the different observations and that proposes a function for Papss in sulfation and Golgi organization (independent or interdependent?) would help to better present the proposed conclusions. In particular, the paper would be more informative if it proposed a mechanism or hypothesis of how sulfation affects SG lumen expansion. Is sulfation regulating a factor that in turn regulates Pio-Dpy matrix? Is it regulating Pio-Dpy directly? Is it regulating a product recognized by WGA?

      For instance, investigating Alcian blue or sulfotyrosine staining in pio, dpy mutants could help to understand whether Pio, Dpy are targets of sulfation.

      Thank you for the comment. We’re also very interested in learning whether the regulation of the Pio-Dpy matrix is a direct or indirect consequence of the loss of sulfation on these proteins. One possible scenario is that sulfation directly regulates the Pio-Dpy matrix by regulating protein stability through the formation of disulfide bonds between the conserved Cys residues responsible for ZP module polymerization. Additionally, the Dpy protein contains hundreds of EGF modules that are highly susceptible to O-glycosylation. Sulfation of the glycan groups attached to Dpy may be critical for its ability to form a filamentous structure. Without sulfation, the glycan groups on Dpy may not interact properly with the surrounding materials in the lumen, resulting in an aggregated and condensed structure. These possibilities are discussed in the Discussion.

      We have not analyzed sulfation levels in pio or dpy mutants because sulfation levels in mutants of single ZP domain proteins may not provide much information. A substantial number of proteoglycans, glycoproteins, and proteins (with up to 1% of all tyrosine residues in an organism’s proteins estimated to be sulfated) are modified by sulfation, so changes in sulfation levels in a single mutant may be subtle. Especially, the existing dpy mutant line is an insertion mutant of a transposable element; therefore, the sulfation sites would still remain in this mutant. 

      - Interpretation of Papss effects on Pio and Dpy would be desired. The results presented indicate loss of Pio antibody staining but normal presence of cherry-Pio. This is difficult to interpret. How are these results of Pio antibody and cherry-Pio correlating with the results in the trachea described recently (Drees et al. 2023)?

      In our original submission, we stated that the uniform luminal mCh-Pio signals were not changed in Papss mutants, but after re-analysis, we found that these signals were actually absent from the expanded luminal region in stage 16 SG (where Dpy-YFP is also absent), and weak mCh-Pio signals colocalize with the condensed Dpy-YFP signals (Figure 5C, D). We have revised the text accordingly. 

      After cleavages by Np and furin, the Pio protein should have three fragments. The N-terminal region contains the N-terminal half of the ZP domain, and mCh-Pio signals show this fragment. The very C-terminal region should localize to the membrane as it contains the transmembrane domain. We think the middle piece, the C-terminal ZP domain, is recognized by the Pio antibody. The mCh-Pio and Pio antibody signals in the WT trachea (Drees et al., 2023) are similar to those in the SG. mCh-Pio signals are detected in the tracheal lumen as uniform signals, at the apical membrane, and in cytoplasmic puncta. Pio antibody signals are exclusively in the tracheal lumen and show more heterogeneous filamentous signals.

      In Papss mutants, the middle fragment (the C-terminal ZP domain) seems to be most affected because the Pio antibody signals are absent from the lumen. The loss of Pio antibody signals could be due to protein degradation or epitope masking caused by aECM condensation and protein misfolding. This fragment seems to be key for interacting with Dpy, since Pio antibody signals always colocalize with Dpy-YFP. The N-terminal mCh-Pio fragment does not appear to play a significant role in forming a complex with Dpy in WT (but still aggregated together in Papss mutants), and this can be tested in future studies.

      In response to Reviewer 1’s comment, we performed an additional experiment to test the role of Np in cleaving Pio to help organize the SG aECM. In this experiment, we overexpressed the WT and mutant form of Np using UAS-Np.WT and UAS-Np.S990A lines (Drees et al., 2019) and analyzed mCh-Pio, Pio antibody, and Dpy-YFP signals. Np.WT overexpression resulted in increased levels of mCh-Pio, Pio, and Dpy-YFP signals in the lumen and at the apical membrane. However, overexpression of Np.S990A resulted in the absence of luminal mCh-Pio signals. Pio antibody signals were strong at the apical membrane but rather weak in the luminal filamentous structures. Since the UAS-Np.S990A line has the GFP tag, we could not reliably analyze Dpy-YFP signals due to overlapping Np.S990A.GFP signals in the same channel. However, the luminal filamentous Pio signals co-localized with GFP signals, and we assume that these overlapping signals could be Dpy-YFP signals. 

      These results suggest that overexpressed Np.S990A may act in a dominant-negative manner, competing with endogenous Np and impairing proper cleavage of Pio (and mCh-Pio). Nevertheless, some level of cleavage by endogenous Np still appears to occur, as indicated by the residual luminal filamentous Pio signals. These new findings have been incorporated into the revised manuscript and are shown in Figure 6H and 6I. 

      A proposed model of the Pio-Dpy aECM in WT, Papss, pio, and Np mutants has now been included in Figure 7.

      -  What does the WGA staining in the lumen reveal? This staining seems to be affected differently in pio and dpy mutants: in pio mutants it disappears from the lumen (as dpy-YFP does), but in dpy mutants it seems to be maintained. How do the authors interpret these findings? How does the WGA matrix relate to sulfated products (using Alcian blue or sulfotyrosine)?

      WGA binds to sialic acid and N-acetylglucosamine (GlcNAc) residues on glycoproteins and glycolipids. GlcNAc is a key component of the glycosaminoglycan (GAG) chains that are covalently attached to the core protein of a proteoglycan, which is abundant in the ECM. We think WGA detects GlcNAc residues in the components of the aECM, including Dpy as a core component, based on the following data: 1) WGA and Dpy colocalize in the lumen, both in WT (as thin filamentous structures) and in the Papss mutant background (as condensed rod-like structures), and 2) both are absent in pio mutants. WGA signals are still present in a highly condensed form in dpy mutants. This is probably because the dpy mutant allele (dpy<sup>ov1</sup>) has an insertion of a transposable element (blood element) into intron 11, and this insertion may have caused the Dpy protein to misfold and condense. We added the information about the dpy allele to the Results section and discussed it in the Discussion.

      Minor points:

      - The morphological phenotypic analysis of Papss mutants (homozygous and transheterozygous) is a bit confusing. The general defects are higher in Papss homozygous than in transheterozygotes over a deficiency. Maybe quantifying the defects in the heterozygote embryos in the Papss mutant collection could help to figure out whether these defects relate to Papss mutation.

      We analyzed the morphology of heterozygous Papss mutant embryos. They were all normal. The data and quantifications have now been added to Figure 1-figure supplement 3. 

      - The conclusion that the apical membrane is affected in Papss mutants is not strongly supported by the results presented with the pattern of Crb (Fig 2). Further evidence should be provided. Maybe the TEM analysis could help to support this conclusion.

      We quantified Crb levels in the sub-apical and medial regions of the cell and included this new quantification in Figure 2D. TEM images showed variation in the irregularity of the apical membrane, even in WT, and we could not draw a solid conclusion from these images.

      - It is difficult to understand why in Papss mutants the levels of WGA increase. Can the authors elaborate on this?

      We think that when Dpy (and many other aECM components) are condensed and aggregated into the thin, rod-like structure in Papss mutants, the sugar residues attached to them must also be concentrated and shown as increased WGA signals.   

      - The explanation about why Pio antibody and mcherry-Pio show different patterns is not clear. If the antibody recognizes the C-t region, shouldn't it be clearly found at the membrane rather than the lumen?

      The Pio protein is also cleaved by furin protease (Figure 5B). We think the Pio fragment recognized by the antibody should be a “C-terminal ZP domain”, which is a middle piece after furin + Np cleavages. 

      - The qsm information does not seem to provide any relevant information to the aECM, or sulfation.

      Since Qsm has been shown to bind to Dpy and remodel Dpy filaments in the muscle tendon (Chu and Hayashi, 2021), we believe that the different behavior of Qsm in the SG is still informative. As mentioned briefly in the Discussion, the cleaved Qsm fragment may localize differently, like Pio, and future work will need to test this. We have shortened the description of the Qsm localization in the manuscript and moved the details to the figure legend of Figure 5-figure supplement 3.

      Reviewer #3 (Significance):

      Previous reports already indicated a role for Papss in sulfation in SG (Zhu et al 2005). Now this work provides a more detailed description of the defects produced by the absence of Papss. In addition, it provides relevant data related to the nature and requirements of the aECM in the SG. Understanding the composition and requirements of aECM during organ formation is an important question. Therefore, this work may be relevant in the fields of cell biology and morphogenesis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this study, the authors identify an insect salivary protein that participates in the initial viral infection of the plant host. They found that a salivary protein, LssaCA, promotes RSV infection by interacting with OsTLP, which can degrade callose in plants. Furthermore, RSV NP bound to LssaCA in salivary glands to form a complex, which then bound to OsTLP to promote degradation of callose.

      The story focuses on the tripartite virus-insect vector-plant interaction and is interesting. However, the study is too simple and poorly conducted. The conclusion is also overstated because the findings are not solid.

      We thank the reviewer for their constructive feedback. We have conducted additional experiments to strengthen our results and conclusions as detailed below:

      (1) The comparison between vector inoculation and microinjection involves multiple confounding factors that could affect the experimental results, including salivary components, RSV inoculation titers, and the precision of viral deposition. The differential outcomes could be attributed to these various factors rather than definitively demonstrating the necessity of salivary factors. Therefore, we have removed this comparison from the revised manuscript and instead focused on elucidating the specific mechanisms by which LssaCA facilitates viral infection.

      (2) We conducted new experiments to assess the function of LssaCA enzymatic activity in mediating RSV infection. Additional experiments revealed that OsTLP enzymatic activity is highly pH-dependent, with increased activity as pH decreases from 7.5 to 5.0 (Fig. 3H). However, the LssaCA-OsTLP interaction at pH 7.4 significantly enhanced OsTLP enzymatic activity without requiring pH changes. These results demonstrate that LssaCA-OsTLP protein interactions are crucial for mediating RSV infection. In contrast to pH-dependent mechanisms, our study demonstrated that LssaCA's biological function in mediating RSV infection is at least partially, if not completely, independent of its enzymatic activity. We have added these new results to the revised manuscript (Lines 220-227). We have also added a comprehensive discussion comparing the aphid CA mechanism described by Guo et al. (2023; doi.org/10.1073/pnas.2222040120) with our findings in the revised manuscript (Lines 350-371).

      (3) We have repeated the majority of the callose deposition experiments, providing clearer images (Figures 5-6). In addition to aniline blue staining, we quantified callose concentrations using a plant callose ELISA kit to provide more precise measurements (Figure 5A, I, 6A, C and S8A). We utilized RT-qPCR to measure callose synthase expression in both feeding and non-feeding areas, confirming that callose synthesis was induced specifically in feeding regions, leading to localized callose deposition (Figures 5D-G and S8B-E). For sieve plate visualization, we examined longitudinal sections, which revealed callose deposition in sieve plates during SBPH feeding and RSV infection (Figure S7).

      (4) We generated OsTLP mutant rice seedlings (ostlp) and used this mutant to directly demonstrate that LssaCA mediates callose degradation in planta through enhancement of OsTLP enzymatic activity (Lines 288-302 and Figure 6).

      (5) We produced LssaCA recombinant proteins in sf9 cells to ensure full enzymatic activity and constructed a comprehensive CA mutant protein, in which all seven residues constituting the enzymatic active center were mutated (LssaCA<sup>H111D</sup>, LssaCA<sup>N139H</sup>, LssaCA<sup>H141D</sup>, LssaCA<sup>H143D</sup>, LssaCA<sup>E153H</sup>, LssaCA<sup>H166D</sup>, LssaCA<sup>T253E</sup>) (Fig. S1B). This LssaCA mutant protein demonstrated complete loss of enzymatic activity (Fig. 1C).

      Major comments:

      (1) The key problem is that how long the LssCA functioned for in rice plant. Author declared that LssCA had no effect on viral initial infection, but on infection after viral inoculation. It is unreasonable to conclude that LssCA promoted viral infection based on the data that insect inoculated plant just for 2 days, but viral titer could be increased at 14 days post-feeding. How could saliva proteins, which reached phloem 12-14 days before, induce enough TLP to degrade callose to promote virus infection? It was unbelievable.

      We appreciate your insightful comment and acknowledge that our initial description may have been unclear. We agree that salivary proteins would not be present in plant tissues for two weeks post-feeding or post-injection. Our intention was to clarify that when salivary proteins enhance RSV infection, this initial enhancement leads to sustained high viral loads. We measured viral burden at 14 days post-feeding or post-injection because this is the common measurement time point when viral titers are sufficiently high for reliable detection by RT-qPCR or western blotting. We have clarified this rationale in the revised manuscript (Lines 155-157).

      To determine the actual persistence of LssaCA in plant tissues, we conducted additional experiments in which insects were allowed to feed on a defined area of rice seedlings for two days. We then monitored LssaCA protein levels at 1 and 3 days after removing the insects. Western blotting analysis revealed that LssaCA protein levels decreased post-feeding but remained detectable at 3 days post-feeding. These results are presented in Figure 2H and described in detail in Lines 184-193.

      (2) Lines 110-116 and Fig. 1, the results of viruliferous insect feeding and microinjection with purified virus could not conclude the saliva factor necessary of RSV infection, because these two tests are not in parallel and comparable. Microinjection with salivary proteins combined with purified virus is comparable with microinjection with purified virus.

      We thank the reviewer for this insightful comment. We agree that “the results of viruliferous insect feeding and microinjection with the purified virus could not conclude the saliva factor necessary of RSV infection”. However, due to the technical difficulty in collecting sufficient quantities of salivary proteins to conduct the microinjection experiment, we have removed these results from the revised manuscript.

      (3) The second problem is how many days post viruliferous insect feeding and microinjection with purified virus did author detect viral titers? in Method section, authors declared that viral titers was detected at 7-14 days post microinjection. Please demonstrate the days exactly.

      We thank the reviewer for this insightful comment. We typically measured RSV infection levels at both 7 and 14 days post-microinjection. However, since the midrib microinjection experiments have been removed from the revised manuscript, this methodology has also been removed accordingly.

      (4) The last problem is that how author made sure that the viral titers in salivary glands of insects between two experiments was equal, causing different phenotype of rice plant. If not, different viral titers in salivary glands of insects between two experiments of course caused different phenotype of rice plant.

      We thank the reviewer for this comment. When we assessed the effects of LssaCA deficiency on RSV infection of rice plants, we also compared the viral titers in the insect saliva and salivary glands. The results indicated that the virus titers in both tissues were not changed by LssaCA deficiency, suggesting that the amounts of virus inoculated into rice phloem by insects of different treatments were comparable. Please refer to the revised manuscript Figures 2D-G and Lines 161-173.

      (5) The callose deposition in phloem can be induced by insect feeding. In Fig. 5H, why was the callose deposition increased in the whole vascular bundle, but not phloem? Could the transgenic rice plant directional express protein in the phloem? In Fig. 5, why was callose deposition detected at 24 h after insect feeding? In Fig. 6A, why was callose deposition decreased in the phloem, but not all the cells of the of TLP OE plant? Also in Fig.6A and B, expression of callose synthase genes was required.

      We thank the reviewer for these insightful comments.

      (1) Figure 5. Callose deposition increased in multiple cell types within the vascular bundle, including sieve tubes, parenchyma cells, and companion cells. While callose deposition was detected in other parts of the vascular bundle, no significant differences were observed between treatments in these regions, indicating that in response to RSV infection and other treatments, altered callose deposition mainly occurred in phloem cells. Please refer to the revised Figures 5B, 5J, 6B, and 6D.

      (2) Transgenic plant expression. The OsTLP-overexpressing transgenic rice plants express TLP proteins in various cells under the control of the CaMV 35S promoter, rather than being directionally expressed in the phloem. However, since TLP proteins are secreted, they are potentially transported and concentrated in the phloem where they can degrade callose.

      (3) Figure 5. The 24-hour time point for callose deposition detection was selected based on established protocols from previous studies. According to Hao et al. (Plant Physiology 2008), callose deposition increased during the first 3 days of planthopper infestation and decreased after 4 days. Additionally, Ellinger and Voigt (Ann Bot 2014) demonstrated that callose visualization typically begins 18-24 hours after treatment, making 24 hours an optimal detection time point.

      (4) Figure 6, Phloem-specific changes. Similar to Figure 5, while callose deposition was detected in other parts of the vascular bundle, significant differences between treatments were mainly observed in phloem cells, indicating that RSV infection specifically affects callose deposition in phloem tissue.

      (5) Callose synthase gene expression. We performed RT-qPCR analysis to measure the expression levels of callose synthase genes. The results indicated that OsTLP overexpression did not significantly alter the mRNA levels of these genes, regardless of RSV infection status in SBPH.

      Reviewer #2 (Public Review):

      There is increasing evidence that viruses manipulate vectors and hosts to facilitate transmission. For arthropods, saliva plays an essential role for successful feeding on a host and consequently for arthropod-borne viruses that are transmitted during arthropod feeding on new hosts. This is so because saliva constitutes the interaction interface between arthropod and host and contains many enzymes and effectors that allow feeding on a compatible host by neutralizing host defenses. Therefore, it is not surprising that viruses change saliva composition or use saliva proteins to provoke altered vector-host interactions that are favorable for virus transmission. However, detailed mechanistic analyses are scarce. Here, Zhao and coworkers study transmission of rice stripe virus (RSV) by the planthopper Laodelphax striatellus. RSV infects plants as well as the vector, accumulates in salivary glands and is injected together with saliva into a new host during vector feeding.

      The authors present evidence that a saliva-contained enzyme - carbonic anhydrase (CA) - might facilitate virus infection of rice by interfering with callose deposition, a plant defense response. In vitro pull-down experiments, yeast two hybrid assay and binding affinity assays show convincingly interaction between CA and a plant thaumatin-like protein (TLP) that degrades callose. Similar experiments show that CA and TLP interact with the RSV nuclear capsid protein NT to form a complex. Formation of the CA-TLP complex increases TLP activity by roughly 30% and integration of NT increases TLP activity further. This correlates with lower callose content in RSV-infected plants and higher virus titer. Further, silencing CA in vectors decreases virus titers in infected plants.

      (1) Interestingly, aphid CA was found to play a role in plant infection with two non-persistent non-circulative viruses, turnip mosaic virus and cucumber mosaic virus (Guo et al. 2023 doi.org/10.1073/pnas.2222040120), but the proposed mode of action is entirely different.

      We appreciate the reviewer’s insightful comment and have carefully examined the cited publication. The study by Guo et al. (2023) elucidates a distinct mechanism for aphid-mediated transmission of non-persistent, non-circulative viruses (turnip mosaic virus and cucumber mosaic virus). In their model, aphid-secreted CA-II in the plant cell apoplast leads to H<sup>+</sup> accumulation and localized acidification. This triggers enhanced vesicle trafficking as a plant defense response, inadvertently facilitating virus translocation from the endomembrane system to the apoplast.

      In contrast to these pH-dependent mechanisms, our study demonstrated that LssaCA’s biological function in mediating RSV infection is, if not completely, at least partially independent of its enzymatic activity. We performed additional experiments to reveal that OsTLP enzymatic activity is highly pH-dependent and exhibits increased enzymatic activity as pH decreases from 7.5 to 5.0 (Fig. 3H); however, the LssaCA-OsTLP interaction occurring at pH 7.4 significantly enhanced OsTLP enzymatic activity without any change in buffer pH (Fig. 3G). These results demonstrate the crucial importance of LssaCA-OsTLP protein interactions, rather than enzymatic activity alone, in mediating RSV infection.

      We have incorporated these new experimental results and added a comprehensive discussion comparing the aphid CA mechanism described by Guo et al. (2023) with our findings in the revised manuscript. Please refer to Figures 3G-H, Lines 220-227 and 350-371 for detailed information.

      (2) While this is an interesting work, there are, in my opinion, some weak points. The microinjection experiments result in much lower virus accumulation in rice than infection by vector inoculation, so their interpretation is difficult.

      We acknowledge the reviewer's concern regarding the lower virus accumulation observed in microinjection experiments compared to vector-mediated inoculation. We have removed these experiments from the revised manuscript. To address the core question raised by these experiments, we have conducted new experiments that directly demonstrate the crucial importance of LssaCA-OsTLP protein-protein interactions, rather than enzymatic activity alone, in mediating RSV infection. Additionally, we have incorporated a comprehensive discussion examining carbonic anhydrase activity, pH homeostasis, and viral infection. Please refer to the detailed experimental results and discussion in the sections mentioned in our previous response (Figures 3G-H, Lines 220-227 and 350-371).

      (3) Also, the effect of injected recombinant CA protein might fade over time because of degradation or dilution.

      We appreciate the reviewer’s insightful comment. This is indeed a valid concern that could affect the interpretation of microinjection results. To address the temporal dynamics of CA protein presence in planta, we conducted time-course experiments to monitor the retention of naturally SBPH-secreted CA proteins in rice plants. Our analysis at 1 and 3 days post-feeding (dpf) revealed that CA protein levels decreased progressively following SBPH feeding, but remained detectable at 3 dpf (Fig. 2H). Please refer to Figure 2H and Lines 184-193 for detailed information.

      (4) The authors claim that enzymatic activity of CA is not required for its proviral activity. However, this is difficult to assess because all CA mutants used for the corresponding experiments possess residual activity.

      We appreciate the reviewer’s insightful comment. We constructed a comprehensive CA mutant protein in which all seven residues constituting the enzymatic active center were mutated (LssaCA<sup>H111D</sup>, LssaCA<sup>N139H</sup>, LssaCA<sup>H141D</sup>, LssaCA<sup>H143D</sup>, LssaCA<sup>E153H</sup>, LssaCA<sup>H166D</sup>, LssaCA<sup>T253E</sup>) (Fig. S1B). This LssaCA mutant protein demonstrated complete loss of enzymatic activity (Fig. 1C). However, since we have removed the recombinant CA protein microinjection experiments from the revised manuscript, we lack sufficient direct evidence to definitively demonstrate that CA enzymatic activity is dispensable for its proviral function. To address the core question raised by these experiments, we have conducted new experiments that provide direct evidence for the importance of LssaCA-OsTLP protein-protein interactions in mediating RSV infection. Additionally, we have incorporated a comprehensive discussion examining carbonic anhydrase activity, pH homeostasis, and viral infection. Please refer to the detailed experimental results and discussion in the sections mentioned in our previous response (Figures 3G-H, Lines 220-227 and 350-371).

      (5) It remains also unclear whether viral infection deregulates CA expression in planthoppers and TLP expression in plants. However, increased CA and TLP levels could alone contribute to reduced callose deposition.

      We compared LssaCA mRNA levels in RSV-free and RSV-infected L. striatellus salivary glands, and the results indicated that RSV infection does not significantly affect LssaCA expression (Figure 1J). By allowing RSV-free and RSV-infected L. striatellus to feed on rice seedlings, we showed that RSV infection does not affect TLP expression in plants (Figure 5H).

      Reviewer #1: (Recommendations For The Authors):

      Other comments:

      (1) Most data proving viral infection and LssaCA expression were derived from qPCR assays. Western blot data are strongly required to prove the change at the protein level.

      We agree that western blot data are required to prove the change at the protein level. In the revised manuscript, we have added western-blotting results (Figures 1F, 1I, 2C, 2J, and S6).

      (2) Line 145, data that LssaCA was significantly downregulated should be shown.

      Thank you. The data have been added to the revised manuscript. Please refer to Line 165 and Figure 2D.

      (3) Lines 159-161, how did authors assure that the dose of recombinant LssCA was closed to the release level of insect feeding, but not was excessive? How did author exclude the possibility of upregulated RSV titer caused by excessive recombinant LssCA?

      We appreciate this important concern regarding dosage controls. Microinjection of recombinant proteins typically yields viral infection levels significantly lower than those achieved through natural insect feeding, so higher protein concentrations are often required to achieve high viral infection levels. In this experiment, we compared RSV infection levels following microinjection of BSA+RSV versus LssaCA+RSV, with the expectation that any observed upregulation in RSV titer would be specifically attributable to recombinant LssaCA rather than excessive protein dosing. However, given the low RSV infection levels observed with viral microinjection, we have removed the corresponding results from the revised manuscript.

      (4) Lines 124-125, recombinantly expressed LssaCA protein should be underlined, but not the LssaCA protein itself.

      We have clearly distinguished recombinantly expressed LssaCA from endogenous LssaCA protein throughout the manuscript, ensuring that all references to recombinant proteins are properly labeled as such.

      (5) LssaCA expression in salivary glands of viruliferous and nonviruliferous insects is required. LssaCA accumulation in rice plant exposed to viruliferous and nonviruliferous insects is also required.

      We have measured LssaCA mRNA levels in salivary glands of viruliferous and nonviruliferous insects (Figure 1J), and protein levels in rice plants exposed to viruliferous and nonviruliferous insects (Figure 1I).

      (6) Fig. 4G, the enzymatic activities of OsTLP were too low compared with that in Fig. 4E and Fig. 7E. Why did the enzymatic activities of the same protein show so obvious difference?

      We apologize for the error in Fig. 4G. The original data presented relative fold changes between the OsTLP+BSA and OsTLP+LssaCA treatments, with OsTLP+BSA normalized to 1.0 and OsTLP+LssaCA values expressed as fold changes relative to this baseline. However, the Y-axis was incorrectly labeled as “β-1,3-glucanase (units mg<sup>-1</sup>)”, which suggested absolute enzymatic activity values. We have now corrected the figure (revised Figure 3G) to display the actual absolute enzymatic activity values with the appropriate Y-axis label “β-1,3-glucanase (units mg<sup>-1</sup>)”.
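
      The difference between the two presentations can be sketched in a few lines of Python. This is only an illustration: the activity values and the helper name `fold_change` are hypothetical and are not taken from the manuscript. It shows why fold changes relative to the OsTLP+BSA baseline should not carry a “units mg<sup>-1</sup>” axis label.

```python
def fold_change(values, baseline):
    """Express each value relative to the baseline, so the baseline maps to 1.0."""
    if baseline == 0:
        raise ValueError("baseline activity must be non-zero")
    return [v / baseline for v in values]


# Hypothetical absolute activities in units mg^-1 (illustrative only, not study data).
activities = {"OsTLP+BSA": 2.0, "OsTLP+LssaCA": 3.0}

# Normalize to the OsTLP+BSA control: the result is dimensionless.
baseline = activities["OsTLP+BSA"]
relative = dict(zip(activities, fold_change(activities.values(), baseline)))

for treatment in activities:
    print(f"{treatment}: {activities[treatment]} units/mg, fold change {relative[treatment]}")
```

      The normalized values are unitless ratios, so a plot of `relative` needs a label such as “relative activity (fold change)”, whereas a plot of `activities` keeps the absolute label.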

      (7) Fig. 7E, was the LssaCA + NP and LssaCA + GST quantified?

      Yes, all proteins were quantified, and enzymatic activity values were calculated and expressed as units per milligram of proteins (units mg<sup>-1</sup>).

      Minor comments:

      (1) The keywords: In fact, the LssaCA functioned during initial viral infection in plant, but not viral horizontal transmission.

      We appreciate the reviewer’s insightful comment. We have revised the manuscript title to “Rice stripe virus utilizes an Laodelphax striatellus salivary carbonic anhydrase to facilitate plant infection by direct molecular interaction” and changed the keyword from “viral horizontal transmission” to “viral infection of plant”.

      (2) Fig. 2A, how about testes? Was this data derived from female insects? Fig. 2C, is the saliva collected from nonviruliferous insects? Fig. 2E, what is the control?

      We appreciate the reviewer’s insightful comments.

      (1) Fig. 2A: The data represent the mean and SD calculated from three independent experiments, with 5 tissue samples per experiment. Since 3<sup>rd</sup> instar nymphs were used for feeding experiments in this study, we also used 3<sup>rd</sup> instar RSV-free nymphs to measure gene expression in guts, salivary glands and fat bodies. R-body represents the remaining body after removing these tissues. Female insects were used to measure gene expression in ovaries, and gene expression in testes has also been added. We have added this necessary information to the revised manuscript (please refer to new Figure 1F and Lines 402-403).

      (2) Fig. 2C: Yes, saliva was collected from nonviruliferous insects.

      (3) Fig. 2E: The control consisted of 100 mM PBS, as described in the experimental section (Lines 643-644): “A blank control consisted of 2 mL of 100 mM PBS (pH 7.0) mixed with 1 mL of 3 mM p-NPA.” In the revised manuscript, we recombinantly expressed LssaCA and its mutant proteins in both sf9 cells and E. coli. Therefore, we have used the mutant proteins as controls to demonstrate specific enzymatic activity. Please refer to Figure 1C, Lines 115-122 and 621-635 for detailed information.

      (3) Some figure labeling appeared unprofessional. For example, "a-RSV", "loading" in Fig. 1, "W-saliva", "G-saliva" in Fig. 2, and so on, the related explanations were absent.

      We appreciate the reviewer’s insightful comments. We have thoroughly reviewed all figures to ensure professional labeling. Specifically, we have:

      (1) Used proper protein names to label western blots and clearly explained the antibodies used for protein detection.

      (2) Provided comprehensive explanations for all abbreviations used in figures within the corresponding figure legends.

      (3) Ensured consistent and clear labeling throughout all figures.

      Please refer to the revised Figures 1-3 for these corrections.

      (4) Lines 83-84, please cite references on callose preventing viral movement. I do not think the present references were relevant.

      We have added a more relevant reference (Yue et al., 2022, Line 82), which revealed that palmitoylated γb promotes virus cell-to-cell movement by interacting with NbREM1 to inhibit callose deposition at plasmodesmata.

      (5) The background of transgenic plants of OsTLP OE should be characterized. And the overexpression of OsTLP should be shown. Which generation of OsTLP OE did authors use?

      The background and generation of the OsTLP OE transgenic plants are described in the “Materials and methods” section (Lines 782-786) and mentioned in the main text (Line 214). T<sub>2</sub> lines were selected for further analysis (Line 789).

      (6) Fig. 5A, the blank, which derived from plants without exposure to insect, was absent.

      We appreciate the reviewer’s insightful comments. We have added the non-fed control in the revised Figure 5A-C.

      (7) Fig. 7A, the nonviruruliferous insects were required to serve as a control.

      Immunofluorescence localization of RSV and LssaCA in uninfected L. striatellus salivary glands has been added to the revised manuscript (Figure S2).

      (8) The manuscript needs English language edit.

      The manuscript has undergone comprehensive English language editing to improve clarity, grammar, and overall readability.

      Reviewer #2 (Recommendations For The Authors):

      (1) The first experiment compares vector inoculation vs microinjection of RSV in tissue. I am not sure that your claim (saliva factors are necessary for inoculation) holds, because the vector injects RSV directly into the phloem, whereas microinjection is less precise and you cannot control where exactly the virus is deposed. However, virus deposited in other tissues than the phloem might not replicate, and indeed you observe, compared to natural vector inoculation, highly reduced virus titers.

      We appreciate the reviewer’s insightful comments. We agree that the comparison between vector inoculation and microinjection involves multiple confounding factors that could affect the experimental results, including salivary components, RSV inoculation titers, and the precision of viral deposition. As the reviewer correctly points out, the differential outcomes could be attributed to these various factors rather than definitively demonstrating the necessity of salivary factors. Therefore, we have removed this comparison from the revised manuscript and instead focused on elucidating the specific mechanisms by which LssaCA facilitates viral infection.

      (2) Next the authors show that a carbonic anhydrase (CA) that they previously detected in saliva is functional and secreted into rice. I assume this is done with non-infected insects, but I did not find the information. Silencing the CA reduces virus titers in inoculated plants at 14 dpi, but not in infected planthoppers. At 1 dpi, there is no difference in RSV titer in plants inoculated with CA silenced planthoppers or control hoppers. To see a direct effect of CA in virus infection, purified virus is injected together with a control protein or recombinant CA into plants. At 14 dpi, there is about double as much virus in the CA-injected plants, but compared to authentic SBPH inoculation, titers are 20,000 times lower. Actually, I believe it is not very likely that the recombinant CA is active or present so long after initial injection.

      We appreciate the reviewer’s insightful comments.

      (1) Our previous study identified the CA proteins from RSV-free insects. We have added this information to the revised manuscript (Line 110).

      (2) We acknowledge the reviewer's concern regarding the lower virus accumulation observed in microinjection experiments compared to vector-mediated inoculation. We have removed these experiments from the revised manuscript and instead focused on elucidating the specific mechanisms by which LssaCA facilitates viral infection.

      (3) We did not intend to suggest that LssaCA proteins persist for 14 days post-injection. We measured viral titers at 14 days post-feeding or post-injection because this is the common measurement time point when viral titers are sufficiently high for reliable detection by RT-qPCR or western blotting. We have clarified this rationale in the revised manuscript (Lines 155-157). To determine the actual persistence of LssaCA in plant tissues, we monitored LssaCA protein levels at 1 and 3 dpf. Western blotting analysis revealed that LssaCA protein levels decreased post-feeding but remained detectable at 3 dpf. These results are presented in Figure 2H and described in detail in Lines 184-193.

      (3) Then the authors want to know whether CA activity is required for its proviral action and single amino acid mutants covering the putative active CA site are created. The recombinant mutant proteins have 30-70 % reduced activity, but none of them has zero activity. When microinjected together with RSV into plants, RSV replication is similar as injection with wild type CA. Since no knock-out mutant with zero activity is used, it is difficult to judge whether CA activity is unimportant for viral replication, as claim the authors.

      We appreciate the reviewer’s insightful comment. We constructed a comprehensive CA mutant protein in which all seven residues constituting the enzymatic active center were mutated (LssaCA<sup>H111D</sup>, LssaCA<sup>N139H</sup>, LssaCA<sup>H141D</sup>, LssaCA<sup>H143D</sup>, LssaCA<sup>E153H</sup>, LssaCA<sup>H166D</sup>, LssaCA<sup>T253E</sup>) (Fig. S1B). This LssaCA mutant protein demonstrated complete loss of enzymatic activity (Fig. 1C). However, since we have removed the recombinant CA protein microinjection experiments from the revised manuscript, we lack sufficient direct evidence to definitively demonstrate that CA enzymatic activity is dispensable for its proviral function. To address the core question raised by these experiments, we have conducted new experiments that provide direct evidence for the importance of LssaCA-OsTLP protein-protein interactions in mediating RSV infection. Additionally, we have incorporated a comprehensive discussion examining carbonic anhydrase activity, pH homeostasis, and viral infection. Please refer to the detailed experimental results and discussion in the sections mentioned in our previous response (Figures 3G-H, Lines 220-227 and 350-371).

      (4) Next a yeast two hybrid assay reveals interaction with a thaumatin-like rice protein (TLP). It would be nice to know whether you detected other interacting proteins as well. The interaction is confirmed by pulldown and binding affinity assay using recombinant proteins. The kD is in favor of a rather weak interaction between the two proteins.

      We have added a list of rice proteins that potentially interact with LssaCA (Table S1) and have measured interactions with additional proteins (unpublished data). Despite the relatively weak binding affinity, the functional significance of the LssaCA-OsTLP interaction in enhancing TLP enzymatic activity is substantial.

      (5) Then the glucanase activity of TLP is measured using recombinant TLP-MBP or in vivo expressed TLP. It is not clear to me which TLP is used in Fig. 4G (plant-expressed or bacteria-expressed). If it is plant-expressed TLP, why is its basic activity 10 times lower than in Fig. 4F?

      The original Fig. 4G is Fig. 3G in the revised manuscript. An E. coli-expressed TLP protein was used. We apologize for the error in our original Fig. 4G. The original data presented relative fold changes between the OsTLP+BSA and OsTLP+LssaCA treatments, with OsTLP+BSA normalized to 1.0 and OsTLP+LssaCA values expressed as fold changes relative to this baseline. However, the Y-axis was incorrectly labeled as “β-1,3-glucanase (units mg<sup>-1</sup>)”, which suggested absolute enzymatic activity values. We have now corrected the figure to display the actual absolute enzymatic activity values with the appropriate Y-axis label “β-1,3-glucanase (units mg<sup>-1</sup>)”.

      (6) There is also a discrepancy in the construction of the transgenic rice plants: did you use TLP without signal peptide or full length TLP? If you used TLP without signal peptide, you should explain why, because the wild type TLP contains a signal peptide.

      We cloned the full-length OsTLP gene including the signal peptide sequence (Line 782 in the revised manuscript).

      (7) The authors find that CA increases glucanase activity of TLP. Next the authors test callose deposition by aniline blue staining. Feeding activity of RSV-infected planthoppers induces more callose deposition than does feeding by uninfected insects. In the image (Fig. 5A) I see blue stain all over the cell walls of xylem and phloem cells. Is this what the authors expect? I would have expected rather a patchy pattern of callose deposition on cell walls. Concerning sieve plates, I cannot discern any in the image; they are easier to visualize in longitudinal sections than in transversal section as presented here.

      We appreciate the reviewer’s insightful comment.

      (1) Callose deposition pattern: While callose deposition was detected in other parts of the vascular bundle, significant differences between treatments were mainly observed in phloem cells, indicating that phloem-specific callose deposition is the primary response to RSV infection and SBPH feeding (Figures 5B and 5J).

      (2) Sieve plate visualization: We have examined longitudinal sections to visualize sieve plates, which revealed callose deposition in sieve plates during SBPH feeding and RSV infection (Figure S7).

      (3) Quantitative analysis: In addition to aniline blue staining, we quantified callose concentrations using a plant callose ELISA kit to provide more precise measurements (Figure 5A, 5I and S8A).

      (4) Gene expression analysis: We utilized RT-qPCR to measure callose synthase expression in both feeding and non-feeding areas, confirming that callose synthesis was induced specifically in feeding regions, leading to localized callose deposition (Figures 5D-H).

      These experimental results collectively demonstrate that RSV infection induces enhanced callose synthesis and deposition, with this response occurring primarily in phloem cells, including sieve plates, within feeding sites and their immediate vicinity.

      (8) I do not quite understand how you quantified callose deposition (arbitrary areas?) with ImageJ. Please indicate in detail the analysis method.

      We have added more detailed information on the methods used to quantify callose deposition (Lines 673-678).

      (9) More callose content is also observed by a callose ELISA assay of tissue extracts and supported by increased expression of glucanase synthase genes. Did you look whether expression of TLP is changed by feeding activity and RSV infection? Silencing CA in planthoppers increases callose deposition, which is inline with the observation that CA increases TLP activity.

      We measured OsTLP expression following feeding by RSV-free or RSV-infected SBPH and found that gene expression was not significantly affected by either insect feeding or RSV infection. These results have been added to the revised manuscript (Lines 275-277 and Figure 5H).

      (10) Next, callose is measured after feeding of RSV-infected insects on wild type or TLP-overexpressing rice. Less callose deposition (after 2 days) and more virus (after 14 days) is observed in TLP overexpressors. I am missing a control in this experiment, that is feeding of uninfected insects on wild type or TLP overexpressing rice, where I would expect intermediate callose levels.

      We appreciate the reviewer’s insightful comment and fully agree with the prediction. In the revised manuscript, we have constructed ostlp mutant plants and conducted additional experiments to further clarify how callose deposition is regulated by insect feeding, RSV infection, LssaCA levels, and OsTLP expression. Specifically: 

      (1) Both SBPH feeding and RSV infection induce callose deposition, with RSV-infected insect feeding resulting in significantly higher callose levels compared to RSV-free insect feeding (Fig. 5A-C).

      (2) LssaCA enhances OsTLP enzymatic activity, thereby promoting callose degradation (Fig. 5I-K).

      (3) OsTLP-overexpressing (OE) plants exhibit lower callose levels than wild-type (WT) plants, while ostlp mutant plants show higher callose levels than WT (Fig. 6A-B).

      (4) In ostlp knockout plants, LssaCA no longer affects callose levels, indicating that OsTLP is required for LssaCA-mediated regulation of callose (Fig. 6C-D).

      These additional data address the reviewer’s concern and support the conclusion that OsTLP plays a central role in modulating callose levels in response to RSV infection and insect feeding.

      (11) Next the authors test for interaction between virions and CA. Immunofluorescence shows that RSV and CA colocalize in salivary glands; in my opinion, there is partial and not complete colocalization (Fig. 7A).

      We agree with the reviewer’s observation. CA is primarily produced in the small lobules of the principal salivary glands, while RSV infects nearly all parts of the salivary glands. In regions where RSV and CA colocalize within the principal glands, the CA signal appears sharper than that of RSV, likely due to the relatively higher abundance of CA compared to RSV in these areas. This may explain the partial, rather than complete, colocalization observed in our original Figure 7A. In the revised manuscript, please refer to Figure 1A.

      (12) Pulldown experiments with recombinant RSV NP capsid protein and CA confirm interaction, binding affinity assays indicate rather weak interaction between CA and NP. Likewise in pull-down experiments, interaction between NP, CA and TLP is shown. Finally, in vitro activity assays show that activity of preformed TLP-CA complexes can be increased by adding NP; activity of TLP alone is not shown.

      We performed two independent experiments to confirm the influence on TLP enzymatic activity by LssaCA or by the LssaCA-RSV NP complex. In the first experiment, we compared the enhancement of TLP activity by LssaCA using TLP alone as a control (Figure 3G). In the second experiment examining the LssaCA-RSV NP complex effect on TLP activity, we used the LssaCA-TLP combination as the baseline control rather than TLP alone (Figure 4B), since we had already established the LssaCA enhancement effect in the previous experiment.

      (13) For all microscopic acquisitions, you should indicate the exact acquisition conditions, especially excitation and emission filter settings, kind of camera used and objectives. Use of inadequate filters or of a black & white camera could for example be the reason why you observe a homogeneous cell wall label in the aniline blue staining assays. Counterstaining cell walls with propidium iodide might help distinguish between cell wall and callose label.

      Thank you for your insightful suggestions. We have added the detailed information to the revised manuscript (Lines 656-659 and 673-678).

      (14) You should provide information whether CA is deregulated in infected planthoppers, as this could also modify its mode of action.

      We have compared LssaCA mRNA levels in the salivary glands of RSV-free and RSV-infected L. striatellus. The results indicated that RSV infection does not significantly affect LssaCA expression (Figure 1J).

      (15) You should show purity of the proteins used for affinity binding measurements.

      We have included SDS-PAGE results of purified proteins in the revised manuscript (Figure S3).

      (16) L 39: Not all arboviruses are inoculated into the phloem.

      Thank you. We have revised this description (Lines 40, 73, 95 and 97).

      (17) L 76: Watery saliva is also injected in epidermis and mesophyll cells.

      Thank you. We have revised this description (Line 73).

      (18) L 79: What do you mean by "avirulent gene"?

      Thank you for your valuable comments. We have revised this description as “certain salivary effectors may be recognized by plant resistance proteins to induce effector-triggered immunity”. Please refer to Lines 76-77 for detail.

      (19) L 128: Please add delivery method.

      Thank you. We have added the delivery methods (Line 134).

      (20) L 195: Please explain "MST".

      Explained (Line 124). Thank you.

      (21) L 203: Please add the plant species overexpressing TLP.

      Added (Line 214). Thank you.

      (22) L 213: Callose deposition has also a role against phloem-feeding insects.

      We appreciate the reviewer’s insightful comment. We have added this information to the revised manuscript (Line 252).

      (23) L 626: What is a "mutein"?

      "Mutein" is an abbreviation for "mutant protein". Since the recombinant protein microinjection experiments have been removed from the revised manuscript, the term “mutein” has also been removed. For all other instances, we now use the full term “mutant proteins”.

      (24) Fig. 1E: what is "loading"? You should rather show here and elsewhere (or add to supplement) complete protein gels and Western blot membranes and not only bands of interest.

      Thank you for your valuable suggestion. Although Figure 1E has been removed from the revised manuscript, we have carefully reviewed all figures to ensure that the term “loading” has been replaced with the specific protein names where appropriate.

      (25) Fig. 2C: Please indicate which is the blot and which is the silver stained gel and add mass markers in kDa to the silver stained gel.

      Thank you for your suggestion. We have revised the figure to include labeled silver-stained gels with molecular weight markers indicated (Figure 1H in the revised manuscript).

    1. Author response:

      We sincerely thank the editors and the reviewers for their feedback in helping us improve this manuscript. During the time this work has been under review, 10x Genomics has updated the probe sequences of their gene panels. We therefore plan to update these findings as well as further expand to incorporate reviewer recommendations.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public review):

      Summary:

      The paper by Kim et al. investigates the potential of stimulating the dopaminergic A13 region to promote locomotor restoration in a Parkinson's mouse model. Using wild-type mice, 6-OHDA injection depletes dopaminergic neurons in the substantia nigra pars compacta, without impairing those of the A13 region and the ventral tegmentum area, as previously reported in humans. Moreover, photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region improves bradykinesia and akinetic symptoms after 6-OHDA injection. Whole-brain imaging with retrograde and anterograde tracers reveals that the A13 region undergoes substantial changes in the distribution of its afferents and projections after 6-OHDA injection, thus suggesting a remodeling of the A13 connectome. Whether this remodelling contributes to pro-locomotor effects of the photostimulation of the A13 region remains unknown as causality was not addressed.

      Strengths:

      Photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region promotes locomotion and locomotor recovery of wild-type mice 1 month after 6-OHDA injection in the medial forebrain bundle, thus identifying a new potential target for restoring motor functions in Parkinson's disease patients. The study also provides a description of the A13 region connectome pertaining to motor behaviors and how it changes after a dopaminergic lesion. Although there is no causal link between anatomical and behavioral data, it raises interesting questions for further studies.

      Thank you for the comments.

      Weaknesses:

      Although CAMKIIa is a marker of presumably excitatory neurons and can be used as an alternative marker of dopaminergic neurons, some uncertainty remains regarding the phenotype of neurons underlying recovery of akinesia and improvement of bradykinesia.

      The primary objective was to focus on a population of neurons that could contribute to functional recovery, with a long-term translational focus in mind. We have followed up on this by creating a rat-based DBS model of stimulating the A13 region (Bisht et al 2025). We agree that the next steps are to genetically dissect the circuits, and we have made a start on this with our recent publication (Sharma et al 2024).

      Figure 4 is improved, but the results from the correlation analyses remain difficult to interpret, as they may reflect changes in various impaired brain regions independently of the A13 region. While the analysis offers a snapshot of correlated changes within the connectome, it does not identify which specific cell or axonal populations are actually increasing or decreasing. Although functional MRI connectome analyses are well-established, anatomical data seem less suitable for this purpose. How can one interpret correlated changes in anatomical inputs or outputs between two distinct regions?

      We appreciate the reviewer's thoughtful comment regarding the interpretability of the correlation analyses in Figure 4. We fully acknowledge that our anatomical data cannot establish causality or identify specific cell types or axonal populations undergoing changes following unilateral nigrostriatal degeneration. However, our intent with this analysis was not to infer mechanistic pathways but rather to provide a systems-level overview of how the global organization of A13 efferents and afferents is altered following 6-OHDA lesioning. By calculating proportions of total inputs and outputs and comparing them across brain regions, we aimed to control for variability in labeling and highlight relative shifts in network organization. The correlation matrices are intended to capture coordinated changes in input/output distribution patterns, effectively reflecting how groups of regions co-vary in their input to or output from the A13 region. In our case, we used correlation analysis to identify how input and output distributions across brain regions reorganize as a network following 6-OHDA lesioning. For example, a positive correlation between inputs from Region A and Region B to the A13 suggests that across animals, when input from Region A is relatively high, input from Region B tends to be high as well, indicating that connectivity from these regions to the A13 may be co-regulated or affected similarly by the lesion. Conversely, a shift from positive to negative correlation may signal a divergence in how regions contribute to the A13 connectome after nigrostriatal degeneration (e.g., increased connectivity to Region A compared to reduced connectivity to Region B). Thus, these patterns offer new insight into the broader reorganization of the A13 connectome and may serve as systems-level signatures of altered anatomical organization, providing a foundation for future mechanistic investigations using circuit-specific tools. 
      We have revised the text to better emphasize the correlative and descriptive nature of these analyses and to clarify that they serve as a hypothesis-generating exploration. Future studies using cell type- and/or projection-specific functional manipulations will be essential to determine the causal roles of these reorganized circuits. We believe our use of this method is justified in the context of exploring broad, lesion-induced network reorganization, and we hope this additional context helps clarify the purpose and limitations of our approach.
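
      The proportion-then-correlation procedure described above can be sketched as follows. This is only an illustrative sketch, not the authors' analysis code: the function names and the cell counts are hypothetical, and Pearson correlation is assumed because the response does not state which correlation measure was used.

```python
import numpy as np


def input_proportions(counts):
    """Convert labelled-cell counts (animals x regions) into per-animal
    proportions of total input, so that labelling efficiency cancels out."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def region_correlations(proportions):
    """Pearson correlation between regions, computed across animals.
    Entry [i, j] is positive when regions i and j tend to contribute
    relatively more (or less) input together in the same animals."""
    return np.corrcoef(proportions, rowvar=False)


# Hypothetical counts: rows are animals, columns are regions A, B and C.
counts = [
    [120, 60, 20],
    [200, 90, 110],
    [80, 45, 25],
    [150, 70, 130],
]
props = input_proportions(counts)
corr = region_correlations(props)
```

      In such a sketch, one matrix would be computed per group (sham and 6-OHDA), and a sign change in a given entry between the two matrices would correspond to the "shift from positive to negative correlation" mentioned in the response.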

      Figure 5 is also improved, but there is room for further enhancement. As currently presented, it is difficult to distinguish the differences between the sham and 6-OHDA groups. The first column could compare afferents, while the second column could compare efferents. Given the small sample size, it would be more appropriate to present individual data rather than the mean and standard deviation.

      We have reorganized Figure 5 as suggested.

      Appraisal and impact

      Although the behavioral experiments are convincing, the low number of animals in the anatomical studies is insufficient to make any relevant statistical conclusions due to extremely low statistical power.

      See previous comments on this.

      Reviewer #2 (Recommendations for the authors):

      Points that need to be addressed:

      Figure S1 is supposed to illustrate the percentage of expression in all mice, but the number of mice does not match (n=3 and 3 in Figure S1 versus n=5 and 6 in Figure 1). Revise the legend or add the missing data.

      We have added the additional data to this graph (Figure 2 – figure supplement 1) and have separated out 6-OHDA and sham mice for clarity.

      Page 4: "There was also an increase in the number of ChR2 cells with c-fos labeling in 6-OHDA ChR2 mice compared to the 6-OHDA eYFP mice. However, there was no net increase in TH+ cells labelled with ChR2 and c-Fos suggesting a heterogeneous population of activated cells." A quantification will be necessary to advance this conclusion.

      We were able to determine that there was a trend of increased c-Fos intensity within the A13 region following photostimulation. However, the variability in the data makes it premature to comment on the TH co-localization and we have deleted this statement.

      Figure 3: The choice of red and green could be a problem for color-blind people.

      Thank you - switched to orange and cyan instead.

      Page 7, 4th paragraph: "6-OHDA mice demonstrated significantly greater descent times than sham mice (Figure 3L, p<0.01)." This is not what is shown in the Figure 3L.

      We made changes in the legend and text to clarify.

      Page 7, last line: PT abbreviation should be introduced in parentheses at the beginning of this section.

      Removed the abbreviation.

      Figure S4A: The authors should show data for the VTA or refer to the quantification of Figure S4G in the text.

      Now referenced correctly in the text.

      Figure S7 and S8 are not referenced in the results or methods.

      References added to text.

      Double-check the formatting of some references: L.-X. Li et al, 2021, L. Kim et al., 2021.

      References checked and corrected.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      Summary: 

      In this study, Bonnifet et al. profile the presence of L1 ORF1p in the mouse and human brain and report that ORF1p is expressed in the human and mouse brain specifically in neurons at steady state and that there is an age-dependent increase in expression. This is a timely report as two recent papers have extensively documented the presence of full-length L1 transcripts in the mouse and human brain (PMID: 38773348 & PMID: 37910626). Thus, the finding that L1 ORF1p is consistently expressed in the brain is important to document and will be of value to the field. 

      Strengths: 

      Several parts of this manuscript appear to be well done and include the necessary controls. In particular, the documentation of neuron-specific expression of ORF1p in the mouse brain is an interesting finding with nice documentation. This will be very useful information for the field. 

      We thank the reviewer for this positive comment. 

      Weaknesses: 

      Several parts of the manuscript appear to be more preliminary and need further experiments to validate their claims. In particular, the data suggesting expression of L1 ORF1p in the human brain and the data suggesting increased expression in the aged brain need further validation. Detailed comments: 

      (1) The expression of ORF1p in the human brain shown in Fig. 1j is puzzling. Why are there two strong bands in the WB? How can the authors be sure that this signal represents ORF1p expression and not non-specific labelling? While the authors discuss that others have found double bands when examining human ORF1p, there are also several labs that report only one band. This discrepancy in the field should at least be discussed and the uncertainties with their findings should be acknowledged. 

      Please see also our extensive response to this comment we made in round #1 of the revisions.

      As a summary, in response to the initial review, we included several lines of additional evidence in the revised manuscript:

      siRNA-mediated knockdown of ORF1p in human neurons, resulting in ≈50% signal reduction using the antibody in question (Suppl. Fig. 2C); immunoprecipitation using the human ORF1p antibody in question, confirming signal specificity (Suppl. Fig. 2B); use of a second antibody in immunostainings, including a new control (Suppl. Fig. 2E); and a revised discussion acknowledging the uncertainty surrounding the lower band:

      “The double band pattern in Western blots has been observed in other studies for human ORF1p outside of the brain as well as for mouse ORF1p. […] The nature of the lower band is unknown, but might be due to truncation, specific proteolysis or degradation.”

      We have also now added more content to the paragraph starting from line 183: "While there is some discrepancy in the field, the double band pattern in Western blots..."

      To our understanding, this combination of independent methods using two antibodies and complementary validation strategies supports the presence of ORF1p in human brain tissue.

      (2) The data showing a reduction in ORF1p expression in the aged mouse brain is an interesting observation, but the magnitude of the effect is very limited and somewhat difficult to interpret. This finding should be supported by orthogonal methods to strengthen this conclusion. For example, by WB and by RNA-seq (to verify that the increase in protein is due to an increase in transcription).

      This would indeed be valuable, but we will not be able to perform these experiments at this point (please also see revision #1 for a more detailed answer).

      (3) The transcriptomic data using human postmortem tissue presented in Figure 4 and Figure 5 are not convincing. Quantification of transposon expression on short read sequencing has important limitations. Longer reads and complementary approaches are needed to study the expression of evolutionarily young L1s (see PMID: 38773348 & PMID: 37910626 for examples of the current state of the art). As presented, the human RNA data is inconclusive due to the short read length and small sample size. The value of including an inconclusive analysis in the manuscript is difficult to understand. With this data set, the authors cannot investigate age-related changes in L1 expression in human neurons. 

      Please see also our extensive response to this comment we made in round #1 of the revisions.

      In the revised version, we have added further statistical analyses, incorporated locus-specific mappability scores and provided an even more nuanced interpretation of our findings, as illustrated in lines 390 and 427.

      We have acknowledged the limitations of short-read sequencing in this context, while referencing established methodologies (e.g., Teissandier et al., 2019) and recent benchmarking studies (e.g., Schwarz et al., 2022) that validate the use of such data under specific precautions—many of which we have implemented.

      Given these considerations, and with the guidance of a co-author with specific expertise in TE bioinformatics, we believe our approach is justified and robust.

      (4) In line with these comments, the title should be changed to better reflect the findings in the manuscript. A title that does not mention "L1 increase with aging" would be better. 

      In line with our response to Point (3), we prefer to retain the current analyses and discussion, which we believe strike an appropriate balance between caution and added scientific value.

      Reviewer #2 (Public review): 

      Summary: 

      Bonnifet et al. sought to characterize the expression pattern of L1 ORF1p expression across the entire mouse brain, in young and aged animals and to corroborate their characterization with Western blotting for L1 ORF1p and L1 RNA expression data from human samples. They also queried L1 ORF1p interacting partners in the mouse brain by IP-MS. 

      Strengths: 

      A major strength of the study is the use of two approaches: a deep-learning detection method to distinguish neuronal vs. non-neuronal cells and ORF1p+ cells vs. ORF1p- cells across large-scale images encompassing multiple brain regions mapped by comparison to the Allen Brain Atlas, and confocal imaging to give higher resolution on specific brain regions. These results are also corroborated by Western blotting on six mouse brain regions. Extension of their analysis to post-mortem human samples, to the extent possible, is another strength of the paper. The identification of novel ORF1p interactors in brain is also a strength in that it provides a novel dataset for future studies. 

      We thank the reviewer for these positive comments.

      Weaknesses: 

      The main weakness of the IP-MS portion of the study is that none of the interactors were individually validated or subjected to follow-up analyses. The list of interactors was compared to previously published datasets, but not to ORF1p interactors in any other mouse tissue.

      As we stated in the first round of revision, the list of previously published datasets does include a mouse dataset of ORF1p-interacting proteins in mouse spermatocytes (please see lines 478-479: “ORF1p interactors found in mouse spermatocytes were also present in our analysis including CNOT10, CNOT11, PRKRA and FXR2 among others (Suppl_Table4).”; reference: De Luca, C., Gupta, A. & Bortvin, A. Retrotransposon LINE-1 bodies in the cytoplasm of piRNA-deficient mouse spermatocytes: Ribonucleoproteins overcoming the integrated stress response. PLoS Genet 19, e1010797 (2023)). We agree that a validation of protein interactors of ORF1p in the mouse brain would have been valuable. However, the significant overlap with previously published interactors highlights the validity of our data. As reviewer #2 points out in the comments on revisions, we hope that follow-up studies will address these points and we anticipate that this list of ORF1p protein interactors in the mouse brain will be of further use for the community.

      Comments on revisions: 

      The co-staining of Orf1p with Parvalbumin (PV) presented in Supplemental Figure S5 is a welcome addition exploring the cell type-specificity of Orf1p staining, and broadly corroborates the work of Bodea et al. while revealing that Orf1p also is expressed in non-PV+ cells, consistent with L1 activity across a range of neuronal subtypes. The authors also have strengthened their findings regarding the increased intensity of ORF1p staining in aged compared to young animals, and the newly presented results are indeed more convincing. The prospect of increased neuronal L1 activity with age is exciting, and the results in this paper have provided the groundwork for ongoing discoveries in this area. While it is disappointing that no Orf1p interactors were followed up, this is understandable and the data are nonetheless valuable and will likely prove useful to future studies. 

      Thank you for your time and constructive comments.

      Reviewer #1 (Recommendations for the authors): 

      We would recommend that the human RNA-seq analysis is removed from the manuscript. The human RNA data is inconclusive due to the short read length and small sample size. The value of including an inconclusive analysis in the manuscript is difficult to understand. With this data set, the authors cannot investigate age-related changes in L1 expression in human neurons. 

      Reviewer #2 (Recommendations for the authors): 

      Thank you for addressing my suggestions. I have no further recommendations at this time.

    1. Author response:

      Reviewer #1 (Recommendations for the authors):

      “The gar-3 promoter expression pattern was not discussed in the context of rescue experiments.”

      We agree that the expression pattern of the gar-3 promoter used in our rescue experiments should be clarified. We will include a description of the tissues where the 7.5 kb gar-3 promoter fragment is expressed, based on both prior studies and our own expression data. We will also discuss how the gar-3 cell and tissue expression pattern relates to both our analysis of gar-3 expression in the genome edited strain we generated as well as the observed rescue effects.

      Reviewer #2 (Recommendations for the authors):

      (1) The site of action of cholinergic signaling was not adequately explored.

      We plan to perform additional rescue experiments using heterologous promoters to drive gar-3 expression in specific tissues (e.g. cholinergic neurons, muscle). These experiments will help clarify the sufficiency of unc-17 expression in specific cell types for rescue. However, we point out that cell-specific unc-17 knockdown by RNAi using the unc-17b promoter (expression largely restricted to ventral cord ACh motor neurons) increases sensitivity to PQ in our long-term survival assays. Combined with our analysis of unc-17(e113) mutants, we believe our data offer robust support of a requirement for unc-17 expression in cholinergic motor neurons.

      (2) Pan-neuronal silencing experiments were not connected to ACh/GAR-3 signaling.

      We will expand our discussion to relate the pan-neuronal silencing results to our analysis of ACh signaling. We used the pan-neuronal silencing to motivate further analysis of various neurotransmitter systems. We note that our studies implicate both glutamatergic and cholinergic systems in protective responses to oxidative stress. The effects of silencing on survival during long-term PQ exposure may therefore be derived solely from cholinergic neurons, glutamatergic neurons, or a combination of both neuronal populations. We hope the reviewer will agree that distinguishing between these possibilities may be quite complicated and is not central to the main message of our paper. We therefore suggest this additional analysis lies outside the scope of this revision.

      (3) Inter-tissue signaling and transcriptional regulation by ACh were assumed but not directly shown.

      We will generate GFP reporters for a subset of genes (including proteasomal genes) identified in our RNA-seq analysis or assess their expression by quantitative RT-PCR to validate cholinergic regulation. These experiments will help to identify target tissues and confirm transcriptional regulation by cholinergic signaling.

      We appreciate the opportunity to revise our manuscript and believe that these additions will significantly strengthen the mechanistic insights and overall impact of our study. Please let us know if further clarification is needed.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      Summary and Strengths:

      The very well-written manuscript by Lövestam et al. from the Scheres/Goedert groups entitled "Twelve phosphomimetic mutations induce the assembly of recombinant full-length human tau into paired helical filaments" demonstrates the in vitro production of the so-called paired helical filament Alzheimer's disease (AD) polymorph fold of tau amyloids through the introduction of 12 point mutations that attempt to mimic the disease-associated hyper-phosphorylation of tau. The presented work is very important because it enables disease-related scientific work, including seeded amyloid replication in cells, to be performed in vitro using recombinant-expressed tau protein.

      Weaknesses: 

      The following points are asked to be addressed by the authors:

      (i) In the discussion it would be helpful to note the findings that in AD the chemical structure of tau (including phosphorylation) is what defines the polymorph fold and not the buffer/cellular environment. It would be further interesting to discuss these findings in respect to the relationship between disease and structure. The presented findings suggest that due to a cellular/organismal alteration, such as aging or Abeta aggregation, tau is specifically hyper-phosphorylated which then leads to its aggregation into the paired helical filaments that are associated with AD.

      We have added an extra sentence to the Introduction to emphasise this possibility: “Besides the cellular environment in which they assemble, different tau folds may also be determined by chemical modifications of tau itself.”

      In addition, the last paragraph of the Discussion now reads: “It could be that, besides different cellular environments in which the filaments assemble, different posttranslational modification patterns are also important for the assembly of tau into protofilament folds that are specific for the other tauopathies.”

      (ii) The conditions used for each assembly reaction are a bit hard to keep track of and somewhat ambiguous. In order to help the reader, I would suggest making a table to show conditions used for each type of assembly (including the diameter / throw of the orbital shaker) and the results (structural/biological) of those conditions. For example, presumably the authors did not have ThT in the samples used for cryo-EM but the methods section does not specify this. Also, the presence of trace NaCl is proposed as a possible cause for the CTE fold to appear in the 0N4R sample (page 4) but no explanation is given of why this particular sample would have more NaCl than the others. Furthermore, it appears that NaCl was actually used in the seeded assembly reactions that produced the PHF and not the CTE fold. This would seem to indicate the CTE structure of 0N4RPAD12 is not actually induced by NaCl (like it was for tau297-391). In order for the reader to better understand the reproducibility of the polymorphs, it would be helpful to indicate in how many different conditions and how many replicates with new protein preparations each polymorph was observed (could be included in the same table).

      We have added a new table (Table 1) with the buffer conditions, protein concentration and shaking speed and time, for all structures described in this paper. We never added ThT to assembly reactions that were used for cryo-EM.

      We did not use NaCl in the seeded assembly reactions (we used sodium citrate). We don’t really know why 0N4R PAD12 tau more readily forms the CTE fold. The observation that it does so prompted us to use 0N3R for all ensuing experiments. 

      (iii) It is not clear how the authors calculate the percentage of each filament type. In Figure 1 it is stated "discarded solved particles (coloured) and discarded filaments in grey" which leaves the reviewer wondering what a "discarded solved particle" is and which filaments were discarded. From the main text one guesses that the latter is probably false positives from automated picking but if so, these should not be referred to as filaments. Also, are the percentages calculated for filaments or segments? In any case, it would be more helpful in such a report to know the best estimate of the ratio of identified filament types without confusing the reader with a measure of the quality of the picking algorithm. Please clarify. Also, a clarification is asked for the significance of the varying degrees of PHF and AD monomer filaments in the various assembly conditions. It could be expected that there is significant variability from sample to sample but it would be interesting to know if there has been any attempt to reproduce the samples to measure this variability. If not, it might be worth mentioning so that the % values are taken with an appropriately sized grain of salt. Finally, the representation of the data in Figure 1 would seem to imply that the 0N3R forms less or no monofilament AD fold because no cross-section is shown for this structure, however it is very similar to (or statistically the same as) the 1:1 mix of 0N3R:0N4R.

      In the revised manuscript, we have used bi-hierarchical clustering of filaments, where each segment (or particle) is classified based on both its 2D class assignment and the filament to which it belongs (this method is based on [Pothula et al (2019), Ultramicroscopy 203, 132-138] and was further developed in [Lövestam et al (2024) Nature 7993, 119-125]). Based on the assumption that the filament type does not change within a single filament, we have observed that this gives excellent classification results, and that this approach allows classification of many filament types, even small minority ones. Using this approach, we now quantify the different filament types based on the number of segments extracted from filaments classified in this way. 
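
      The segment-based quantification described above can be illustrated with a minimal sketch. It covers only the counting step, not the bi-hierarchical clustering itself: as a simplified stand-in, each filament is given a single type by majority vote over its segments' class labels, and percentages are then computed over segments. The function name, the input layout and the labels are assumptions made for illustration and do not come from the manuscript.

```python
from collections import Counter, defaultdict


def filament_type_percentages(segments):
    """Percentage of segments per filament type.

    segments: iterable of (filament_id, class_label) pairs, one per segment.
    Both fields are hypothetical; a real pipeline would read them from the
    particle metadata.
    """
    # Group the per-segment class labels by the filament they came from.
    labels_by_filament = defaultdict(list)
    for filament_id, class_label in segments:
        labels_by_filament[filament_id].append(class_label)

    # Assumption: a filament has one type along its whole length, so every
    # segment inherits the majority label of its filament (ties are broken
    # arbitrarily).
    segment_counts = Counter()
    for labels in labels_by_filament.values():
        filament_type = Counter(labels).most_common(1)[0][0]
        segment_counts[filament_type] += len(labels)

    total = sum(segment_counts.values())
    return {t: 100.0 * n / total for t, n in segment_counts.items()}
```

      Counting segments rather than filaments weights long filaments more heavily, which matches the statement that types are quantified on the number of segments.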

      Moreover, we have also addressed the problem of having singlets in the PHF preparation: it turns out that when the samples are taken off the shaker after one week and incubated quiescently at 37 ºC for two more weeks, the singlets disappear and only PHFs remain. Filaments made for the fluorophore labelling in the revised Figure 3 were also prepared using the new protocol. In total, we have N=7 replicates with a mean of 95.3% PHFs and a standard deviation of 9.4%. The revised text in the Results section reads:

      “To further increase the proportions of PHFs-to-singlet ratio, we removed the plate from the shaker after one week and incubated it quiescently at 37 ºC for two more weeks. This resulted in 100% PHFs formed (Figure 1 – figure supplement 4). When repeated seven times, on average 95.3% PHFs formed, with 25% of singlets formed in a single outlier (Figure 1 – figure supplement 5)” 

      (iv) The interpretation of the NMR data on soluble tau that the mutations on the second site are suppressing in part long range dynamic interaction around the aggregation-initiation site (FIA) is sound. It is in particular interesting to find that the mutations have a similar effect as the truncation at residue 391. An additional experiment using solvent PREs to elaborate on the solvent exposed sequence-resolved electrostatic potential and the intra-molecular long range interactions would likely strengthen the interpretation significantly (Iwahara, for example, Yu et al, in JACS 2024). Figure 6D Figure supplement shows the NMR cross peak intensities between tau 151-391 and PAD12tau151-391. Overall the intensities of the PAD12 tau construct are more intense which could be interpreted with less conformational exchange between long range dynamic interactions. There are however several regions which do not show any intensity anymore when compared with the corresponding wildtype construct such as 259-262, 292-294 which should be discussed/explained. 

      While long-range intramolecular interactions of tau have previously been reported through the use of spin labels (Mukrasch et al 2009 PLoS Biol 7(2): e1000034), we have been hesitant to introduce paramagnetic agents into our samples for two reasons. First, the bulky size of the spin label may affect filament formation or influence the dynamic properties of the protein. Second, covalent addition of the spin label requires mutation of the primary sequence to both remove native cysteine residues and add cysteines at the desired label location. We have previously shown that mutation of cysteine 322 to alanine leads to the formation of tau filaments with a structure that is different from the PHF (Santambrogio et al (2025) bioRxiv 2025.03.29.646137). 

      Instead, we have included in the revised manuscript new NMR and cryo-EM data that provide further support for the model that a FIA-like interaction between residues <sub>392</sub>IVYK<sub>395</sub> and residues <sub>306</sub>VQIVYK<sub>311</sub> has an inhibiting effect on filament nucleation in unmodified full-length tau. A mutant of tau297-441 where residues <sub>392</sub>IVYK<sub>395</sub> have been deleted and that does not contain the four PAD12 mutations in the carboxy-terminal domain behaves similarly in the NMR experiment as the tau297-441 construct with those four PAD12 mutations. Moreover, full-length 0N3R tau with the eight PAD12 mutations in the amino-terminal fuzzy coat and with the deletion of <sub>392</sub>IVYK<sub>395</sub>, but without the four PAD12 mutations in the carboxy-terminal domain, assembles readily into amyloid filaments (of which we also solved a cryo-EM structure, see the revised Figure 6B). These observations provide mechanistic insights into the previously proposed paper-clip model [Jeganathan (2008), J Biol Chem 283, 32066-32076], where interactions between the fuzzy coat inhibit filament formation of unmodified full-length tau, and phosphorylation in the fuzzy coat interferes with these interactions, thus leading to filament nucleation. Of course, the identification of residues <sub>392</sub>IVYK<sub>395</sub> for this interaction also explains why truncation of tau at residue 391 leads to spontaneous assembly. We have introduced a new Figure 7 to the revised manuscript to explain this model in more detail. The corresponding new section in the Results reads:

      “To investigate this further, we also tested a tau construct comprising residues tau297-441 without the phosphomimetic mutations, but with a deletion of residues (Δ392-395). Filaments formed rapidly and the cryo-EM structure showed that the ordered core consisted of the amino-terminal part of the construct spanning residues 297-318 (Figure 6B). NMR analysis (Figure 6 – figure supplement 5B) showed that the tau297-441 Δ392-395 construct exhibited similar backbone rigidity properties to the tau297-441 PAD12 construct, despite peak locations and local secondary structural propensities being more similar to the wildtype tau297-441 (Figure 6 – figure supplement 5A; Figure 6 – figure supplement 6). HSQC peak intensities in the 297-319 and 392-404 regions of tau297-441 Δ392-395 (Figure 6A, expanded from Figure 6 – figure supplement 5C) were like those in the tau297-441 PAD12. These data suggest that the IVYK deletion has a similar effect as the phosphomimetics on residues 396, 400, 403 and 404 on disrupting an intra-molecular interaction between the FIA core region and the carboxy-terminal domain, which may therefore be mediated by interactions between the two IVYK motifs that are similar to those observed in the FIA (Lövestam et al, 2024).”

      A new section in the Discussion now reads:

      “Our NMR data provide insights into the mechanism by which phosphorylation in the fuzzy coat of tau, or truncations of tau, lead to the formation of filaments with ordered cores of residues that are themselves not phosphorylated. HSQC peak intensity differences between unmodified tau 297-441, PAD12 tau 297-441 and tau297-391 suggest that phosphorylation of the fuzzy coat, particularly near the <sub>392</sub>IVYK<sub>395</sub> motif in the carboxy-terminal domain, affects the conformation of the residues of tau that become ordered in the FIA (Lövestam et al., 2024). Removal of residues <sub>392</sub>IVYK<sub>395</sub> in the carboxy-terminal domain of tau 297-441 led to rapid filament formation in the absence of phosphomimetics, while HSQC peak intensity differences for this construct indicate similar backbone rigidity compared to tau 297-441 without the deletion, but with the four PAD12 mutations in the carboxy-terminal domain. Combined, these observations support a model where the <sub>392</sub>IVYK<sub>395</sub> motif in unmodified full-length tau monomers interacts with the <sub>308</sub>IVYK<sub>311</sub> motif, thus inhibiting filament formation by preventing the formation of the nucleating species, the FIA. Phosphorylation of nearby residues 396, 400, 403 and 404, or truncation at residue 391, disrupt this interaction and lead to filament formation. This model agrees with the previously proposed hairpin-like model of tau (Jeganathan et al., 2008), although the corresponding interaction between the amino-terminal domain of tau and the core-forming region remains unknown (Figure 7).”

      Due to the challenging nature of the assignment, it was not possible to assign all residues in the HSQC of the tau151-391 and the PAD12 tau151-391 samples, including residues 259-262 and 292-294 for PAD12 tau151-391. To make this clearer, we have marked residues that are not assigned with an asterisk in the revised version of Figure 6 – figure supplement 1.  

      (v) Concerning the Cryo-EM data from the different hyper-phosphorylation mimics, it would seem that the authors could at least comment on the proportion of monofilament and paired-filaments even if they could not solve the structures. Nonetheless, based on their previous publications, one would also expect that they could show whether the non-twisted filaments are likely to have the same structure (by comparing the 2D classes to projections of non-twisted models). Also, it is very interesting to note that the twist could be so strongly controlled by the charge distribution on the non-structured regions (and may be also related to the work by Mezzenga on twist rate and buffer conditions). Is the result reported in Figure 2 a one-off case or was it also reproducible?

      As also indicated in the main text, the assembly conditions for the PAD12+4, PAD12-4 and PAD12+/-4 constructs were kept the same as those for the PAD12 construct. It is possible that further optimisation of the conditions could again lead to twisting filaments, but we chose not to pursue this route. With unlimited resources and time, one could assess in detail which of the PAD12 mutations are required and which ones could be omitted to form PHFs. However, this would require a lot of work and cryo-EM time. For now, we chose to prioritise reporting conditions that do work to reproducibly make PHFs in the laboratory (using the PAD12 construct) and leave the more detailed analysis of other constructs for future studies. 

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript addresses an important impediment in the field of Alzheimer's disease (AD) and tauopathy research by showing that 12 specific phosphomimetic mutations in full-length tau allow the protein to aggregate into fibrils with the AD fold and the fold of chronic traumatic encephalopathy fibrils in vitro. The paper presents comprehensive structural and cell based seeding data indicating the improvement of their approach over previous in vitro attempts on non-full-length tau constructs. The main weaknesses of this work result from the fact that only up to 70% of the tau fibrils form the desired fibril polymorphs. In addition, some of the figures are of low quality and confusing. 

      As also explained in our response to reviewer #1, we have performed better quantification of filament types in the revised manuscript, and we have investigated how to get rid of the singlets. In the revised manuscript, we report that singlets disappear as time passes and that one can obtain 100% pure PHFs by quiescently incubating samples for another two weeks, after shaking for a week.

      Strengths: 

      This study provides significant progress towards a very important and timely topic in the amyloid community, namely the in vitro production of tau fibrils found in patients.

      The 12 specific phosphomimetic mutations presented in this work will have an immediate impact in the field since they can be easily reproduced.

      Multiple high-resolution structures support the success of the phosphomimetic mutation approach. Additional data show the seeding efficiency of the resulting fibrils, their reduced tendency to bundle, and their ability to be labeled without affecting core structure or seeding capability.

      Weaknesses: 

      Despite the success of making full-length AD tau fibrils, still ~30% of the fibrils are either not PHF, or not accounted for. A small fraction of the fibrils are single filaments and another ~20% are not accounted for. The authors mention that ~20% of these fibrils were not picked by the automated algorithm. However, it would be important to get additional clarity about these fibrils. Therefore, it would improve the impact of the paper if the authors could manually analyze passed-over particles to see if they are compatible with PHF or fall into a different class of fibrils. In addition, it would be helpful if the authors could comment on what can be done/tried to get the PHF yield closer to 90-100%

      As mentioned above, in the revised manuscript we show that the singlets disappear over time and we now include a description of a method that leads to 100% PHF formation.

      Reviewer #1 (Recommendations for the authors):

      Minor points: 

      (a) In Figure 6 the dashed purple vertical lines overlap with the black bars, rendering a grey color which is confusing because grey bars are used for the shorter construct. It is suggested to improve the colors (remove transparency on the purple?)

      We thank the reviewers for their suggestions for improving the visualisation of our data. We have recoloured the tau297-391 data from grey to gold and moved the dashed lines to the back of the image to remove the apparent colour changes.  

      (b) Is there any support for the suggestion that "part of the second microtubule-binding repeat is ordered" being "related to this construct forming filaments with only a single protofilament"? It seemed to have come out of nowhere.

      There is no further support for this statement, but we thought it would be worth hypothesizing about this observation. 

      (c) Figures 1 and 4 E is better described as a "main chain trace" or "backbone trace" although the latter usually refers to only CA positions. Ribbon usually refers to something else in representations of protein structures. 

      This has been changed into “main chain trace” in Figures 1 and 4. 

      (d) Figure 1 Supplement 3: Panel letters in the legend do not match. 

      This has been fixed.

      Reviewer #2 (Recommendations for the authors): 

      The introduction is a bit lengthy (e.g. 3rd paragraph of introduction) and could benefit from focusing on the specific question the manuscript addresses. 

      We have shortened the Introduction. It now contains ~1150 words, which we hope provides a better compromise between length and sufficient background information.

      Figure captions are generally not helpful in conveying a message to the reader.

      Figure 1 - figure supplement 3 is quite confusing. The 4 structures in A) do not correspond to the grids in B-E. What is this figure supposed to show?

      This confusion was probably the result of incorrect labelling of panels in the legend, which was also pointed out by reviewer #1. This has been fixed in the revised manuscript.

      Page 11: Although I know what you mean, 'linear increase of ThT fluorescence' is not the correct term. 

      We have replaced “linear” with “rapid”.

      Page 15: Although line shape and peak intensity can be related you are not reporting on line shape or width but simply on peak intensity. Therefore, I wouldn't talk about the result of a 'line shape analysis'.

      We have changed the wording accordingly. 

      Figure 6 (and supplement 1) are confusing and too small to be readable in print. It might be sufficient to show the CSP and upload the remaining data to the BMRB. 

      We have made a clearer version of the main NMR Figure 6 in the revised manuscript showing the most pertinent NMR data and have moved the previous version into the figure supplements. We designed these figures to be viewed as full page A4 panels, ideally seen in one image as they show multiple comparisons of different experiments and constructs.

      As such we feel these will be best viewed on screen as part of the eLife web document. We have uploaded HSQC spectra and assignments to the BMRB (see below).

      Figure 6 supplement 3 might benefit from pointing out key residues in the overlay.

      We have added the labels (this is now Figure 6 supplement 4).

      Data availability: Please upload the assignments to the BMRB together with key spectra (e.g. HSQCs). 

      We have uploaded HSQC data along with our assignments to the BMRB. The accession codes are 52694 – tau297-441 wt; 52695 – tau297-441 PAD-12; 52696 – tau151-391 wt; 52697 – tau151-391 PAD-12; and 53230 – tau297-441 delta392-395. These accession codes have been added to the manuscript. 

      The quality of some of the figures (specifically Figure 1 - supplement 3 and Figure 6) is not suitable for publication. 

      For the original submission to bioRxiv, we produced a single PDF with a manageable file size. We will liaise with the eLife staff to ensure the images used in the version of record will be suitable for publication.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The topic of tumor-immune co-evolution is an important, understudied topic with, as the authors  noted, a general dearth of good models in this space. The authors have made important progress on the topic by introducing a stochastic branching process model of antigenicity/immunogenicity and measuring the proportion of simulated tumors that go extinct. The model is extensively explored, and the authors provide some nice theoretical results in addition to simulated results. 

      We thank the reviewer for the positive comments on our work.

      Major comments 

      The text in lines 183-191 is intuitively and nicely explained. However, I am not sure all of it follows from the figure panels in Figure 2. For example, the authors refer to a mutation that has a large immunogenicity, but it's not shown how many mutations, or the relative size of the mutations in Figure 2. The same comment holds true for the claim that spikes also arise for mutations with low antigenicity. 

      We thank the reviewer for helping us to further specify this statement in our original submission. We have now added Muller plots in a new Appendix Figure (Figure A3) presenting the relative abundances of different types of effector cells in the population over time. Each effector type is colour-coded with its antigenicity and immunogenicity. To align with this Appendix Figure (Figure A3), we also updated our Figure 2, which is now generated from the same realisation as Figure A3. We can now see clearly that the spikes in the mean values of the antigenicity and immunogenicity over the whole effector populations in the new Figure 2B&2D indeed correspond to the expansion of single or several antigenic mutations recruiting the specific effector cell types. For example, in Figure 2B, we can see that the spikes of low average antigenicity and high immunogenicity (around time 11) happen at the same time as an effector type in Figure A3 with such a trait (coloured in green) arises and takes over the population. We have rewritten the related part of our Results section (Line 192 - Line 222 in main text and Appendix A6).

      Reviewer #2 (Public review): 

      Summary: 

      In this work, the authors developed a model of tumour-immune dynamics, incorporating stochastic antigenic mutation accumulation and escape within the cancer cell population. They then used this  model to investigate how tumour-immune interactions influence tumour outcome and summary  statistics of sequencing data. 

      Strengths: 

      This novel modeling framework addresses an important and timely topic. The authors consider the useful question of how bulk and single-cell sequencing may provide insights into the tumour-immune interactions and selection processes. 

      We thank the reviewer for the positive comments.

      Weaknesses: 

      One set of conclusions presented in the paper is the presence of cyclic dynamics between effector/cancer cells, antigenicity, and immunogenicity. However, these conclusions are supported in the manuscript by two sample trajectories of stochastic simulations, and these provide mixed support for the conclusions (i.e. the phasing asynchrony described in the text does not seem to apply to Figure 2C). 

      We have now developed a method to quantify the cyclic dynamics in our system (Appendix A7), in which we track the directional changes in the phase portrait of the abundances of the cancer and effector cells. We first tested this method in a non-evolving stochastic predator-prey system, where it correctly captures the number of cycles (Figure A7). We then used this method to quantify the number of cycles we observed between cancer and effector cells under different mutation rates (Figure A5), as well as whether they are counter-clockwise or clockwise cycles (Figure A6). Our results showed that cyclic dynamics are observed more often when mutation rates are higher, and the majority of those cycles are counter-clockwise. When the mutation rate is high, we observe an increase of clockwise cycles, which have been observed in predator-prey systems and explained through coevolution. However, even under high mutation rates, counter-clockwise cycles are still the more frequent type. 
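
      One generic way to count cycles and their direction in a two-variable phase portrait is to accumulate the signed angle swept by the trajectory around a reference point. The sketch below is not the method of Appendix A7, whose details are not given in this response; it is an illustrative winding-angle counter. The choice of the trajectory mean as the reference point and the axis convention (x for cancer cell abundance, y for effector cell abundance) are assumptions.

```python
import math


def count_cycles(x, y):
    """Count full turns of the trajectory (x[t], y[t]) around its mean point.

    Returns (counter_clockwise, clockwise). Assumed convention: x is the
    cancer cell abundance and y is the effector cell abundance.
    """
    # Reference point: the time-averaged abundances (an assumption).
    cx = sum(x) / len(x)
    cy = sum(y) / len(y)
    angles = [math.atan2(yi - cy, xi - cx) for xi, yi in zip(x, y)]

    ccw = cw = 0
    winding = 0.0
    for a0, a1 in zip(angles, angles[1:]):
        # Wrap each angular step into [-pi, pi) so that crossing the branch
        # cut of atan2 is not mistaken for a jump of nearly a full turn.
        step = (a1 - a0 + math.pi) % (2 * math.pi) - math.pi
        winding += step
        if winding >= 2 * math.pi:
            ccw += 1
            winding -= 2 * math.pi
        elif winding <= -2 * math.pi:
            cw += 1
            winding += 2 * math.pi
    return ccw, cw
```

      Because steps in opposite directions cancel in the running sum, short stochastic back-and-forth movements do not register as cycles; only a net sweep of a full turn does.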

      In our simulations, we rarely observed out-of-phase cycles, which were present by chance in our original Figure 2. We have now removed the statement about out-of-phase cycles and replaced it with a more systematic analysis of the cyclic dynamics as described above (Line 192 to 207 in the revised version). We thank the reviewer for the constructive comment, which motivated us to improve our analysis significantly. 

      Similarly, the authors also find immune selection effects on the shape of the mutational burden in Figure 5 D/H using a qualitative comparison between the distributions and theoretical predictions in  the absence of immune response. However the discrepancy appears quite small in panel D, and  there are no quantitative comparisons provided to evaluate the significance. An analysis of the robustness of all the conclusions to parameter variation is missing. 

      We have now added statistical analysis using the Wasserstein distance between the simulated mutation burden distribution and the theoretical (neutral) expectation in Figure 5 C, D, G, H as well as in Figure A11 C&D when there is no cancer-immune interaction. The measurements of the Wasserstein distance agree with our statement that higher immune effectiveness leads to larger deviation from the neutral expectation.
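
      For readers unfamiliar with the metric, the one-dimensional Wasserstein-1 distance between two empirical distributions is the area between their cumulative distribution functions. The sketch below is a minimal pure-Python illustration, not the authors' analysis code. It assumes that both the simulated mutation burdens and the neutral expectation are available as samples, which may differ from how the theoretical distribution is represented in the manuscript.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two empirical 1D samples.

    Computed as the integral of |F_u(x) - F_v(x)| over x, where F_u and F_v
    are the empirical cumulative distribution functions of the samples.
    """
    u = sorted(u)
    v = sorted(v)
    points = sorted(set(u) | set(v))

    distance = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_u = bisect_right(u, left) / len(u)
        cdf_v = bisect_right(v, left) / len(v)
        distance += abs(cdf_u - cdf_v) * (right - left)
    return distance
```

      A value of zero means the two samples have identical empirical distributions, and larger values indicate a larger deviation. SciPy's `scipy.stats.wasserstein_distance` computes the same quantity for one-dimensional samples.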

      Lastly, the role of the Appendix results in the main messages of the paper is unclear. 

      We agree with the reviewer and have now removed the Appendix section “Deterministic Analysis”. 

      Reviewing Editor Comments: 

      I find the abstract too long. For example, "Knowledge of this coevolutionary system and the selection taking place within it can help us understand tumour-immune dynamics both during tumorigenesis but also when treatments such as immunotherapies are applied." can be shortened to: "Knowledge of this coevolutionary system can help us understand tumour-immune dynamics both during tumorigenesis and during immunotherapy treatments." 

      We agree and have taken the suggestion of the reviewer to shorten our abstract.

      Reviewer #1 (Recommendations for the authors): 

      The discussion at lines 134-140, centered around Figure A1, is an important and nicely constructed feature of the model. 

      Reviewer #2 (Recommendations for the authors): 

      I suggest that the authors conduct a more in-depth analysis of their conclusions on cyclic dynamics over a large set of sample paths.

      Done; please see our detailed response to reviewer 2 above.

      In addition, statistical comparisons between the observed mutational burden distribution and  theoretical predictions in the absence of immune selection should be carried out to support their conclusions. In all cases, conclusions should be tested extensively for robustness/sensitivity to parameters. 

      Done; please see our detailed response to reviewer 2 above.

      Here are some specific suggestions/comments: 

      (1) Please provide a precise mathematical description of the model to complement Figure 1. 

      We have significantly revised our “Model” section to provide a precise mathematical description of our model (Line 138 - 148). Please also see our document showing the difference between the revised version and original submission.

      (2) Section on "Interactions dictate outcome of tumour progress" and Figure 3: please define 'tumour outcome' - are the heatmaps produced in Figure 3 tumor size reflecting whether or not the population has reached level K before a particular time? Also, I do not see a definition for the 'slow-growing' tumour proportion plotted in Figure 3CF or in the accompanying text. 

      We have now added the definition of “tumour outcome” in our “Model” section (line 171 to 176), where we explain our model parameters and quantities measured in the following “Results” section.

      (3) Figure 5C/G: the green dotted vertical line is difficult to see. 

      We have now changed the mean of the simulations to solid red lines instead of using the green dotted vertical lines previously.

      (4) Appendix A1 text under (A2) should U/N be U/C? N does not appear to be defined. 

      We have removed the previous A1 section. Please see our response to reviewer 2 as well.

      (5) Text under (A5): it is unclear what is meant by "SFS must be heavy tailed (that is, more heterogeneous)" -- a more precise statement regarding tail decay rate and associated consequences would be more helpful. 

      We have removed the previous A section, which contained the original text "...SFS must be heavy-tailed".

      (6) Section A4 and Figure A1: can these calculations be compared to simulations? 

      We have removed the previous A section on the deterministic analysis, as it is indeed not very relevant to our stochastic simulations. Please see our response to reviewer 2 as well.

      (7) Also, in general, please clarify how the results in the Appendix are used in the main text conclusions or provide insights relevant to these conclusions. If they are not, one can consider removing them.  

      We have removed the previous A section on the deterministic analysis. The remaining sections are about stochastic simulations and extended figures which support our main figures.  

      (8) Figure A2: the two lines are difficult to tell apart on each panel. Please consider different styles.

      We have changed one of the dotted lines to be solid. This figure is now Figure A1 in our revision.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      I am not currently convinced by the principal interpretations and think that other explanations based on known phenomena could account for key results. Specifically the authors have not resolved whether oxidative modification to 5mC and 3mC, or chemical attack to ssDNA that is transiently exposed in the repair processing of 5mC and 3mC is the principal source of the observed genotoxicity.

      (1) Original query which still stands: As noted in the manuscript, AlkB repairs alkylation damage by direct reversal (DNA strands are not cut). In the absence of AlkB, repair of alkylation damage/modification is likely through BER or other processes involving strand excision and resulting in single stranded DNA. It has previously been shown that 3mC modification from MMS exposure is highly specific to single stranded DNA (PMID:20663718) occurring at ~20,000 times the rate as double stranded DNA. Consequently the introduction of DNMTs is expected to introduce many methylation adducts genome-wide that will generate single stranded DNA tracts when repaired in an AlkB deficient background (but not in an AlkB WT background), which are then hyper-susceptible to attack by MMS. Such ssDNA tracts are also vulnerable to generating double strand breaks, especially when they contain DNA polymerase stalling adducts such as 3mC. The generation of ssDNA during repair is similarly expected to follow the H2O2 or TET based conversion of 5mC to 5hmC or 5fC neither of which can be directly repaired and depend on single strand excision for their removal. The potential importance of ssDNA generation in the experiments has not been [adequately] considered.

      We thank the reviewer for expanding on their previous comment.  We completely agree with the possibility that they raise and have added an extra paragraph in the discussion to expand on our consideration of the role of ssDNA in DNMT-induced DNA damage, which we reproduce here:

      "The observation that TET overexpression sensitizes cells expressing DNMTs to oxidative stress strongly suggests that the site of DNA damage is the modified cytosine itself.  However, we do not currently have definitive evidence supporting this.  As mentioned in the results section, the presence of unrepaired 3mC may lead to increased levels of ssDNA; it is also possible that 5mC itself may increase ssDNA levels.  Loss of alkB would be expected to increase the amount of ssDNA.  Thus DNA damage surrounding modification sites, but not specifically localised to it, might be the cause of the increased sensitivity.  These two different models make different predictions.  If modified cytosines are the source of the damage, mutations arising would be predominantly located at CG dinucleotides.  Alternatively, ssDNA exposure would result in distributed mutations that would not necessarily be located at CG sites.  The highly biased spectrum of mutations that can be screened through the Rif resistance assay does not allow us to address this currently.  However, future experiments to create mutation accumulation lines could allow us to address the question systematically on a genome-wide level. "

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The authors demonstrate that female Spodoptera littoralis moths prefer to oviposit on well-watered tomato plants and avoid drought-stressed plants. The study then recorded the sounds produced by drought-stressed plants and found that they produce 30 ultrasonic clicks per minute. Thereafter, the authors tested the response of female S. littoralis moths to clicks with a frequency of 60 clicks per minute in an arena with and without plants and in an arena setting with two healthy plants of which one was associated with 60 clicks per minute. These experiments revealed that in the absence of a plant, the moths preferred to lay eggs on the side of the arena in which the clicks could be heard, while in the presence of a plant the S. littoralis females preferred to oviposit on the plant where the clicks were not audible. In addition, the authors also tested the response of S. littoralis females in which the tympanic membrane had been pierced making the moths unable to detect the click sounds. As hypothesised, these females placed their eggs equally on both sides of the arena.

      Finally, the authors explored whether the female oviposition choice might be influenced by the courtship calls of S. littoralis males which emit clicks in a range similar to a drought-stressed tomato plant. However, no effect was found of the clicks from ten males on the oviposition behaviour of the female moths, indicating that the females can distinguish between the two types of clicks. Besides these different experiments, the authors also investigated the distribution of egg clusters within a longer arena without a plant, but with a sugar-water feeder. Here it was found that the egg clusters were mostly aggregated around the feeder and the speaker producing 60 clicks per minute. Lastly, video tracking was used to observe the behaviour of the moths in the arena without a plant, which demonstrated that the moths gradually spent more time at the arena side with the click sounds.

      We thank the reviewers for their helpful comments. We agree with the summary, but would like to note that in the control experiment (Figure 2) we used a click rate of 30 clicks per minute—a design choice driven by the editor’s feedback. We have clarified this and, to further probe the system’s dynamics, added a second experiment employing the same click rate (30 clicks per minute) with a dehydrated plant (see details below). In both experiments, females again showed a clear tendency to oviposit nearer the speaker; these findings are described in the updated manuscript.

      (2) The study addresses a very interesting question by asking whether female moths incorporate plant acoustic signals into their oviposition choice. Unfortunately, I find it very difficult to judge how big the influence of the sound on the female choice really is as the manuscript does not provide any graphs showing the real numbers of eggs laid on the different plants, but instead only provides graphs with the Bayesian model fittings for each of the experiments. In addition, the numbers given in the text seem to be relatively similar with large variations e.g. Figure 1B3: 1.8 ± 1.6 vs. 1.1 ± 1.0. Furthermore, the authors do not provide access to any of the raw data or scripts of this study, which also makes it difficult to assess the potential impact of this study. Hence, I would very much like to encourage the authors to provide figures showing the measured values as boxplots including the individual data points, especially in Figure 1, and to provide access to all the raw data underlying the figures.

      We acknowledge that some researchers favor Bayesian graphical representation whereas others prefer raw data visualization. Therefore, we have added plots of the raw data from Figure 1 in the supplementary section. We are aware of the duplication in presentation and apologize for this redundancy.

      Regarding the variance and means we obtained in our experiment, we have analyzed all raw data using the statistical model presented, and if statistical significance was found despite a particular mean difference or variance, this is meaningful from a biological perspective. One can certainly discuss whether this difference has biological importance, but it should be remembered that in this experimental system, we are trying to isolate the acoustic signal from a complex system that includes multiple signals. Therefore, at no point have we suggested that this is a standalone factor; rather, we have proposed it as an informative and significant component.

      In addition to the experiments described above, we conducted an experiment in which we counted both eggs and clusters. The results indicate that cluster counts are a reliable proxy for reproductive investment at a given location. In this experiment, we present cluster numbers alongside egg counts (Figure 2).

      Furthermore, we apologize for the technical error that prevented our uploaded data files from reaching the reviewers. We have also uploaded updated data and code.

      (3) Regarding the analysis of the results, I am also not entirely convinced that each night can be taken as an independent egg-laying event, as the amount of eggs and the place where the eggs are laid by a female moth surely depends on the previous oviposition events. While I must admit that I am not a statistician, I would suggest, from a biological point of view, that each group of moths should be treated as a replicate and not each night. I would therefore also suggest to rather analyse the sum of eggs laid over the different consecutive nights than taking the eggs laid in each night as an independent data point.

      We thank the reviewer for this question. This is a valid point that we address in three parts:

      First, regarding our statistical approach, we used a model that takes into account the sequence of nights and examines whether there is an effect of the order of nights, i.e., we used GLMMs, with the night nested within the repetition. This is equivalent to treating the data as repeated measures and is, to the best of our knowledge, the common way to treat such data.

      Second, following the reviewer's comment, we also reran the statistics of the third experiment (i.e., “sound gradient experiments”, Figure 2 and Supplementary figure 4) using only the first night on which the female(s) laid eggs, to avoid the concern of dependency. This analysis revealed the same result, i.e., a significant preference for the sound stimulus. We have now updated our methods and results section to clarify this point.

      Third, an important detail that may not have been clearly specified in the methods: at the end of each night, we cleaned the arena of counted egg clusters using a cloth with ethanol, so that on the subsequent night we would not expect any evidence of previous oviposition to remain, although this would not exclude some form of physiological or cognitive memory. We have now updated our methods section to clarify this important procedural point.

      (4) Furthermore, it did not become entirely clear to me why a click frequency of 60 clicks per minute was used for most experiments, while the plants only produce clicks at a range of 30 clicks per minute. Independent of the ecological relevance of these sound signals, it would be nice if the authors could provide a reason for using this frequency range. Besides this, I was also wondering about the argument that groups of plants might still produce clicks in the range of 60 clicks per minute and that the authors' tests might therefore still be reasonable. I would agree with this, but only in the case that a group of plants with these sounds would be tested. Offering the choice between two single plants while providing the sound from a group of plants is in my view not the most ecologically reasonable choice. It would be great if the authors could modify the argument in the discussion section accordingly and further explore the relevance of different frequencies and dB levels.

      This is an excellent point. We originally increased the click rate to generate a strong signal. However, it was important for us to verify that there was ecological relevance in the stimulus we implemented in the system. For this purpose, we recorded a group of dehydrated plants at a distance of ~20cm and we measured a click rate of 20 clicks per minute (i.e., 0.33 Hz) (see Methods section). Therefore, as mentioned at the beginning of this letter, in the additional experiment described in Figure 2, we reduced the click frequency to 30 clicks per minute, and at this lower rate, the effect was maintained. Increasing plant density would probably raise the rate to 30 clicks per minute.
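
      As a quick check on the click rates quoted above, the conversion from clicks per minute to a repetition rate in Hz (clicks per second) is a division by 60. The short sketch below is purely illustrative: the function name is our own, and it is not taken from the authors' analysis code.

```python
def clicks_per_minute_to_hz(clicks_per_minute: float) -> float:
    """Convert a click repetition rate from clicks per minute to Hz (clicks per second)."""
    return clicks_per_minute / 60.0


# Rates mentioned in the response: recorded plant group (20), reduced playback (30),
# and original playback (60), all in clicks per minute.
for rate in (20, 30, 60):
    print(f"{rate} clicks/min = {clicks_per_minute_to_hz(rate):.2f} Hz")
```

      Note that this repetition rate is distinct from the acoustic (ultrasonic) frequency of each click, which is expressed in kHz.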

      (5) Finally, I was wondering how transferable the findings are towards insects and Lepidopterans in general. Not all insects possess a tympanic organ and might therefore not be able to detect the plant clicks that were recorded. Moreover, I would imagine that generalist herbivores like Spodoptera might be more inclined to use these clicks than specialists, which very much rely on certain chemical cues to find their host plants. It would be great if the authors would point more to the fact that your study only investigated a single moth species and that the results might therefore only hold true for S. littoralis and closely related species, but not necessarily for other moth species such as Sphingidae or even butterflies.

      Good point. Our research uses a specific model system of one moth species and one plant species in a particular plant-insect interaction where females select host plants for their offspring. As with any model-based research that attempts to draw broader conclusions, we've taken care to distinguish between our direct findings and potential wider implications. We believe our system may represent mechanisms relevant to a wider group of herbivorous insects with hearing capabilities, particularly considering that several moth families and other insect orders can detect ultrasound. However, additional research examining more moth and plant species is necessary to determine how broadly applicable these findings are. We have made these clarifications in the text.

      Reviewer #2 (Public review):

      (6) The results are intriguing, and I think the experiments are very well designed. However, if female moths use the sounds emitted by dehydrated plants as cues to decide where to oviposit, the hypothesis would predict that they would avoid such sounds. The discussion mentions the possibility of a multi-modal moth decision-making process to explain these contradictory results, and I also believe this is a strong possibility. However, since this remains speculative, careful consideration is needed regarding how to interpret the findings based solely on the direct results presented in the results section.  

      Thank you for this insightful observation. We agree that the apparent attraction of females to dehydrated-plant sounds contradicts our initial prediction. Having observed this pattern consistently across multiple setups, we have now added a targeted choice experiment to the revised manuscript, in which female moths were offered a choice between dehydrated plants broadcasting their natural ultrasonic emissions and a control. These results, detailed in the Discussion and presented in full in the Supplementary Materials (Supplementary Figure 4), show that when only a dehydrated plant is available, moths prefer it for oviposition, supporting our hypothesis that in the absence of a real plant, the plant’s sounds might represent a plant.

      (7) Additionally, the final results describing differences in olfactory responses to drying and hydrated plants are included, but the corresponding figures are placed in the supplementary materials. Given this, I would suggest reconsidering how to best present the hypotheses and clarify the overarching message of the results. This might involve reordering the results or re-evaluating which data should appear in the main text versus the supplementary materials.

      Thank you for this suggestion. We have reorganized the manuscript and removed the olfactory response data from the current version to maintain a focused narrative on acoustic cues. We agree that a detailed investigation of multimodal interactions deserves a separate study, which we plan to pursue in future work. 

      (8) There were also areas where more detailed explanations of the experimental methods would be beneficial.

      Thank you for highlighting this point. We have expanded and clarified the Methods section to provide comprehensive detail on our experimental procedures.

      Reviewer #1 (Recommendations for the authors):

      (9) Line 1: Please include the name of the species you tested also in the title as your results might not hold true for all moth species.

      We do not fully agree with this comment. Please see comment 5.

      (10) Line 19-20: Please rephrase the sentence so that it becomes clear that the "dehydration stress" refers to the plant and not to the moths.

      Thank you for the suggestion; we have clarified the text accordingly.

      (11) Line 31: Male moths might provide many different signals to the females, maybe better "male sound signals" or similar.

      Thank you for the suggestion; we have clarified the text accordingly.

      (12) Line 52-53: Maybe mention here that not all moth species have evolved these abilities.

      Thank you for the suggestion; we have clarified the text accordingly.

      (13) Line 77: add a space after 38.

      Thank you for the suggestion; we have clarified the text accordingly.

      (14) Line 88: Maybe change "secondary predators" to "natural enemies".

      Thank you for the suggestion; we have clarified the text accordingly.

      (15) Line 134: Why is "notably" in italics? I would suggest using normal spelling/formatting rules here.

      Thank you for the suggestion; we have clarified the text accordingly.

      (16) Line 140-144: If you did perform the experiment also with the more ecological relevant playback rate, why not present these findings as your main results and use the data with the higher playback frequency as additional support?

      Thank you for this suggestion. We agree that the ecologically relevant playback data are important, as described in detail at the beginning of this letter and in comment 4. However, to preserve a clear and cohesive narrative, we have maintained the original ordering of this section. The various experiments presented in Figure 1 differ in several components from those in Figure 2 and from the work that examined sounds in plant groups in the appendices. Therefore, we find it more appropriate to use them as supporting evidence for the main findings rather than creating a comparison between different experimental systems. For this reason, we chose to keep them as a separate description. The ecological playback findings (Lines 140–144) remain fully described in the Results and serve to reinforce the main observations without interrupting the manuscript's flow.

      (17) Line 146: Please explain already here how you deafened the moths.

      Thank you for the suggestion; we have clarified the text accordingly.

      (18) Line 181: should it be "male moths' " ?

      Thank you for the suggestion; we have clarified the text accordingly.

      (19) Line 215: Why is "without a plant" in italics? I would suggest using normal spelling/formatting rules here.

      Thank you for the suggestion; we have clarified the text accordingly.

      (20) Line 234: I do not understand why this type of statistic was used to analyse the electroantennogram (EAG) results. Would a rather simple Student's t-test or a Wilcoxon rank-sum test not have been sufficient? I would also like to caution you not to overinterpret the data derived from the EAG, as you combined the entire headspace into one mixture it is no longer possible to derive information on the different volatiles in the blends. The differences you observe might therefore mostly be due to the amount of emitted volatiles.

      We have reorganized the manuscript and removed the olfactory response data from the current version to maintain a focused narrative on acoustic cues (See comment 7). 

      (21) Line 268: It might be nice to add an additional reference here referring to the multimodal oviposition behaviour of the moths.

      Thank you for the suggestion; we have clarified the text accordingly.

      (22) Line 284: If possible, please add another reference here referring to the different cues used by moths during oviposition.

      Thank you for the suggestion; we have clarified the text accordingly.

      (23) Line 336: What do you mean by "closed together"?

      Thank you for the suggestion; we have clarified the text accordingly.

      (24) Line 434-436: Please see my overall comments. I do not think that you can call it ecologically relevant if the signal emitted by multiple plants is played in the context of just a single plant.

      Please see comments 1 and 4.

      (25) Line 496: Please change "stats" to statistics.

      Thank you for the suggestion; we have clarified the text accordingly.

      (26) Line 522-524: I am not sure whether simply listing their names does give full credit to the work these people did for your study. Maybe also explain how they contributed to your work.

      Thank you for the suggestion; we have clarified the text accordingly.

      Reviewer #2 (Recommendations for the authors):

      (27) L54 20-60kHz --> 20Hz-60kHz or 20kHz - 60kHz?

      OK. We have replaced it.

      (28) L124 Are the results for the condition where nothing was placed and the condition where a decoy silent resistor was placed combined in the analysis? If so, were there no significant differences between the two conditions? Comparing these with a condition presenting band-limited noise in the same frequency range as the drought-stressed sounds might also have been an effective approach to further isolate the specific role of the ultrasonic emissions.

      We used both conditions owing to technical constraints and pooled them together for analysis. Statistical tests confirmed no significant differences between them, and this clarification has now been added to the Methods section, including the results of the statistical test.

      (29) L125 (Fig. 1A), see Exp. 1 in the Methods). -> (Fig.1B. See Exp.1 in the Methods).

      Thank you for the suggestion; we have clarified the text accordingly.

      (30) L132 "The opposite choice to what was seen in the initial experiment (Fig.1B)"

      Thank you for the suggestion; we have clarified the text accordingly.

      (31) L137-143 If you are writing about results, why not describe them with figures and statistics? The current description reads like a discussion.

      These findings were not among our primary research questions; however, we believe that including them in the Results section underscores the experimental differences. In our opinion, introducing an additional figure or expanding the statistical analysis at this point would disrupt the narrative flow and risk confusing the reader.

      (32) L141 "This is higher than the rate reported for a single young plant" Are you referring to the tomato plants used in the experiments? It might be helpful to include in the main text the natural click rate emitted by tomato plants, as this information is currently only mentioned in the Methods section.

      See comment 4.  

      (33) L191 Is the main point here to convey that the plant playback effect remained significant even when the sound presentation frequency was reduced to 30 clicks per minute? The inclusion of the feeder element, however, seems to complicate the message. To simplify the results, moving the content from lines 185-202 to the supplementary materials might be a better approach. Additionally, what is the rationale for placing the sugar solution in the arena? Is it to maintain the moths' vitality during the experiment? Clarifying this in the methods section would help provide context for this experimental detail.

      In this series of experiments, we manipulated four variables—single moths, ultrasonic click rate, arena configuration (from a two-choice design to an elongated enclosure), and the response metric (total egg counts rather than cluster counts)—to evaluate moth oviposition under more ecologically realistic conditions. We demonstrate the system’s robustness and validity in a more realistic setting (by tracking individual moths, counting single eggs, etc.).  

      As noted in the text, feeders were included to preserve the moths’ natural behavior and vitality. We have further clarified this in the revised manuscript.

      (34) L215 Is the click presentation frequency 30 or 60 per minute? Since Figure 3 illustrates examples of moth movement from the experiment described in Figure 1, it might be more effective to present Figure 3 when discussing the results of Figure 1 or to include it in the supplementary materials for better clarity and organization.

      See comments 1 and 4, as mentioned above.

      (35) L291 Please provide a detailed explanation of the experiments and measurements for the results shown in Figure S3 (and Figure S2). If the multi-modal hypothesis discussed in the study is a key focus, it might be better to include these results in the main results section rather than in the supplementary materials.

      Thank you for this suggestion. Figure S2 was removed (see comments above). We have now added context for Figure S3.

      (36) L303 It might be helpful to include information about the relationship between the moth species used in this study and tomato plants somewhere in the text. This would provide an important context for understanding the ecological relevance of the experiments.

      Thank you for the suggestion; we have clarified the text accordingly.

      (37) Table 1 The significant figures in the numbers presented in the tables should be consistent.

      Thank you for the suggestion; we have clarified the text accordingly.

      (38) L341 The text mentions that experiments were conducted in a greenhouse, but does this mean the arena was placed inside the greenhouse? Also, the term "arena" is used - does this refer to a sealed rectangular case or something similar? For the sound presentation experiments, it seems that the arena cage was placed inside a soundproof room. If the arena is indeed a case-like structure, were there any specific measures taken to prevent sound scattering within the case, such as the choice of materials or structural modifications?

      Here, “arena” refers to the plastic boxes used throughout this study. In this particular experiment, we presented plants alone—reflecting ongoing debate in the literature—and used these trials as a baseline for our subsequent sound-presentation experiments, during which we measured sound intensity as described in the Methods section. All sound-playback experiments were conducted in sound-proof rooms, and acoustic levels were measured beforehand—sound on the control side fell below our system’s detection threshold. 

      (39) L373 "resister similar to the speaker" Could you explain it in more detail? I think this would depend on the type of speaker used-particularly whether it includes magnets. From an experimental perspective, presenting different sounds such as white noise from the speaker might have been a better control. Was there a specific reason for not doing so? Additionally, the study does not clearly demonstrate whether the electric and magnetic field environments on both sides of the arena were appropriately controlled. Without this information, it is difficult to evaluate whether using a resistor as a substitute was adequate.

      Thank you for this comment. We have now addressed this point in the Discussion. We acknowledge that we did not account for the magnetic field, which might have differed between the speaker and the resistor. We agree that using an alternative control, such as white noise, could have been informative, and we now mention this as a limitation in the revised Methods.

      (40) L435 60Hz? The representation of frequencies in the text is inconsistent, with some values expressed in Hz and others as "clicks per second." It would be better to standardize these units for clarity, such as using Hz throughout the manuscript.

      We agree that this is confusing. We reviewed the text and made sure that “clicks per second” refers to how many clicks were produced, whereas Hz units are used only in the context of sound frequencies.

      (41) L484 "we quantified how many times each individual crossed the center of the arena" Is this data being used in the results?

      Yes. This is mentioned in the text just before Figure 3 (L220).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this study, Meunier et al. investigated the functional role of IL-10 in avian mucosal immunity. While the anti-inflammatory role of IL-10 is well established in mammals, and several confirmatory knockout models are available in mice, IL-10's role in avian mucosal immunity is so far correlative. In this study, the authors generated two different models of IL-10 ablation in Chickens. A whole body knock-out model and an enhancer KO model leading to reduced IL10 expression. The authors first performed in vitro LPS stimulation-based experiments, and then in vivo two different infection models employing C. jejuni and E. tenella, to demonstrate that complete ablation of IL10 leads to enhanced inflammation-related pathology and gene expression, and enhanced pathogen clearance. At a steady-state level, however, IL-10 ablation did not lead to spontaneous colitis. 

      Strengths: 

      Overall, the study is well executed and establishes an anti-inflammatory role of IL-10 in birds. While the results are expected and not surprising, this appears to be the first report to conclusively demonstrate IL-10's anti-inflammatory role upon its genetic ablation in the avian model. Provided this information is applicable in combating pathogen infection in livestock species in sustainable industries like poultry, the study will be of interest to the field. 

      Weaknesses: 

      The study is primarily a confirmation of the already established anti-inflammatory role of IL-10. 

      We do not agree that this work is primarily confirmatory. The anti-inflammatory role of IL10 was indeed known previously from studies in mammals. The much more general insight from the current study is our demonstration of the intrinsic trade-off between inflammation and tolerance in the response to both the microbiome (which was significantly altered in the IL10 knockout birds) and mucosal pathogens. The study of Eimeria challenge in particular highlights the fact that it may be better for the host to tolerate a potential pathogen than to take on the cost of elimination.

      Reviewer #2 (Public review): 

      Summary: 

      The authors set out to investigate the functional role of IL10 on mucosal immunity in chickens. CRISPR technology was employed to generate IL10 knock-out chickens in both exon and putative enhancer regions. IL10 expression was either abolished (knockout in exon) or reduced (enhancer knock-out). IL-10 plays an important role in the composition of the caecal microbiome. Through various enteric pathogen challenges, deficient IL10 expression was associated with enhanced pathogen clearance, but with more severe lesion scores and body weight loss. 

      Strengths: 

      Both in vitro and in vivo knock-out abolished and reduced IL10 expression, and broad enteric pathogens were challenged in vivo, and various parameters were examined to evaluate the functional role of IL10 on mucosal immunity. 

      Weaknesses: 

      Overexpression of IL-10 either in vitro or in vivo may further support the findings from this study. 

      An overexpression experiment, regardless of outcome, would not necessarily support or invalidate the findings of the current study. It would address the question of whether the absolute concentration of IL10 produced alters the outcome of an infection.

      Reviewer #1 (Recommendations for the authors): 

      The following are the recommendations that, in my opinion, will be helpful to enhance the quality of the study. 

      Major point: 

      The authors at a steady state did not observe any sign of spontaneous colitis. Since IL-10 KO in mice leads to enhanced pathological score upon DSS-mediated induction of colitis, and several colitis models are well established in birds, it will be worthwhile to test the consequence of experimentally inducing colitis in this context. 

      One of the novel features of this study is the observation that the microbiome is modified in the IL10KO HOM chicks, which may serve to mitigate potential spontaneous pathology; we now mention this in the discussion. We agree that it could be worthwhile in the future to look at additional challenge models. However, we would argue that the Eimeria challenge is an adequate experimentally-induced model of colitis to demonstrate the increased inflammation that occurs in an IL10-deficient bird. This is further supported by evidence of enhanced inflammatory responses in the caeca of IL10KO HOM birds challenged with Campylobacter or Salmonella relative to WT controls. See in the revised manuscript (pages 12-13).

      Minor points: 

      (1) In Figure 2B, the authors should confirm whether the ROS-AV163 groups also have LPS treatment. 

      The legend for Figure 2B already states that neutralizing anti-IL10 antibody was added to LPS-stimulated BMDMs: “Nitric oxide production was assessed by measuring nitrite levels using Griess assay for LPS-stimulated BMDMs […] in the absence or presence of neutralizing anti-IL10 antibody ROS-AV163”. However, for added clarity we have now modified the x-axis label for Figure 2B (“+ROS-AV163” replaced by “+LPS +anti-IL10”) and we have also made minor changes to the figure legend. See in the revised manuscript (page 33).

      (2) In Figure 3F, the authors should discuss why the duodenum of KO birds has enhanced infiltration compared to WT? 

      We are not sure what the reviewer is referring to here. Although not specifically mentioned in Figure 3F, there is no statistically significant difference in cellular infiltration in the duodenum of IL10KO WT and HOM birds raised in our specified pathogen-free (SPF) facility, nor in the duodenum of IL10KO WT and HOM birds raised in our conventional facility (Mann-Whitney U tests, p>0.1 in both cases); this can be seen in the sums of histopathological scores shown in Figures 3C (SPF facility) and 3E (conventional facility). Figure 3F shows that there is a statistically significant difference in cellular infiltration scores in the duodenum and proximal colon of both IL10KO WT and HOM birds based on the environment they are raised in (SPF vs conventional). We have made minor changes to the text to clarify this. See in the revised manuscript (page 7).

      (3) The authors should discuss the observed differences in the C. jejuni colonization results among the two cohorts at week 1 and week 2 post-infection. 

      Numbers of C. jejuni in the caeca of IL10KO HOM birds were markedly lower than for WT controls at 1-week post-infection in cohort 1, and at both time intervals post-infection in cohort 2 (Figure 4A). This reached statistical significance at 1-week post-infection in cohort 1 and at 2-weeks post-infection in cohort 2. It is evident from Figure 4A that considerable inter-animal variance existed in each group, and in the IL10KO HOM birds in particular. This is typical of C. jejuni colonisation in chickens, where bacterial population structures have been reported to be variable and unpredictable (Coward et al., Appl Environ Microbiol 2008, PMID: 18424530). Similar variation between time intervals, birds and repeated experiments has been reported when evaluating vaccines against C. jejuni colonisation (e.g. Buckley et al., Vaccine 2010, PMID: 19853682; Nothaft et al., Front Microbiol 2021, PMID: 34867850). We performed two independent studies for this reason. Taken together, we consider that our data provide convincing evidence of elevated pro-inflammatory responses upon C. jejuni infection in IL10KO HOM birds relative to WT controls that are associated with reduced bacterial burden. Our data are also consistent with a published observation that a commercial broiler line with low IL10 expression had correspondingly elevated expression of CXCLi-1, CXCLi-2 and IL-1b (Humphrey et al., mBio 2014, reference 33 in our original submission). We have added text to the discussion to capture the points above. See in the revised manuscript (page 13).

      Reviewer #2 (Recommendations for the authors): 

      For the animal challenging experiments, both IL10KO HOM and IL10EnKO HOM chickens were used for Eimeria challenge, but not for Salmonella and Campylobacter. Could the authors justify why? 

      The Eimeria challenge produced a much higher and more reproducible level of inflammation than either of the bacterial challenge models. Within the parasite challenge cohorts, IL10KO HET and IL10EnKO HOM birds were only marginally different from WT controls (e.g. parasite replication: Figures 5A and B; lesion scores: Figures 5E and F; body weight gain: Figures 5G and H). Given the more limited response and the inter-individual variation in the bacterial challenge models, we felt that analysis of a sufficiently large cohort of the IL10KO HOM was appropriate, while additional cohorts of IL10KO HET and IL10EnKO HOM birds large enough to detect statistically significant differences could not be justified.

      In the M&M, there was no mention of # of birds generated for IL10EnKO HOM, HET, etc. 

      Full details of bird numbers can be found in SI Appendix Table S1 “Number of IL10KO and IL10EnKO WT, HET and HOM chicks hatched in the NARF SPF chicken facility in the first (G1) and second (G2) generations”. Table S1 is already referred to in the Results section “Generation of IL10-deficient chickens”; we now also refer to it explicitly in the “Animals” and “Generation of surrogate host chickens and establishment of the IL10KO and IL10EnKO lines under SPF conditions” sections of the Materials and Methods. In all three sections we have also added text to clarify that the table details G1 and G2 bird numbers (see pages 5, 15 and 17 of the revised manuscript).

      From the results of Campylobacter challenge, the results from the cohort 1 and cohort 2 were not consistent at both 1 and 2 weeks of post-infection. There is not much discussion on this inconsistency. What is the final conclusion: significant difference in week 1 or week 2, OR none of them, OR both of them. What would happen if an additional cohort were conducted for Salmonella and Eimeria? 

      As noted in response to Reviewer 1 (minor point 3), we have now added text to the discussion on the partial inconsistency between independent C. jejuni challenge studies. We do not feel that additional experiments to address this comment are required. Highly significant increases in the infiltration of lymphoplasmacytic cells and heterophils were detected in IL10KO HOM chickens relative to WT controls in the caeca, a key site of Campylobacter colonisation. This was consistently observed in two independent cohorts at both 1- and 2-weeks post-infection (SI Appendix Figures S7 and S8) and was reflected in similar patterns of expression of pro-inflammatory genes at these intervals in both cohorts (Figure 4B). As our laboratory has observed substantially less variation between repeated Salmonella challenges, a single study was performed, but with adequate power to detect statistical differences.  The effects of E. tenella infection in IL10KO WT and HOM birds were replicated (compare Figure 4 with data from day 6 in Figure 5).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Axon growth is of course essential to the formation of neural connections. Adhesion is generally needed to anchor and rectify such motion, but whether the tenacity or forces of adhesion must be optimal for maximal axon extension is unknown. Measurements and contributing factors are generally lacking and are pursued here with a laser-induced shock wave approach near the axon growth cone. The authors claim to make measurements of the pressure required to detach axons from low to high matrix density. The results seem to support the authors' conclusions, and the work - with further support - is likely to impact the field of cell adhesion. In particular, there could be some utility of the methods for the adhesion and those interested in aspects of axon growth.

      Strengths:

      A potential ability to control the pressure simply via proximity of the laser spot is convenient and perhaps reasonable. The 0 to 1 scale for matrix density is a good and appropriate measure for comparing adhesion and other results. The attention to detachment speed, time, F-actin, and adhesion protein mutant provides key supporting evidence. Lastly, the final figure of traction force microscopy with matrix varied on a gel is reasonable and more physiological because neural tissue is soft (cite PMID: 16923388); an optimum in Fig.6 also perhaps aligns with axon length results in Fig.5.

      We thank the reviewer for the many suggestions on how to better present and explain our experimental results. We have carefully reconsidered the problems pointed out and revised the manuscript as follows.

      Weaknesses:

      The results seem incomplete and less than convincing. This is because the force calibration curve seems to be from a >10 yr old paper without any more recent checks or validating measurements.

      Although the force calibration data were obtained with this experimental system more than 10 years ago, we have continued to use the same system under appropriate maintenance, and its performance has been checked regularly. We therefore consider that the calibration data shown remain valid.

      Secondly, the claimed effect of pressure on the detachment of the growth cone does not consider other effects such as cavitation or temperature, and certainly needs validation with additional methods that overcome such uncertainties.

      The authors need to check whether the laser perturbs the matrix, particularly local density. A relation between traction stresses of ~20-50 pN/µm² in Fig.6 and the adhesion pressure of 3-5 kPa of Fig.3 needs to be carefully explained; the former units equate to 0.02-0.05 kPa, and would perhaps suggest cells cannot detach themselves and move forward.

      We have previously reported that a single pulse from a Ti:sapphire femtosecond laser amplifier can effectively generate shock waves and stress waves with minimal thermal effects. Notably, during this process, the temperature elevation at the laser focal point is sufficiently suppressed, allowing efficient force generation without causing significant heating in the surrounding area. Using this method, we have confirmed that cells show no damage after the force loading. Therefore, this approach enables cell detachment while minimizing thermal and cavitation-induced damage to the cell. This clarification has been incorporated into the revised results section (lines 119-120). We agree with the reviewer that the presented data were insufficient to support the proposed model. We have therefore performed additional experiments and analyses, which are included in the revised version of the manuscript. To examine the impact of femtosecond laser irradiation on laminin, fluorescently labeled laminin was coated onto glass-bottom dishes, and the fluorescence intensity was analyzed before and after the impulsive force loading. The result indicates that the fluorescence intensity at the laser focal point was unaffected by laser irradiation. This finding suggests that axon detachment results from the dissociation between L1 and laminin rather than the detachment of laminin from the substrate. These data have been incorporated into Supplementary Fig. 1 and page 5 (lines 113-120). In addition, an explanation of the relationship between the adhesion pressure and the traction stress has been added on page 8 (lines 253-258).
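
      The unit comparison raised in the reviewer's comment can be checked with a short calculation. The sketch below is illustrative only and is not taken from the manuscript: the function name and variables are hypothetical, and the input ranges are the approximate values quoted in the comment (about 20-50 pN/µm² for traction stress, 3-5 kPa for adhesion pressure).

```python
# Illustrative unit check; names and structure are assumptions, not the authors' code.
# 1 pN = 1e-12 N and 1 µm² = 1e-12 m², so 1 pN/µm² is exactly 1 Pa.
PA_PER_PN_PER_UM2 = 1.0


def pn_per_um2_to_kpa(stress_pn_per_um2):
    """Convert a stress given in pN/µm² to kPa."""
    pascals = stress_pn_per_um2 * PA_PER_PN_PER_UM2
    return pascals / 1000.0


# Approximate ranges quoted in the reviewer's comment.
traction_kpa = [pn_per_um2_to_kpa(s) for s in (20, 50)]
adhesion_kpa = (3.0, 5.0)

# Smallest and largest ratio of adhesion pressure to traction stress.
ratio_low = adhesion_kpa[0] / traction_kpa[1]
ratio_high = adhesion_kpa[1] / traction_kpa[0]

print(f"traction stress: {traction_kpa[0]:.2f}-{traction_kpa[1]:.2f} kPa")
print(f"adhesion / traction ratio: about {ratio_low:.0f} to {ratio_high:.0f}")
```

      Under these quoted ranges the two quantities differ by roughly two orders of magnitude, which is the gap the reviewer asks the authors to explain.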

      The authors need to measure axon length on gels (Fig.6) as more physiological because neural tissue is soft. The studies are also limited to a rudimentary in vitro model without clear relevance to in vivo.

      In response to the reviewer’s request, we measured the axon length on polyacrylamide gel with stiffness comparable to brain tissue (0.3 kPa). Under our experimental conditions, the axon length was consistently shorter on the gel than on glass, in agreement with previous findings (Abe et al., 2021). Furthermore, a biphasic relationship between axon outgrowth and laminin concentration was observed. These results suggest that the biphasic behavior of axon outgrowth identified in this study is likely to occur in vivo. We have updated Fig. 6 and described the result (lines 224-225) in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      The force calibration curve seems to be from a >10 yr old paper without any more recent checks or validating measurements - which are essential. Effects of cavitation and temperature must be checked, and validated with additional methods that overcome such uncertainties. The authors need to check whether the laser perturbs the matrix, particularly local density. A relation between traction stresses of ~20-50 pN/µm² in Fig.6 and the adhesion pressure of 3-5 kPa of Fig.3 needs to be carefully explained; the former units equate to 0.02-0.05 kPa, and would perhaps suggest cells cannot detach themselves and move forward. The authors need to measure axon length on gels (Fig.6) as more physiological because neural tissue is soft. The studies are also limited to a rudimentary in vitro model without clear relevance to in vivo.

      We thank the reviewer for these recommendations on our manuscript. We have addressed these points in our responses to the public review above.

      Reviewer #2 (Public Review):

      Summary:

      The authors measure axon outgrowth rate, laminin adhesion strength, and actin rearward flow rate. They find that the axon outgrowth rate has a biphasic dependence on adhesion strength. In interpreting the results, they suggest that the results "imply that adhesion modulation is key to the regulation of axon guidance"; however, they measure elongation rate, not guidance.

      Strengths:

      The measurements of adhesion strength by laser-induced shock waves are reasonable as is the measurement of actin flow rates by speckle microscopy.

      Weaknesses:

      They only measure the length of the axons after 3 days and have no measurements of the actual rate of growth cone movements when they are moving. They do not measure the rate of actin growth at the leading edge to know its contribution to the extension rate. This is inadequate.

      These studies are unlikely to have an impact on the field because the measurement of axon growth rate at short times is missing.

      We thank the reviewer for recognizing the novelty of our study, and we agree with the reviewer’s comment. Following the comment, we performed time-lapse imaging of growth cone movements and quantified the migration rate. Consistent with the length of axons, the migration rate did not exhibit a monotonic increase with increased L1CAM-laminin binding but rather displayed biphasic behavior, where excessive L1CAM-laminin binding led to a reduction in the migration rate. Notably, the biphasic migration behavior was abolished in the L1CAM knockdown neurons. We believe these results provide further support for our proposed model. This has been incorporated into the new Fig. 5 and page 7 (lines 209-218) of the revised manuscript. In addition, the experimental method has been added on page 13 (lines 385-391).

      Reviewer #2 (Recommendations For The Authors):

      This is a very weak paper because of the lack of relevant measurements to enable correlations between actual extension rate, traction force, and rates of speckle movement.

      We thank the reviewer for this critical comment on our model. We performed time-lapse imaging of growth cone movements and quantified the migration rate. From the comments of this reviewer and Reviewer #3, we also recognized the importance of prior studies on the measurement of adhesion strength in the growth cone, traction force, the correlation between retrograde flow and outgrowth, and the biphasic dependence of neurite outgrowth on substrate concentration (please also see our response to the recommendations from Reviewer #3).

      Reviewer #3 (Public Review):

      Summary:

      Yamada et al. build on classic and more recent studies (Chen et al., 2023; Lemmon et al., 1992; Nichol et al., 2016; Zheng et al., 1994; Schense and Hubbell, 2000) to better understand the relationship between substrate adhesion and neurite outgrowth.

      Strengths:

      The primary strength of the manuscript lies in developing a method for investigating the role of adhesion in axon outgrowth and traction force generation using a femtosecond laser technique. The most exciting finding is that both outgrowth and traction force generation have a biphasic relationship with laminin concentration.

      Weaknesses:

      The primary weaknesses are a lack of discussion of prior studies that have directly measured the strength of growth cone adhesions to the substrate (Zheng et al., 1994) and traction forces (Koch et al., 2012), the inverse correlation between retrograde flow rate and outgrowth (Nichol et al., 2016), and prior studies noting a biphasic effect of substrate concentration of neurite outgrowth (Schense and Hubbell, 2000).

      Overall, the claims and conclusions are well justified by the data. The main exception is that the data is more relevant to how the rate of neurite outgrowth is controlled rather than axonal guidance.

      This manuscript will help foster interest in the interrelationship between neurite outgrowth, traction forces, and substrate adhesion, and the use of a novel method to study this problem.

      We thank the reviewer for the helpful comments and for recognizing the strengths of our manuscript. In light of these comments, we recognized the importance of prior studies on the measurement of adhesion strength in the growth cone, traction force, the correlation between retrograde flow and outgrowth, and the biphasic dependence of neurite outgrowth on substrate concentration. To acknowledge these prior studies, we revised the introduction (lines 38-44, 61-65) and discussion (lines 272-281) of the manuscript. The references suggested by the reviewer have been added (Refs. 17, 26, 27, 31, and 35) (see also the responses below).

      Reviewer #3 (Recommendations For The Authors):

      Overall, I found the experiments discussed in the manuscript to be excellent. My primary suggestion is to slightly expand the introduction and discussion to put this work in context better. Additionally, the writing is unclear in places and would be helped by a careful edit.

      We appreciate the reviewer’s constructive critiques and would like to thank him/her for the experimental suggestions, which we have taken into account in the revised version of the manuscript. We trust that the additional modification of the text will satisfactorily address the reviewer’s concerns.

      In more detail:

      The introduction is well-written but could be improved by discussing how these studies build earlier work. Through the 1980s and 90s, an important question was whether growth cone guidance occurred as the result of chemical cues that altered the activity of signaling pathways or differences in the adhesion between growth cones and substrates. While there was some clear evidence that growth cones were steered to more adhesive substrates (Hammarback and Letourneau, 1986), there were also important exceptions. For example, (Calof and Lander, 1991) examined the biophysical relationship between neuronal migration and substrate adhesion and found that laminin, which tends to support rapid migration and neurite outgrowth, tended to decrease adhesion.

      Thank you for the critical comments on our manuscript. We have modified the introduction (lines 38-40, 42-44) of the revised manuscript to discuss the current understanding of growth cone guidance, particularly the roles of neurite migration and substrate adhesion.

      To better understand the relationship between substrate adhesion and outgrowth, Heidemann's group (Zheng et al., 1994) was, to the best of my knowledge, the first paper to directly measure the force required to detach growth cones from substrates; including laminin and L1. For DRG neurons, this was ~ 1000 - 3000 µdynes (i.e., 10 to 30 nN) and they noted that traction force generation is 3 to 15 times less than the force needed to dislodge growth cones. Additionally, that manuscript goes on to suggest, "These data argue against the differential adhesion mechanism for growth cone guidance preferences in culture." With the rising development of powerful molecular genetic tools and a growing appreciation of the importance of signaling pathways in neurite outgrowth (Huber et al., 2003), the field as a whole has focused on the molecular aspects of growth cone guidance, leaving many aspects of the physical process of neurite outgrowth unanswered. The strength of this manuscript is that it develops a new method for measuring growth cone adhesion forces, which reassuringly generates similar results to classic studies. In turn, it combines this with molecular genetic analysis to determine the contribution L1-LN interaction makes to the overall adhesion strength.

      We will ensure that the manuscript explicitly acknowledges the significance of Zheng et al. (1994) in shaping the field and clarifies how our study expands upon these foundational findings. Following the reviewer’s suggestion, we have added Zheng et al. (1994) to the references and modified the discussion (lines 272-281, Ref. 17) in the revised manuscript.

      There are also a couple of other papers directly relevant to this work. In particular, (Koch et al., 2012) measured the traction forces generated by hippocampal neurons on polyacrylamide gels. They estimated it to be ~ 5 to 10 Pa. While the overall results are similar, in this manuscript, it is reported that the forces generated by hippocampal neurons are significantly higher, in the range of 25-75 Pa. I don't have an issue with this difference, but please look at the Koch paper and see if there is some technical reason for the different estimates of traction forces. Along these lines, please note the Young's modulus of the gels used in the experiments.

      As you mentioned, the traction force measured in our experiments is more than 5 times stronger than that reported by Koch et al. While the exact reason remains unclear, a difference in gel coating may have influenced the result. In the study by Koch et al., pre-coating was performed using Cell-Tak before laminin coating. In contrast, our study used poly-lysine for pre-coating. This methodological difference may have affected the measurement of traction force. Nevertheless, our experiments have consistently yielded reproducible results.

      (Nichol et al., 2016) nicely shows an inverse relationship between RF rate and LN density at low concentrations. While the results reported here are similar, a strength of this paper is that it extends the work to higher LN concentrations.

      Thank you for pointing out the relevance of Nichol et al., 2016 to our study. We agree that their study provides important insights into the relationship between RF rate and LN density at low concentrations. The novelty of our study lies not only in extending the analysis to higher LN concentrations, but also in analyzing adhesion strength, traction force, and migration rate in the growth cone. We have included this discussion (lines 259-261, Ref. 26) in the revised manuscript.

      My understanding is that the biphasic effect of LN in neurite outgrowth was previously established. For example, Buettner and Pittman, 1991 note a biphasic effect of LN conc on some parameters of neurite outgrowth, such as RMS, a measure of growth cone velocity, but not others, such as total neurite length. Likewise, (Schense and Hubbell, 2000) noted a biphasic effect of RGD peptides on outgrowth. In light of this, it would seem the main contribution of this paper is the finding that traction force generation has a bi-phasic relationship with LN concentration.

      Thank you for your thoughtful comment. We agree that the main contribution of this study is demonstrating that the biphasic behavior of axon migration arises from the biphasic dependence of the traction force on laminin concentration. We have included this discussion (lines 272-281, Ref. 31) in the revised manuscript.

      Please appreciate that I'm not asking the authors to copy-paste the text above into the manuscript. Instead, the references provide a starting point for better explaining the novel contributions here. The interaction of adhesions, traction force generation, the rate of neurite outgrowth, and biophysics of growth cone guidance is a classic problem in neuronal mechanics but is far from solved. My hope is that this manuscript might inspire more interest in this problem.

      Thank you for your thoughtful feedback and for highlighting the importance of better contextualizing our novel contributions within the broader field of neuronal mechanics. We appreciate your emphasis on the classic yet unresolved nature of the interactions between adhesions, traction force generation, axon outgrowth rate, and the biophysics of growth cone guidance.

      We hope these revisions help strengthen the manuscript’s impact and inspire further investigation into this important problem. We appreciate your insightful comments and the opportunity to improve our work.

      The text would be improved with a careful copy edit, for example:

      The last sentence of the introduction currently reads, "We suggested mechanism of the axon outgrowth which depends on the density of laminin on the substrate, revealing L1CAM-laminin binding as a mechanism for the regulation of axon outgrowth." which is challenging to understand.

      We appreciate the reviewer’s comment pointing out the lack of clarity in the final sentence of the introduction. To improve readability and clarity, we have revised the sentence as follows:

      “In this study, we suggested mechanism of the axon outgrowth that depends on the density of laminin on the substrate, i.e. the L1CAM-laminin binding is key to the regulation of axon outgrowth.” We believe this revised version better conveys our main finding in a more concise and comprehensible manner.

      Line 224 needs to be F-actin and the next sentence is difficult to understand.

      Thank you for pointing this out. We have corrected "F-action" to "F-actin" to ensure accuracy (line 256). Additionally, we have revised the following sentence to improve clarity (lines 256-258).

      Line 232 instead of "traction force slows", did you mean the rate of retrograde flow slows?

      Thank you for pointing this out. We meant to refer to the rate of retrograde flow, not the traction force itself. We have revised the wording accordingly to avoid confusion (line 266).

      Line 242, shear-stress instead of share-stress.

      We have corrected the typo to "shear-stress" (line 282).

      Lines 255, 267, and the abstract. The paper doesn't directly address axonal guidance. It would be more accurate to replace axonal guidance with neurite outgrowth.

      Thank you for your insightful comment. We agree that the term "neurite outgrowth" more accurately reflects the scope of our study, as we do not directly examine the mechanisms of axonal guidance. Accordingly, we have revised the text in Lines 273, 275, and the abstract to replace "axonal guidance" with "neurite outgrowth" to better align with the presented data and experimental focus.

      Line 362, perhaps reference (Minegishi et al., 2021) here as it provides a nice explanation of the technique.

      Thank you for the helpful suggestion. We have now added a reference to Minegishi et al., 2021 (line 416, Ref. 35) in the revised manuscript, as it indeed provides a clear explanation of the method.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, Behruznia and colleagues use long-read sequencing data for 339 strains of the Mycobacterium tuberculosis complex to study genome evolution in this clonal bacterial pathogen. They use both a "classical" pangenome approach that looks at the presence and absence of genes, and a pangenome graph based on whole genomes in order to investigate structural variants in non-coding regions. The comparison of the two approaches is informative and shows that much is missed when focussing only on genes. The two main biological results of the study are that 1) the MTBC has a small pangenome with few accessory genes, and that 2) pangenome evolution is driven by genome reduction. In the revised article, the description of the data set and the methods is much improved, and the comparison of the two pangenome approaches is more consistent. I still think, however, that the discussion of genome reduction suffers from a basic flaw, namely the failure to distinguish clearly between orthologs and homologs/paralogs.

      Strengths:

      The authors put together the so-far largest data set of long-read assemblies representing most lineages of the Mycobacterium tuberculosis complex, and covering a large geographic area. They sequenced and assembled genomes for strains of M. pinnipedii, L9, and La2, for which no high-quality assemblies were available previously. State-of-the-art methods are used to analyze gene presence-absence polymorphisms (Panaroo) and to construct a pangenome graph (PanGraph). Additional analysis steps are performed to address known problems with misannotated or misassembled genes.

      Weaknesses:

      The revised manuscript has gained much clarity and consistency. One previous criticism, however, has in my opinion not been properly addressed. I think the problem boils down to not clearly distinguishing between orthologs and paralogs/homologs. As this problem affects a main conclusion - the prevalence of deletions over insertions in the MTBC - it should be addressed, if not through additional analyses, then at least in the discussion.

      Insertions and deletions are now distinguished in the following way: "Accessory regions were further classified as a deletion if present in over 50% of the 192 sub-lineages or an insertion/duplication if present in less than 50% of sub-lineages." The outcome of this classification is suspicious: not a single accessory region was classified as an insertion/duplication. As a check of sanity, I'd expect at least some insertions of IS6110 to show up, which has produced lineage- or sublineage-specific insertions (Roychowdhury et al. 2015, Shitikov et al. 2019). Why, for example, wouldn't IS6110 insertions in the single L8 strain show up here?
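
      The classification rule quoted in this comment can be sketched as a small function. This is a minimal illustration under stated assumptions, not the authors' code: the function name and return labels are hypothetical, and because the quoted rule does not say how a region present in exactly 50% of sub-lineages is treated, that case is returned as unclassified here.

```python
# Hypothetical sketch of the quoted rule; not the authors' implementation.
def classify_accessory_region(n_present, n_sublineages=192):
    """Label an accessory region from the number of sub-lineages carrying it.

    Present in more than 50% of sub-lineages  -> "deletion"
    Present in fewer than 50% of sub-lineages -> "insertion/duplication"
    The quoted rule leaves exactly 50% undefined, so it is "unclassified" here.
    """
    if not 0 < n_present < n_sublineages:
        # Present in none or in all sub-lineages: not an accessory region.
        raise ValueError("an accessory region is present in some but not all sub-lineages")
    fraction = n_present / n_sublineages
    if fraction > 0.5:
        return "deletion"
    if fraction < 0.5:
        return "insertion/duplication"
    return "unclassified"


# A region missing from a single sub-lineage versus one found in a single sub-lineage.
print(classify_accessory_region(191))
print(classify_accessory_region(1))
```

      Under this rule, a block found in only one sub-lineage would be labelled an insertion/duplication, which is the kind of case the reviewer expected to see for sublineage-specific IS6110 copies.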

      In a fully clonal organism, any insertion/duplication will be an insertion/duplication of an existing sequence, and thus produce a paralog. If I'm correctly understanding your methods section, paralogs are systematically excluded in the pangraph analysis. Genomic blocks are summarized at the sublineage levels as follows (l.184 ): "The DNA sequences from genomic blocks present in at least one sub-lineage but completely absent in others were extracted to look for long-term evolution patterns in the pangenome." I presume this is done using blastn, as in other steps of the analysis.

      So a sublineage-specific copy of IS6110 would be excluded here, because IS6110 is present somewhere in the genome in all sublineages. However, the appropriate category of comparison, at least for the discussion of genome reduction, is orthology rather than homology: is the same, orthologous copy of IS6110, at the same position in the genome, present or absent in other sublineages? The same considerations apply to potential sublineage-specific duplicates of PE, PPE, and Esx genes. These gene families play important roles in host-pathogen interactions, so I'd argue that the neglect of paralogs is not a finicky detail, but could be of broader biological relevance.
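
      The distinction drawn here between presence by homology (a copy anywhere in the genome) and presence by orthology (the same copy at the same position) can be made concrete with toy data. Everything in the sketch below is hypothetical: the sub-lineage names, locus labels and function names are invented for illustration and do not come from the manuscript or from Pangraph.

```python
# Toy illustration only; sub-lineage names, loci and functions are invented.
# Each sub-lineage maps to the set of (element, locus) copies in its genomes.
copies = {
    "sublineage_A": {("IS6110", "locus_1"), ("IS6110", "locus_2")},
    "sublineage_B": {("IS6110", "locus_1")},
}


def present_anywhere(copies, element, sublineage):
    """Homology-style presence: any copy of the element, at any locus."""
    return any(name == element for name, _ in copies[sublineage])


def present_at_locus(copies, element, locus, sublineage):
    """Orthology-style presence: the copy at one specific locus."""
    return (element, locus) in copies[sublineage]


# Homology view: IS6110 is present in every sub-lineage, so it looks like core.
homology_view = {s: present_anywhere(copies, "IS6110", s) for s in copies}

# Orthology view: the locus_2 copy is specific to sublineage_A, i.e. an insertion.
orthology_view = {s: present_at_locus(copies, "IS6110", "locus_2", s) for s in copies}

print(homology_view)
print(orthology_view)
```

      In this toy case the homology view hides the sublineage-specific copy at locus_2, while the position-aware view recovers it as an accessory insertion.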

      Reviewer #2 (Public review):

      Summary:

      The authors attempted to investigate the pangenome of MTBC by using a selection of state-of-the-art bioinformatic tools to analyse 324 complete and 11 new genomes representing all known lineages and sublineages. The aim of their work was to describe the total diversity of the MTBC and to investigate the driving evolutionary force. By using long read and hybrid approaches for genome assembly, an important attempt was made to understand why the MTBC pangenome size was reported to vary in size by previous reports. This study provides strong evidence that the MTBC pangenome is closed and that genome reduction is the main driver of this species evolution.

      Strengths:

      A stand-out feature of this work is the inclusion of non-coding regions as opposed to only coding regions, which were the focus of previous papers and analyses that investigated the MTBC pangenome. A unique feature of this work is that it highlights sublineage-specific regions of difference (RDs) that were previously unknown. Another major strength is the utilisation of long-read whole genome sequences, in combination with short-read sequences when available. It is known that using only short reads for genome assembly has several pitfalls. The parallel approach of utilizing both Panaroo and Pangraph for pangenomic reconstruction illuminated limitations of both tools while highlighting genomic features identified by both. This is important for any future work and perhaps alludes to the need for more MTBC-specific tools to be developed. Lastly, there is ample statistical support in the form of Heaps' law and genome fluidity calculations for each pangenome to demonstrate that they are indeed closed.

      Weaknesses:

      There are no major weaknesses in the revised version of this manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      l. 27: "lineage-specific and -independent deletions": it is still not clear to me what a lineage-independent, or convergent, deletion is supposed to be. TBD1, for instance, is not lineage-specific, but it is also not convergent: it occurred once in the common ancestor of lineages 1, 2, and 3, while convergence implies multiple parallel occurrences.

      We have changed this here and in other places to more evolutionary terms, such as divergent (single event) and convergent (multiple events), or have explained exactly what is meant where needed.

      l. 118: "where relevant", what does that mean?

      This was superfluous to the description and so is now removed.

      l. 178ff.: It is not clear to me what issue is addressed by this correction of the pangenome graph. Also here there seems to be some confusion regarding orthologs and paralogs. A gene or IS copy can be present at one locus but absent at another, which is not a mistake of Pangraph that would require correction. It's rather the notion of "truly absent region" which is ambiguous.

      We have changed the text to be more specific about the utility of this step. Since it is known that Panaroo mislabels some genes as being absent due to over-splitting (see Ceres et al 2022 and our reclassification earlier in the paper), we wanted to see if the same occurred in Pangraph. We have modified the methods text to be more specific (line 181) and have included in the results the percentage of total genes/regions affected by this correction.

      In relation to copy number, Pangraph is not syntenic in its approach; if a region is present anywhere it is labelled as present in the genome. Pangraph will look for multiple copies of that region (e.g. an IS element) but indeed we did not look for specific syntenic changes across the genomes. This would be a great analysis and something we will consider in the future; we have indicated such in the discussion (line 454).

      l. 305: "mislabelled as absent": see above, is this really 'mislabelled'?

      See our answer to the question above.

      l. 372: "using the approach": something missing here.

      This was superfluous to the description and so is now removed.

      l. 381: the "additional analysis of paralogous blocks" (l. 381) seems to suffer from the same confusion of ortho- and paralogy described above: no new sub-lineage-specific accessory regions are found presumably because the analysis did consider any copy rather than orthologous copies.

      Paralogous copies were looked for by Pangraph, and we did not find any sub-lineage where all members had additional copies compared to other sub-lineages. Indeed, single genomes could have these, and shorter timescales could see a lot of such insertions, but we looked at longer-scale (all genomes within a sub-lineage) patterns and did not find these. These limitations are already outlined in the discussion.

      l. 415: see above. There is no diagnosis of a problem that would motivate a "correction". That's different from the correction of the Panaroo results, where fragmented annotations have been shown to be a problem.

      Of interest, the refinement of regions re-labelled multiple regions as core where Pangraph had labelled them as absent from some genomes, at about the same rate as the correction to Pangraph (2% of genes/regions). This indicates there is a stringency issue with Pangraph where blocks are mislabelled as absent. The underlying reason for this is not clear, but the correction is evidently required in this version of Pangraph.

      l. 430ff.: The issue of paralogy and that the "same" gene or region is defined in terms of homology rather than orthology should be addressed here. For me the given evidence does not support the claim that deletion is driving molecular evolution in the MTBC.

      As outlined above, indeed paralogy may be driving some elements of the overall evolutionary patterns; our analysis just did not find this. Panaroo without merged paralogs did not find paralogous genes as a main differentiating factor for any sub-lineage. Pangraph also did not find multiple copies of blocks present in all genomes in a sub-lineage. As outlined above, indeed single genomes show such patterns, but we did not include single-genome analyses here, and outline that as a next step in the discussion. We have also linked to a recent pangenome paper that showed duplication is present in the pangenome of the MTBC, although not related to any specific lineage (Discussion line 485).

      l. 443 ff: "lineage-independent deletions (convergent evolution)": see above, I still think this terminology is unclear

      This has now been made clearer to be specifically about convergent and divergent evolutionary patterns.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors investigate mechanisms of acquired resistance (AR) to KRAS-G12C inhibitors (sotorasib) in NSCLC, proposing that resistance arises from signaling rewiring rather than additional mutations.

      Strengths:

      Using a panel of AR models - including cell lines, PDXs, CDXs, and PDXOs - they report activation of KRAS and PI3K/AKT/mTOR pathways, with elevated PI3K levels. Pharmacologic inhibition or CRISPR-Cas9 knockout of PI3K partially restores sotorasib sensitivity, and p-4EBP1 upregulation is implicated as an additional contributor, with dual mTORC1/2 inhibition more effective than mTORC1 inhibition alone.

      Weaknesses:

      While the study addresses an important clinical question, it is limited by several weaknesses in experimental rigor, data interpretation, and presentation. The mechanistic findings are not entirely novel, since the role of PI3K-AKT-mTOR signaling in therapeutic resistance is already well-established in the literature. Rather than uncovering new resistance mechanisms, the study largely confirms known pathways. Several key conclusions are not supported by the data, and critical alternative explanations - such as additional mutations or increased KRAS expression - are not thoroughly investigated or ruled out. Furthermore, while the authors use CRISPR-Cas9 to knock out PI3K and 4E-BP1 in H23-AR and H358-AR cells to restore sotorasib sensitivity, they do not perform reconstitution experiments to confirm that re-expressing PI3K or 4E-BP1 reverses the sensitization. This prevents full characterization of PI3K and p-4EBP1 upregulation as contributors to resistance. The manuscript also has several errors, poor figure quality, and a lack of proper quantification. Additional experimental validation, data improvement, and text revisions are required.

      Acquired resistance to KRAS<sup>G12C</sup> inhibitors such as sotorasib or adagrasib remains a significant clinical challenge. Therefore, the identification of mechanisms of acquired resistance, along with the development of alternative therapeutic strategies, including combination therapies with KRAS inhibitors, represents an urgent unmet clinical need. The emergence of secondary KRAS mutations or new mutations in other oncogenic drivers has been observed as a primary cause of acquired resistance in a fraction of patients. No identifiable mutations were detected in more than half of the tumors from patients who developed acquired resistance after treatment with sotorasib or adagrasib.

      Using a discovery-based approach that integrated global proteomic and phosphoproteomic analyses in the TC303AR and TC314AR PDX models, we identified distinct protein signatures associated with KRAS reactivation, upregulation of mTORC1 signaling, and activation of the PI3K/AKT/mTOR pathway. These findings prompted further investigation into these mechanisms of resistance and evaluation of novel therapeutic combinations to overcome resistance. Notably, the combination of sotorasib with copanlisib (a PI3K inhibitor), or the combination of sotorasib with AZD8055 or sapanisertib (mTORC1/2 dual inhibitors) demonstrated strong potential for future clinical use. These regimens effectively restored sotorasib sensitivity in both in vitro and in vivo models and produced robust, synergistic antitumor effects across various acquired resistance models.

      CRISPR-Cas9-mediated PI3K and 4E-BP1 knockout clones were generated in more than one resistant cell line that expressed a robust level of the knockout target, and multiple independent clones in each cell line were evaluated with and without gene disruption. Given the thorough nature of this analysis, additional reconstitution experiments were deemed unnecessary, as they would not yield further insight.

      Whole exome sequencing was performed on resistant cells or PDX models to confirm retention of the KRAS<sup>G12C</sup> mutation and to identify secondary KRAS mutations, none of which were found. We acknowledge that additional resistance mechanisms may be involved. These will be the focus of future investigations.

      The revised manuscript will feature improved figure quality, complete and clarified figure legends, and corrected textual errors to enhance overall clarity and presentation.  

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors focus on the identification of the mechanisms involved in the acquired resistance to Sotorasib in non-small cell lung cancer KRASG12C-mutant cells. To perform this study, the authors generate different clones of cell lines, cell-derived xenografts, patient-derived xenograft organoids, and patient-derived xenografts. In all these models, the authors generate resistant forms (i.e., resistant cell lines, PDXs, and organoids), and the genetic and molecular changes were characterised using whole-exome sequencing, proteomics, and phospho-proteomics. This analysis led to the identification of an important role of the PI3K/AKT/mTORC1/2 signalling network in the acquisition of resistance in several of the models tested. Molecular characterisation identified changes in the expression of some of the proteins in this network as key changes for the acquisition of resistance, and in particular, the authors show that changes in 4E-BP1 are common to some of the cells downstream of PI3K. Using pharmacological testing, they show that different drugs targeting PI3K, AKT, and mTORC1/2 sensitise some of the resistant models to Sotorasib. The analyses showed that the PI3K inhibitor copanlisib has an effect in NSCLC cells that, in some cases, seems to be synergistic with Sotorasib. Based on the work performed, the authors conclude that the PI3K/mTORC1/2-mediated 4E-BP1 phosphorylation is one of the mechanisms associated with the acquisition of resistance to Sotorasib and that targeting this signalling module could result in effective treatments for NSCLC patients.

      The work as presented in the current manuscript is very interesting, provides cell models that benefit the community, and can be used to expand our knowledge of the mechanism of resistance to KRAS targeting therapies. Overall, the techniques and methodology seem to be performed in agreement with standard practice, and the results support most of the conclusions made by the authors. However, there are some points that, if addressed, would increase the value and relevance of the findings and further extend the impact of this work. Some of the recommendations for changes relate to the way things are explained and presented, which need some work. Other changes might require the performance of additional experiments or reanalysis of the existing data.

      Strengths:

      (1) One of the stronger contributions of this article is the different models used to study the acquisition of resistance to Sotorasib. The resistant cell lines, PDXs and PDXOs, and the fact that the authors have different clones for each, made this collection especially relevant, as they seem to show different mechanisms that the cells used to become resistant to Sotorasib. Although logically, the authors focus on one of these mechanisms, the differential responses of the different clones and models to the treatments used in this work show that some of the clones used additional mechanisms of resistance that can be explored in other studies. Importantly, as they use in vitro and in vivo models, the results also consider the tumour microenvironment and other factors in the response to the treatments.

      (2) Another strength is the molecular characterisation of the different Sotorasib-resistant tumour cells by WES, which shows that these cells do not seem to acquire secondary mutations.

      (3) The use of MS-based proteomics also identifies proteome signatures that are associated with the acquisition of resistance, including PI3K/mTORC1/2. The combination of proteomics and phospho-proteomics results should allow the identification of several mechanisms that are deregulated in Sotorasib-resistant cells.

      (4) The results show a strong response of the NSCLC cells and PDXs to copanlisib, a drug for which there is limited information in this cancer type.

      (5) The way they develop the PDX-resistant and the PDXO seems to be appropriate.

      Weaknesses:

      In general, the data is of good quality, but due to the sheer amount of data included and the way it is presented and discussed, several of the claims or conclusions are not clear.

      (1) The abstract is rather long and gives details that are not usually included in one. This makes it very complicated to identify the most relevant findings of the work. The use of acronyms PDX, PDXO, and CDX without defining them makes it complicated for the non-specialist to know what the models are. Rewriting and reorganisation of the abstract would benefit the manuscript.

      We will revise the abstract to ensure that the key findings and overall message are clearly communicated and easily understood by readers.

      (2) Expression, presentation, and grammar should be reviewed in all sections of the manuscript.

      This will be done accordingly in the revised version.

      (3) In the different parts of the result section where the models shown in Figure 2 are described, the authors indicate "Whole-exome sequencing (WES) confirmed that XXX model retained the KRASG12C mutation with no additional KRAS mutations detected"; however, it is not indicated where these data are shown, and not all cases include an explanation of other possible modifications that might relate to mechanisms of resistance. This information should be included in the manuscript, and the WES made publicly available.

      WES was performed to identify secondary mutations in KRAS as well as to verify the retention of the KRAS<sup>G12C</sup> mutation in these AR models. WES data will be provided as supplements.

      (4) The way the proteomics analysis of the TC303 and TC314 parental and resistant PDX is described in the text is confusing. The addition of an experimental layout figure would facilitate the understanding. As it is written, it is not obvious that the parental PDX were also analysed. For instance, the authors say, "The global and phosphoproteomic analyses identified over 8,000 and 4,000 gene protein products (GPPs), respectively". Is this comparing only resistant cells, or from the comparison of the parental and resistant pairs? And where are these numbers presented in the figures? Also, there is information that seems more adequate for the materials and methods sections, i.e., "Samples were analyzed using label-free nanoscale liquid chromatography coupled with tandem mass spectrometry (nanoLC-MS/MS) on a Thermo Fusion Mass Spectrometer. The resulting data were processed and quantified using the Proteome Discoverer 2.5 interface with the Mascot search engine, referencing the NCBI RefSeq protein database (Saltzman, Ruprecht). Two-component analysis is better named principal component analysis."

      The text will be revised accordingly.

      (5) While the presentation of the proteomics data could be done in different ways, the way the data is presented in Figure 3 does not allow the reader to get an idea of many of the findings from this experiment. Although it is indicated that a table with the data will be made available, this should be central to the way the data is presented and explained. A table (ie, Excel doc) where the raw data and all the analysis are presented should be included and referenced. Additionally, heat maps for the whole proteomes identified should be included. In the text, it is said, "Global proteomic heatmap analysis revealed unique protein profiles in TC303AR and TC314AR PDXs compared to their sensitive counterparts (Figure 3C)." However, this figure only shows the histogram of the differentially regulated cells. Inclusion of the histogram showing all the cells is necessary, and it might be informative to include the histogram comparing the two isogenic pairs, which could identify common mechanisms and differences between both sets. In Figure 3C, the protein names should be readable, or a reference to tables where the proteins are listed should be included.

      The raw data associated with the proteomics and global proteomics will be added as supplements.

      (6) In Figure 3, the pathway enrichment tool and GO used should be mentioned in the text. The tables with all significant tables should also be provided. The proteomics data seems to convincingly identify mTOR as one of the pathways deregulated in resistant cells, but there is little explanation of what is considered a significant FDR value and if there are other pathways or networks that are also modified, which might not be common to both isogenic models. The MS-based phosphoproteome could help with the identification of differentially regulated pathways, but it is not really presented in the current manuscript. Most of the analysis of phospho-proteomics comes from the RPPA analysis, which is targeted proteomics. With the way the data is presented, the authors show evidence for a role of mTOR in the acquisition of resistance, but unfortunately, they do not discuss or allow the reader to explore if other pathways might also contribute to this change.

      The authors agree that other pathways may be involved, and this will be the subject of future studies. The raw data will be added as supplements.

      (7) Where is the proteomics data going to be deposited, and will it be made public to comply with FAIR principles?

      The data will be uploaded according to the journal guidelines.

      (8) The authors claim that the resistance shown for H23AR and H358AR cells is due to reactivation of KRAS signalling. This is done by looking at phosphorylation of ERK as a surrogate, as they claim, "KRAS inhibition is commonly assessed by evaluating the inhibition of ERK phosphorylation (p-ERK)". While this might be true in many cases, the data presented does not demonstrate that the increase in p-ERK is due to reactivation of KRAS. To make this claim, the authors should measure activation of KRAS (and possibly H- and NRAS) using GST-pull down or an image-based method.

      We agree that KRAS activation can be assessed through various methods. In this manuscript, which primarily focuses on mechanisms of resistance, pathway analysis revealed upregulation of KRAS signaling. This finding correlated with the incomplete inhibition of p-ERK by sotorasib in resistant cells. Notably, p-ERK status is widely recognized and routinely used as a surrogate marker for KRAS pathway activation.

      (9) The experiments in Figure 4 are very confusing, and some controls are missing. There is no blot where they show the effect of Sotorasib treatment in H23 and H358 parental cells. Is the increase shown in resistant cells shown in parental or is it exclusive for resistant cells only (and therefore acquired)? Experiment 4B should include this control. What is clear is that there is an increase in the expression of AKT and PI3K.

      H23 and H358 cells are highly sensitive to sotorasib, as demonstrated by the cell viability assays presented in Figure 2. As shown in Figure 3—figure supplement 3, sotorasib treatment led to complete inhibition of p-ERK in these parental cell lines. In contrast, p-ERK inhibition was incomplete in the resistant H23AR and H358AR cells. Moreover, these AR cells were continuously cultured under sotorasib pressure to maintain resistance.

      (10) The main point here is whether this is acquired resistance or the sensitivity to the drug is already there, and there was no need to do an omics experiment to find this. In some cases, it seems that the single treatment with PI3K inhibitors is as effective as Sotorasib treatment, promoting the death of the parental cells. This is in line with previous data in H23 and H358 that show sensitivity to PI3K inhibition (i.e., H358: 10.1016/j.jtcvs.2005.06.051; H23: 10.20892/j.issn.2095-3941.2018.0361). The data is clear, especially for copanlisib, but would it be the case that this treatment could be used for the treatment of NSCLC alone or directly in combination with Sotorasib and prevent resistance? The results shown in Figure 4C strongly support that a single treatment might be effective in cases that do not respond to Sotorasib. The data in Figure 4D-F (please correct typo "inhibition" in labels) seem to support that PI3K treatment of parental cells is as effective as in the resistant cells.

      We agree. Based on our in vitro (Figure 4) and in vivo (Figure 7) data, copanlisib was able to overcome sotorasib resistance, demonstrating either synergistic or additive effects depending on the specific model. These findings support the potential of combining PI3K inhibition with KRAS<sup>G12C</sup> inhibition as a promising strategy to address acquired resistance.

      (11) The experiments presented in Figure 7 show synergy between Sotorasib and copanlisib treatment in some of the resistant cells. But in Figure 7G, the single treatment of H23AR is as effective as the combination. Did the authors check the effect of this drug on the parental cells? As they do not include this control, it is not possible to know if this is acquired sensitivity to PI3K inhibition or if the parental cells were already sensitive (as indicated by the Figure 4 results).

      Both H23 and H23AR cells showed high sensitivity to copanlisib, as shown in Figure 4. Combination index analysis for the copanlisib + sotorasib treatment (Figure 7A) revealed synergistic effects on cell viability at specific concentrations. However, in the in vivo experiment (Figure 7G), we did not observe a clear synergistic effect of the combination treatment against H23AR xenografts. This may be attributed to the dose of copanlisib used, which was potentially sufficient on its own to produce a strong antitumor response, thereby masking any additional benefit from the combination.

    1. Author response:

      Reviewer #1 (Recommendations for the authors):

      We appreciate the reviewer recognising that our study has been carefully performed and provides a valuable resource for the community. The characterization of Repo-man proline hydroxylation is also recognised as a novel finding.

      With respect to Concerns raised by reviewer 1:

      (1) The study applied HILIC-based chromatographic separation with a goal of enriching and separating hydroxyproline-containing peptides. However, as the authors mentioned, such an approach is not specific to proline hydroxylation. In addition, many other chromatography techniques can achieve deep proteome fractionation such as high pH reverse phase fractionation, strong-cation exchange etc. There was no data in this study to demonstrate that the strategy offered improved coverage of proline hydroxylation proteins, as the identifications of the HyPro sites could be achieved through deep fractionation and a highly sensitive LCMS setup. The data of Figure 2A and S1A were somewhat confusing without a clear explanation of the heat map representations.

      We do not agree that the apparent concern raised here, i.e., that the method we present is not 100% specific for enriching only hydroxylated peptides, is a serious issue. We show specifically that our method indeed enriches samples for hydroxylated peptides, thereby increasing the chances of identifying proline hydroxylated peptides in a cell extract. We never claimed that it was mono-specific for enrichment of hydroxylated peptides. Further, we note that almost no chromatographic method we know of, including those commonly used to enrich for different types of post-translationally modified peptides (including phospho-peptides), is completely mono-specific for a single type of modified peptide. The reviewer comments that it could have been possible to use alternative methods to identify proline-hydroxylated peptides. This may be true, but we know of no published examples, or previous studies, where this has been demonstrated experimentally on a scale comparable to that we show here. Of course there is always more than one way to approach technical challenges and it may be that future methods will be demonstrated that achieve equivalent, or even superior, results with respect to the detection of proline hydroxylated peptides. To the best of our knowledge, however, our current study provides a robust methodology that goes well beyond any previously published analysis of proline hydroxylation.

      (2) The study reported that the HyPro immonium ion is a diagnostic ion for HyPro identification. However, the data showed that only around 5% of the identifications had such a diagnostic ion. In comparison, acetyllysine immonium ion was previously reported to be a useful marker for acetyllysine peptides (PMID: 18338905), and the strategy offered a sensitivity of 70% with a specificity of 98%. In this study, the sensitivity of HyPro immonium ion was quite low. The authors also clearly demonstrated that the presence of immonium ion varied significantly due to MS settings, peptide sequence, and abundance. With further complications from L/I immonium ions, it became very challenging to implement this strategy in a global LC-MS analysis to either validate or invalidate HyPro identifications.

      We feel that the reviewer’s initial comment is potentially misleading - it implies that we were proposing here that the ‘HyPro immonium ion is a diagnostic ion for HyPro identification’. In contrast, this concept was already widely held in the field before we started this project. Indeed, the fact that the diagnostic HyPro immonium ion is often difficult to detect has been used as one of the arguments by other researchers to support the view that HIF-α is the only physiologically relevant target for PHD enzymes, a controversy referenced explicitly by Reviewer 2 below. What we actually show here are novel data that help to explain why the diagnostic HyPro immonium ion is often difficult to detect when standard approaches and technical parameters for MS analysis are used. We believe that this observation, along with other data we present, is a useful contribution to the field that can help to resolve the previous controversies concerning the true prevalence and biological roles of PHD-catalysed proline hydroxylation on protein targets.

      (3) The study aimed to apply the HILIC-based proteomics workflow to identify HyPro proteins regulated by the PHD enzyme. However, the quantification strategy was not rigorous. The study just considered the HyPro proteins not identified by FG-4592 treatment as potential PHD targeted proteins. There are a few issues. First, such an analysis was not quantitative without reproducibility or statistical analysis. Second, it did not take into consideration that data-dependent LC-MS analysis was not comprehensive and some peptide ions may not be identified due to background interferences. Lastly, FG-4592 treatment for 24 hrs could lead to wide changes in gene expressions and protein abundances. Therefore, it is not informative to draw conclusions based on the data for bioinformatic analysis.

      We agree that this study is not quantifying or addressing the stoichiometry of proline hydroxylation across the very large number of new PHD target sites we identify. That was not claimed and was not the objective of our study. Nonetheless, we feel the comments of the referee do not adequately take into account the SILAC data we included (cf. Figure 8) or the full range of experimental data presented in this study. We would further refer the reviewer to the data presented in the companion paper by Druker et al., which we cross-referenced extensively in our study and have also made available previously on bioRxiv.

      (4) The authors performed an in vitro PHD1 enzyme assay to validate that Repo-man can be hydroxylated by PHD1. However, Figure 9 did not show quantitatively PHD1-induced increase in Repo-man HyPro abundance and it is difficult to assess its reaction efficiency to compare with HIF1a HyPro.

      Here again we refer to the recent controversy referenced explicitly by Reviewer 2 below, concerning the view expressed by some researchers that only HIF-α is a physiological substrate for PHD enzymes in cells. We were challenged to show that any of the novel protein targets of PHDs we identified were indeed hydroxylated by PHD enzymes in vitro, and that is what we demonstrated in Figure 9. This was not an experiment performed to quantify stoichiometry and indeed, it is not possible to draw any firm conclusions about efficiency or stoichiometry in vitro when using catalytic PHD subunits alone, given that we do not yet know whether PHDs may show different properties in cells, dependent on interactions with other factors and/or modifications.

      Reviewer #2 (Recommendations for the authors):

      We appreciate the reviewer’s comments that our manuscript presents an advanced, standardized protocol for identifying proline hydroxylation, with well designed experiments, which may help resolve confusion in the field.

      With respect to Concerns raised by reviewer 2:

      (1) The authors should provide a summary of the standard protocol for identifying proline hydroxylation sites in proteins that can easily be followed by others.

      We agree and plan to provide a clearly described, step by step guide to assist other researchers who wish to employ our methods for proline hydroxylation analysis in their own studies.

      (2) Cockman et al. proposed that HIF-α is the only physiologically relevant target for PHDs. Their approach is considered the gold standard for identifying PHD targets. Therefore, the authors should discuss the major progress they made in this manuscript that challenges Cockman's conclusion.

      We agree that our study provides valuable information germane to the recent controversy in the field and the views published by Cockman et al., to the effect that HIF-α is the only physiologically relevant target for PHDs. We will carefully review our statements when preparing a suitably revised version of record with the aim of providing a balanced and objective discussion of this issue.

      Reviewer #3 (Recommendations for the authors):

      We appreciate the reviewer’s comments that our study employs state-of-the-art mass spectrometric techniques with optimized collision parameters to ensure proper detection of the immonium ions, along with their recognition that our study is, 'an advance compared to other similar approaches before.’ We also appreciate their reference to our companion study by Druker et al, in which we characterise the mechanism and biological role in regulation of mitotic progression of the hydroxylation of P604 in the target protein RepoMan (CDCA2), that is identified in this study.

      With respect to the Concern raised by reviewer 3:

      Despite the authors' claim about the specificity of this method in picking up the intended peptides, there is a good amount of potential false positives that also happen to get picked (owing to the limitations of MS-based readout), and the authors' criteria for downstream filtering of such peptides require further clarification. In the same vein, a greater and more diverse cell-based validation approach will be helpful to substantiate the claims regarding enrichment of peptides in the described pathway analyses.

      We agree that this study, which has a focus on methodology and technical approaches for detecting sites of PHD-catalysed proline hydroxylation, cannot exhaustively validate the biological significance of all of the putative sites and targets identified. As the reviewer notes, we have performed a detailed functional characterisation of one such novel PHD-catalysed proline hydroxylation site, i.e. P604 in the protein RepoMan (CDCA2). This functional analysis is presented in the companion paper by Druker et al., which has also been reviewed by eLife and placed on bioRxiv (doi: https://doi.org/10.1101/2025.05.06.652400). We hope that publication of our identification of many new putative PHD target sites will encourage other researchers to pursue characterisation of their functional roles in different biological mechanisms and have tried here to provide some degree of guidance to focus attention on the identification of those sites for which we currently have highest confidence.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The investigators undertook detailed characterization of a previously proposed membrane targeting sequence (MTS), a short N-terminal peptide, of the bactofilin BacA in Caulobacter crescentus. Using light microscopy, single molecule tracking, liposome binding assays, and molecular dynamics simulations, they provide data to suggest that this sequence indeed does function in membrane targeting and further conclude that membrane targeting is required for polymerization. While the membrane association data are reasonably convincing, there are no direct assays to assess polymerization and some assays used lack proper controls as detailed below. Since the MTS isn't required for bactofilin polymerization in other bacterial homologues, showing that membrane binding facilitates polymerization would be a significant advance for the field.

      We agree that additional experiments were required to consolidate our results and conclusions. Please see below for a description of the new data included in the revised version of the manuscript.

      Major concerns

      (1) This work claims that the N-terminal MTS domain of BacA is required for polymerization, but they do not provide sufficient evidence that the ∆2-8 mutant or any of the other MTS variants actually do not polymerize (or form higher order structures). Bactofilins are known to form filaments, bundles of filaments, and lattice sheets in vitro and bundles of filaments have been observed in cells. Whether puncta or diffuse labeling represents different polymerized states or filaments vs. monomers has not been established. Microscopy shows mis-localization away from the stalk, but resolution is limited. Further experiments using higher resolution microscopy and TEM of purified protein would prove that the MTS is required for polymerization.

      We do not propose that the MTS is directly involved in the polymerization process and state this more clearly now in the Results and Discussion sections of the revised manuscript. To address this point, we performed transmission electron microscopy studies comparing the polymerization behavior of wild-type and mutant BacA variants. The results clearly show that the MTS-free BacA variant (∆2-8) forms polymers that are indistinguishable from those formed by the wild-type protein, when purified from an E. coli overproduction strain (new Figure 1–figure supplement 1). This finding is consistent with structural work showing that bactofilin polymerization is exclusively mediated by the conserved bactofilin domain (Deng et al, Nat Microbiol, 2019). However, at native expression levels, BacA only accumulates to ~200 molecules per cell (Kühn et al, EMBO J, 2006). Under these conditions, the MTS-mediated increase in the local concentration of BacA at the membrane surface and, potentially, steric constraints imposed by membrane curvature, may facilitate the polymerization process. This hypothesis has now been stated more clearly in the Results and Discussion sections.

      For polymer-forming proteins, defined localized signals are typically interpreted as slow-moving or stationary polymeric complexes. A diffuse localization, by contrast, suggests that a protein exists in a monomeric or, at most, (small) oligomeric state in which it diffuses rapidly within the cell and is thus no longer detected as distinct foci by widefield microscopy. Our single-molecule data show that BacA variants that are no longer able to interact with the membrane (as verified by cell fractionation studies and in vitro liposome binding assays) have a high diffusion rate, similar to that measured for the non-polymerizing and non-membrane-bound F130R variant. These results demonstrate that a defect in membrane binding strongly reduces the ability of BacA to form polymeric assemblies. To support this hypothesis, we have now repeated all single-particle tracking experiments and included mVenus as a freely diffusible reference protein. Our data confirm that the mobilities of the ∆2-8 and F130R variants are similar and approach those of free mVenus, supporting the idea that the deficiency to interact with the membrane prevents the formation of extended polymeric structures (which should show much lower mobilities). To underscore the relevance of membrane binding for BacA assembly, we have now included a new experiment, in which we used the PbpC membrane anchor (PbpC<sub>1-132</sub>-mcherry) to restore the recruitment of the ∆2-8 variant to the membrane (Figure 9 and Figure 9–figure supplement 1). The results obtained show that the ∆2-8 variant transitions from a diffuse localization to polar foci upon overproduction of PbpC<sub>1-132</sub>-mcherry. The polymerization-impaired F130R variant, by contrast, remains evenly distributed throughout the cytoplasm under all conditions. These findings further support the idea that polymerization and membrane-association are mutually interdependent processes.

      (2) Liposome binding data would be strengthened with TEM images to show BacA binding to liposomes. From this experiment, gross polymerization structures of MTS variants could also be characterized.

      We do not have the possibility to perform cryo-electron microscopy studies of liposomes bound to BacA. However, the results of the cell fractionation and liposome sedimentation assays clearly support a critical role of the MTS in membrane binding.

      (3) The use of the BacA F130R mutant throughout the study to probe the effect of polymerization on membrane binding is concerning as there is no evidence showing that this variant cannot polymerize. Looking through the papers the authors referenced, there was no evidence of an identical mutation in BacA that was shown to be depolymerized or any discussion in this study of how the F130R mutation might be analogous to polymerization-deficient variants in other bactofilins mentioned in these references.

      Residue F130 in the C-terminal polymerization interface of BacA is conserved among bactofilin homologs, although its absolute position in the protein sequence may vary, depending on the length of the N-terminal unstructured tail. The papers cited in our manuscript show that an exchange of this conserved phenylalanine residue abolishes polymer formation. Nevertheless, we agree that it is important to verify the polymerization defect of the F130R variant in the system under study. We have now included size-exclusion chromatography data showing that BacA-F130R forms a low-molecular-weight complex, whereas the wild-type protein largely elutes in the exclusion volume, indicating the formation of large, polymeric species (new Figure 1–figure supplement 1). In addition, we performed transmission electron microscopy analyses of BacA-F130R, which verified the absence of larger oligomers (new Figure 1–figure supplement 2).

      (4) Microscopy shows that a BacA variant lacking the native MTS regains the ability to form puncta, albeit mis-localized, in the cell when fused to a heterologous MTS from MreB. While this swap suggests a link between puncta formation and membrane binding, the relationship between puncta and polymerization has not been established (see comment 1).

      We show that a BacA variant lacking the MTS (∆2-8) regains the ability to form membrane-associated foci when fused to the MTS of MreB. By contrast, a similar variant that additionally carries the F130R exchange (preventing its polymerization) shows a diffuse cytoplasmic localization. In addition, we show that the F130R exchange leads to a loss of membrane binding and to a considerable increase in the mobility of the variants carrying the MTS of E. coli MreB. As described above, we now provide additional data demonstrating that elevated levels of the PbpC membrane anchor can reinstate polar localization for the ∆2-8 variant, whereas it fails to do so for the polymerization-deficient F130R variant (Figure 9 and Figure 9–figure supplement 1). Together, these results support the hypothesis that membrane association and polymerization act synergistically to establish localized bactofilin assemblies at the stalked cell pole.

      (5) The authors provide no primary data for single molecule tracking. There is no tracking mapped onto microscopy images to show membrane localization or lack of localization in MTS deletion/ variants. A known soluble protein (e.g. unfused mVenus) and a known membrane bound protein would serve as valuable controls to interpret the data presented. It also is unclear why the authors chose to report molecular dynamics as mean squared displacement rather than mean squared displacement per unit time, and the number of localizations is not indicated. Extrapolating from the graph in figure 4 D for example, it looks like WT BacA-mVenus would have a mobility of 0.5 (0.02/0.04) micrometers squared per second which is approaching diffusive behavior. Further justification/details of their analysis method is needed. It's also not clear how one should interpret the finding that several of the double point mutants show higher displacement than deleting the entire MTS. These experiments as they stand don't account for any other cause of molecular behavior change and assume that a decrease in movement is synonymous with membrane binding.

      We now provide additional information on the single-particle analysis. A new supplemental figure now shows a mapping of single-particle tracks onto the cells in which they were recorded for all proteins analyzed (Figure 2–figure supplement 1). Due to the small size of C. crescentus, it is difficult to clearly differentiate between membrane-associated and cytoplasmic protein species. However, overall, slow-diffusing particles tend to be localized to the cell periphery, supporting the idea that membrane-associated particles form larger assemblies (apart from diffusing more slowly due to their membrane association). In addition, we have included a movie that shows the single-particle diffusion dynamics of all proteins in representative cells (Figure 2-video 1). Finally, we have included a table that gives an overview of the number of cells and tracks analyzed for all proteins investigated (Supplementary file 1). Figure 2A and 4D show the mean squared displacement as a function of time, which makes it possible to assess whether the particles observed move by normal, Brownian diffusion (which is the case here). We repeated the entire single-particle tracking analysis to verify the data obtained previously and obtained very similar results. Among the different mutant proteins, only the K4E/K7E variant consistently shows a higher mobility than the MTS-free ∆2-8 variant, with MSD values similar to those of free mVenus. The underlying reason remains unclear. However, we believe that an in-depth analysis of this phenomenon is beyond the scope of this paper. We re-confirmed the integrity of the construct encoding the K4E/K7E variant by DNA sequencing and once again verified the size and stability of the fusion protein by Western blot analysis, excluding artifacts due to errors during cloning and strain construction.

      We agree that the single-molecule tracking data alone are certainly not sufficient to draw firm conclusions on the relationship between membrane binding and protein mobility. However, they are consistent with the results of our other in vivo and in vitro analyses, which together indicate a clear correlation between the mobility of BacA and its ability to interact with the membrane and polymerize (processes that promote each other synergistically).
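
      The point about plotting the mean squared displacement (MSD) against time can be illustrated with a minimal sketch. It assumes two-dimensional tracks, for which normal Brownian diffusion gives MSD(t) = 4Dt, so a log-log slope close to 1 indicates normal diffusion and the linear slope divided by 4 estimates D. The function names, the simulated track, the frame time and the diffusion coefficient below are all hypothetical and are not taken from the manuscript or its analysis software.

```python
import numpy as np

def time_averaged_msd(track, max_lag):
    """Time-averaged MSD of one 2D track for lags 1..max_lag (in frames)."""
    track = np.asarray(track, dtype=float)
    msd = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        disp = track[lag:] - track[:-lag]          # displacements at this lag
        msd[lag - 1] = np.mean(np.sum(disp ** 2, axis=1))
    return msd

def anomalous_exponent(msd, dt):
    """Slope of log(MSD) vs log(t); about 1 for normal Brownian diffusion."""
    t = dt * np.arange(1, len(msd) + 1)
    slope, _ = np.polyfit(np.log(t), np.log(msd), 1)
    return slope

def diffusion_coefficient(msd, dt):
    """Least-squares fit of MSD = 4*D*t through the origin (2D assumption)."""
    t = dt * np.arange(1, len(msd) + 1)
    return np.sum(msd * t) / (4.0 * np.sum(t ** 2))

# Simulated Brownian track with hypothetical parameters (not measured values).
rng = np.random.default_rng(0)
D_true, dt, n_steps = 0.5, 0.02, 20000             # um^2/s, s, frames
steps = rng.normal(0.0, np.sqrt(2 * D_true * dt), size=(n_steps, 2))
track = np.cumsum(steps, axis=0)

msd = time_averaged_msd(track, max_lag=5)
alpha = anomalous_exponent(msd, dt)
D_est = diffusion_coefficient(msd, dt)
print(f"alpha = {alpha:.2f}, D = {D_est:.2f} um^2/s")
```

      Under the same two-dimensional assumption, a ratio of MSD to time such as the reviewer's 0.02/0.04 corresponds to 4D rather than to D itself.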

      (6) The experiments that map the interaction surface between the N-terminal unstructured region of PbpC and a specific part of the BacA bactofilin domain seem distinct from the main focus of the paper and the data somewhat preliminary. While the PbpC side has been probed by orthogonal approaches (mutation with localization in cells and affinity in vitro), the BacA region side has only been suggested by the deuterium exchange experiment and needs some kind of validation.

      The results of the HDX analysis per se are not preliminary and clearly show a change in the solvent accessibility of backbone amides in the C-terminal region of the bactofilin domain in the presence of the PbpC<sub>1-13</sub> peptide. However, we agree that further experiments would be required to precisely map and verify the PbpC binding site suggested by these data. As this is not the main focus of the paper, we would like to proceed without conducting further experiments in this area.

      We now provide additional data showing that elevated levels of the PbpC membrane anchor are able to recruit the MTS-free BacA variant (∆2-8) to the cytoplasmic membrane and stimulate its assembly at the stalked pole (Figure 9). These results now integrate Figure 8 more effectively into the overall theme of the paper.

      Reviewer #2 (Public review):

      Summary:

      The authors of this study investigated the membrane-binding properties of bactofilin A from Caulobacter crescentus, a classic model organism for bacterial cell biology. BacA was the progenitor of a family of cytoskeletal proteins that have been identified as ubiquitous structural components in bacteria, performing a range of cell biological functions. Association with the cell membrane is a common property of the bactofilins studied and is thought to be important for functionality. However, almost all bactofilins lack a transmembrane domain. While membrane association has been attributed to the unstructured N-terminus, experimental evidence had yet to be provided. As a result, the mode of membrane association and the underlying molecular mechanics remained elusive.

      Liu et al. analyze the membrane binding properties of BacA in detail and scrutinize molecular interactions using in-vivo, in-vitro and in-silico techniques. They show that few N-terminal amino acids are important for membrane association or proper localization and suggest that membrane association promotes polymerization. Bioinformatic analyses revealed conserved lineage-specific N-terminal motifs indicating a conserved role in protein localization. Using HDX analysis they also identify a potential interaction site with PbpC, a morphogenic cell wall synthase implicated in Caulobacter stalk synthesis. Complementary, they pinpoint the bactofilin-interacting region within the PbpC C-terminus, known to interact with bactofilin. They further show that BacA localization is independent of PbpC.

      Strengths:

      These data significantly advance the understanding of the membrane binding determinants of bactofilins and thus their function at the molecular level. The major strength of the comprehensive study is the combination of complementary in vivo, in vitro and bioinformatic/simulation approaches, the results of which are consistent.

      Thank you for this positive feedback.

      Weaknesses:

      The results are limited to protein localization and interaction, as there is no data on phenotypic effects. Therefore, the cell biological significance remains somewhat underrepresented.

      We agree that it is interesting to investigate the phenotypic effects caused by the reduced membrane binding activity of BacA variants with defects in the MTS. We have now included phenotypic analyses that shed light on the role of region C1 in the localization of PbpC and its function in stalk elongation under phosphate-limiting conditions (see below).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      To address the missing estimation of biological relevance, some additional experiments may be carried out.

      For example, given that BacA localizes PbpC by direct interaction, one might expect an effect on stalk formation if BacA is unable to bind the membrane or to polymerize. The same applies to PbpC variants lacking the C1 region. As the mutant strains are available, these data are not difficult to obtain but would help to compare the effect of the deletions with previous data (e.g. Kühn et al.) even if the differences are small.

      We have now analyzed the effect of the removal of region C1 on the ability of mVenus-PbpC to promote stalk elongation in C. crescentus under phosphate starvation. Interestingly, our results show that the lack of the BacA-interaction motif impairs the recruitment of the fusion protein to the stalked pole, but it does not interfere with its stimulatory effect on stalk biogenesis. Thus, the polar localization of PbpC does not appear to be critical for its function in localized peptidoglycan synthesis at the stalk base. These results are now shown in Figure 8–Figure supplement 4. The results obtained may be explained by residual transient interactions of mVenus-PbpC with proteins other than BacA at the stalked pole. Notably, PbpC has also been implicated in the attachment of the stalk-specific protein StpX to components of the outer membrane at the stalk base. The polar localization of PbpC may therefore be primarily required to ensure proper StpX localization, consistent with previous work by Hughes et al. (Mol Microbiol, 2013) showing that StpX is partially mislocalized in a strain producing an N-terminally truncated PbpC variant that no longer localizes to the stalk base.

      We have also attempted to investigate the ability of the Δ2-8 and F130R variants of BacA-mVenus to promote stalk elongation under phosphate starvation. However, the levels of the WT, Δ2-8 and F130R proteins and their stabilities were dramatically different after prolonged incubation of the cells in phosphate-limited medium, so that it was not possible to draw any firm conclusions from the results obtained (not shown).

      In addition, the M23-like endopeptidase LdpA is proposed to be a client protein of BacA (in C. crescentus, Billini et al. 2018, and H. neptunium or R. rubrum, Pöhl et al. 2024). In H. neptunium, it is suggested that the interaction is mediated by a cytoplasmic peptide of LmdC reminiscent of PbpC. This should at least be commented on. It would be interesting to see if LdpA in C. crescentus is also delocalized and if so, this could identify another client protein of BacA.

      We agree that it would be interesting to study the role of BacA in LdpA function. However, we have not yet succeeded in generating a stable fluorescent protein fusion to LdpA, which currently makes it impossible to study the interplay between these two proteins in vivo. The focus of the present paper is on the mode of interaction between bactofilins and the cytoplasmic membrane and on the mutual interdependence of membrane binding and bactofilin polymerization. Given that PbpC is so far the only verified interaction partner of BacA in C. crescentus, we would like to limit our analysis to this client protein.

      Further comments:

      L105: analyze --> analyzed

      Done.

      L169: Is there any reason why the MTS of E. coli MreB was doubled?

      Previous work has shown that two tandem copies of the N-terminal amphiphilic helix of E. coli MreB were required to partially target a heterologous fusion partner protein (GFP) to the cytoplasmic membrane of E. coli cells (Salje et al, 2011).

      Fig. S3:

      a) Please decide which tag was used (mNG or mVenus) and adapt the figure or legend accordingly.

      b) In the legend for panel (C), please describe how the relative amounts were calculated, as the fractions arithmetically cannot add to > 100%. I guess each band was densitometrically rated and independently normalized to the whole-cell signal?

      The fluorescent tag used was mNeonGreen, as indicated in the figure. We have now corrected the legend accordingly. Thank you for making us aware of the wrong labeling of the y-axis. We have now corrected the figure and describe the method used to calculate the plotted values in the legend.

      Legend of Fig 1b: It is not clear to me, to which part of panel B the somewhat cryptic LY... strain names belong. I suggest putting them either next to the images, to delete them, or at least to unify the layout (compare, e.g. to Fig S7). (I would delete the LY numbers and stay with the genes/mutations throughout. This is just a suggestion).

      These names indicate the strains analyzed in panel B, and we have now clarified this in the legend. It is more straightforward to label the images according to the mutations carried by the different strains. Nevertheless, we would like to keep the strain names in the legend, so that the material used for the analysis can be clearly identified.

      Fig. 2a: As some of the colors are difficult to distinguish, I suggest sorting the names in the legend within the graph according to the slope of the curves (e.g. K4E K7E (?) on top and WT being at the bottom).

      Thank you for this suggestion. We have now rearranged the labels as proposed.

      In the legend (L924), correct typo "panel C" to "panel B".

      Done.

      Fig. 3: In the legend, I suggest deleting the abbreviations "S" and "P" as they do not show up in the image. In line 929, I suggest adding: average "relative" amount... or even more precisely: "average relative signal intensities obtained..."

      We have removed the abbreviations and now state that the bars indicate the “average relative signal intensities” obtained for the different fractions.

      Fig 4d: same suggestion as for Fig. 2a.

      Done.

      Fig 8: In the legend (L978), delete 1x "the"

      Done.

      L258 and Fig. S5: The expression "To account for biases in the coverage of bacterial species" seems somewhat unclear. I suggest rephrasing and adding information from the M+M section here (e.g. from L593, if this is meant).

      We now state that this step in the analysis pipeline was performed “To avoid biases arising from the over-representation of certain bacterial species in UniProt”.

      I appreciate the outline of the workflow in panel (a) of Fig. S5. It would be even more useful if some more details about the applied criteria for filtering were provided (e.g. concerning what is meant with "detailed taxonomic information" or "filter out closely related sequences"). Does the latter mean that only one bactofilin sequence per species was used? (As quite many bacteria have more than one but similar bactofilins.)

      We removed sequences from species with unclear phylogeny (e.g. candidate species whose precise taxonomic position has not yet been determined). For many pathogenic species, numerous strains have been sequenced. To account for this bias, only one sequence from clusters of highly similar bactofilin sequences (>90% identity) was retained per species. This information has now been included in the diagram. It is true that many bacteria have more than one bactofilin homolog. However, the sequences of these proteins are typically quite different. For instance, the BacA and BacB from C. crescentus only share 52% identity. Therefore, our analysis does not systematically eliminate bactofilin paralogs that coexist in the same species.
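
      The redundancy filter described above (one representative per cluster of >90% identical sequences within a species, while more divergent paralogs are retained) can be sketched as a greedy procedure. This is only an illustration under stated assumptions: the tool actually used in the pipeline is not named here, the `identity` helper is a crude stand-in based on Python's `difflib` rather than a proper pairwise alignment, and the species labels and sequences are invented.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def identity(a, b):
    """Crude stand-in for pairwise identity (a real pipeline would align)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def filter_redundant(records, threshold=0.90):
    """Keep one representative per cluster of highly similar sequences,
    evaluated separately within each species (greedy, longest first)."""
    by_species = defaultdict(list)
    for species, seq in records:
        by_species[species].append(seq)

    kept = []
    for species, seqs in by_species.items():
        representatives = []
        for seq in sorted(seqs, key=len, reverse=True):
            # Retain the sequence only if it is not >threshold identical
            # to a representative already kept for this species.
            if all(identity(seq, rep) <= threshold for rep in representatives):
                representatives.append(seq)
        kept.extend((species, rep) for rep in representatives)
    return kept

# Hypothetical toy data: two near-identical strain variants and one
# divergent paralog in species A, plus one sequence from species B.
base = "ACDEFGHIKLMNPQRSTVWY"
variant = base[:10] + "A" + base[11:]      # single substitution
paralog = base[::-1]                       # very different sequence
records = [("A", base), ("A", variant), ("A", paralog), ("B", base)]

kept = filter_redundant(records)
for species, seq in kept:
    print(species, seq)
```

      Because the comparison is restricted to sequences from the same species, identical sequences from different species are both kept, and divergent paralogs within a species are not collapsed.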

      L281: Although likely, I am not sure if membrane binding has ever been shown for a bactofilin from these phyla. (See also L 380.) Is there an example? Otherwise, membrane binding may not be a property of these bactofilins.

      To our knowledge, the ability of bactofilins from these clades to interact with membranes has not been investigated to date. We agree that the absence of an MTS-like motif may indicate that they lack membrane binding activity, and we have now stated this possibility in the Results and Discussion.

      L285: See comment above concerning the M23-like peptidase LdpA. Although not yet directly shown for C. crescentus, it seems likely that BacACc does also localize this peptidase in addition to PbpC. I suggest rephrasing, e.g. "known" --> "shown"

      We now use the word “reported”.

      L295 and Fig S8: PbpC is ubiquitous. Which criteria/filters have been applied to select the shown sequences?

      C. crescentus PbpC is different from E. coli Pbp1C. It is characterized by distinctive, conserved N- and C-terminal tails and only found in C. crescentus and close relatives. The C. crescentus homolog of E. coli PbpC is called PbpZ (Yakhnina et al, J Bacteriol, 2013; Strobel et al, J Bacteriol, 2014), whereas C. crescentus PbpC is related to E. coli PBP1A. We have now added this information to the text to avoid confusion.

      L311: may replace "assembly" by "polymerization"

      Done.

      L320: bactofilin --> bactofilin domain?

      Yes, this was supposed to read “bactofilin domain”. Thank you for spotting this issue.

      L324: The HDX analysis of BacA suggests that the exchange is slowed down in the presence of the PbpC peptide, which is indicative of a physical interaction between these two molecules. To corroborate the claim that BacA polymerization is critical for interaction with the peptide (resp. PbpC), this experiment should be carried out with the polymerization defective BacA version F130R.

      (Or tone this statement down, e.g. show --> suggest.)

      “suggest”

      L386: undergoes --> undergo

      Done.

      L391-400: This idea is tempting but the suggested mechanism then would be restricted to bactofilins of C. crescentus and close relatives. The bactofilin of Rhodomicrobium, for example, was shown to localize dynamically and not to stick to a positively curved membrane.

      In the vast majority of species investigated so far, bactofilins were found to associate with specifically curved membrane regions and to contribute to the establishment of membrane curvature. Unfortunately, the sequences of the three co-polymerizing bactofilin paralogs of R. vannielii DSM 166 studied by Richter et al (2023) have not been reported and the genome sequence of this strain is not publicly available. However, in related species with three bactofilin paralogs, only one paralog shows an MTS-like N-terminal peptide and another paralog typically contains an unusual cadherin-like domain of unknown function, as also reported for R. vannielii DSM 166. Therefore, the mechanism controlling the localization dynamics of bactofilins may be complex in the Rhodomicrobium lineage. Nevertheless, at native expression levels, the major bactofilin (BacA) of R. vannielii DSM 166 was shown to localize predominantly to the hyphal tips and the (incipient) bud necks, suggesting that regions of distinct membrane curvature could also play a role in its recruitment. We do not claim that all bactofilins recognize positive membrane curvature, which is clearly not the case. It rather appears as though the curvature preference of bactofilins varies depending on their specific function.

      L405-406: I agree that localization of BacA has been shown to be independent of PbpC. However, this does not generally preclude an effect on BacA localization by other "client" or interacting proteins. (See also comment above about the putative BacA interactor LdpA). I suggest either to corroborate or to change this statement from "client binding" to "PbpC binding".

      Thank you for pointing out the imprecision of this statement. We now conclude that “PbpC binding” is not critical for BacA assembly and positioning.

      Suppl. Fig. S11: In the legend, please correct the copy-paste mismatch (...VirB...).

      Done.

      L482: delete 1x "at"

      Done.

      L484: may be better "soluble and insoluble fractions"?

      We now describe the two fractions as “soluble and membrane-containing insoluble fractions” to make clear to all readers that membrane vesicles are found in the pellet after ultracentrifugation.

      L489-490: check spelling immunoglobulin – immuneglobulin

      Done.

      L500 and 504: º_C --> ºC

      Done.

      Suppl. file X (HDX data): please check the table headline, table should be included in Suppl. file 1

      We have now included a headline in this file (now Supplementary file 3).

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors have studied how a virus (EMCV) uses its RNA (Type 2 IRES) to hijack the host's protein-making machinery. They use cryo-EM to extract structural information about the recruitment of viral Type 2 IRES to ribosomal pre-IC. The authors propose a novel interaction mechanism in which the EMCV Type 2 IRES mimics 28S rRNA and interacts with ribosomal proteins and initiator tRNA (tRNAi).

      Strengths:

      (1) Getting structural insights about the Type 2 IRES-based initiation is novel.

      (2) The study allows a good comparison of other IRES-based initiation systems.

      (3) The manuscript is well-written and clearly explains the background, methods, and results.

      We thank Reviewer 1 for appreciating our efforts and finding structural insights about the type 2 IRES-based initiation presented in this study as novel.

      Weaknesses:

      (1) The main weakness of the work is the low resolution of the structure. This limits the possibility of data interpretation at the molecular level.

      However, despite the moderate resolution of the cryo-EM reconstructions, the model fits well into the density. The analysis of the EMCV IRES-48S PIC structure is thorough and includes meaningful comparisons to previously published structures (e.g., PDB IDs - 7QP6 and 7QP7). These comparisons showed that Map B1 represents a closed conformation, in contrast to Map A in the open state (Figure 2). Additionally, the proposed 28S rRNA mimicry strategy supported by structural superposition with the 80S ribosome and sequence similarity between the I domain of the IRES and the h38 region of 28S rRNA (Fig. 4) is well justified.

      We agree that the low resolution of the map has compromised the data interpretation at the molecular level, and we thank the reviewer for appreciating our findings at this resolution. Due to the compromise in resolution, we have reported findings related to stretches or regions such as loops and stems, rather than individual nucleotides and interactions.  

      (2) The lack of experimental validation of the functional importance of regions like the GNRA and RAAA loops is another limitation of this study.

      We agree that this study includes no experiments other than cryo-EM to probe the importance of regions such as the GNRA and RAAA loops. However, we have cited earlier reports that demonstrate the importance of these regions for overall IRES activity. The essentiality of the RAAA loop for type 2 IRESs was demonstrated in an earlier report (López de Quinto and Martínez-Salas, 1997; cited in the manuscript). Further, the conservation of this loop across the type 2 IRES family adds to its importance (manuscript Figure 6B). This loop and its flanking G-C stem are similar to h38 of 28S rRNA, and it appears that the RAAA loop adopts a mimicry mechanism to interact with the 40S ribosomal protein uS19, thus highlighting its importance for interaction with 40S. Experiments destabilising the G-C stem also compromise IRES activity, as shown in the case of FMDV IRES (Fernández et al 2011). Previous studies related to the mutation of the GNRA or GCGA loop in EMCV IRES have shown a deficiency in IRES activity (Roberts and Belsham, 1997; Robertson et al 1999), suggesting the importance of these regions in viral IRES biology, and these reports are cited in the manuscript. Beyond EMCV IRES, mutation of the GUAA (representative of GNRA) loop of FMDV IRES also showed a significant reduction in IRES activity (López de Quinto and Martínez-Salas, 1997). In our study, we observe that the GCGA loop interacts with tRNA<sub>i</sub> in the EMCV IRES-48S PIC, thus implicating the importance of this loop. Moreover, incubation of FMDV IRES with 40S ribosomes has shown a decrease in SHAPE reactivity in the domain 3 apex (nucleotides 170-200) (Lozano et al 2018), which corresponds to the EMCV IRES domain I apex. In addition, we will attempt to address the lack of experimental validation of the GNRA and RAAA loops by performing biochemical assays.

      (3) Minor modifications related to data processing and biochemical studies will further validate and strengthen the findings.

      a) In the cryo-EM data section, the authors should include an image showing rejected particles during 2D classification. This would help readers understand why, despite having over 22k micrographs with sufficient particle distribution and good contrast, only a smaller number of particles were used in the final reconstruction. Additionally, employing map-sharpening tools such as Ewald sphere correction, Bayesian polishing, or reference-based motion correction might further improve the quality of the maps. Targeting high-resolution structures would be particularly informative.

      We thank the reviewer for the suggestions, and we will employ the suggested procedures, which may help to further improve the quality of the maps. We will include an image of the rejected 2D classes in the revised manuscript. We acknowledge the Reviewer’s query about the large number of micrographs relative to the small number of particles used for the final reconstruction. The total number of micrographs is the sum of multiple datasets, prepared and collected at various times. Among these, around 8000 micrographs have extremely poor particle number and distribution. As a result, the number of particles per micrograph is heterogeneous in the compiled dataset. We obtained only 237054 ‘good particles’ after multiple rounds of 2D & 3D classification, and the final reconstruction has 28439 particles (~12%). This class was obtained after masked classification for IRES and ternary complex density. Hence, only the particles that show the best density for both IRES and ternary complex are used for reconstructing this map. Another set of particles that have only a portion of IRES and tRNA but no density for eIF2 forms another map (26792 particles, 11.3%). Thus, we obtained a total of 55231 particles (23.3%) with IRES density.
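
      The particle bookkeeping quoted above can be checked with a few lines of arithmetic. The counts are those given in the response; the class labels and variable names are illustrative only.

```python
# Particle counts as stated in the response.
good_particles = 237_054
classes = {
    "IRES + ternary complex": 28_439,   # final reconstruction
    "IRES + tRNA, no eIF2": 26_792,     # second map
}

# Share of the 'good particles' that ends up in each class, in percent.
fractions = {name: 100 * n / good_particles for name, n in classes.items()}

# All particles that show IRES density, as a count and as a percentage.
total_ires = sum(classes.values())
total_fraction = 100 * total_ires / good_particles

for name, pct in fractions.items():
    print(f"{name}: {classes[name]} particles ({pct:.1f}%)")
print(f"total with IRES density: {total_ires} ({total_fraction:.1f}%)")
```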

      b) The strategic modelling of different IRES domains into the density, particularly the domain into the region above the 40S head, is appreciable. However, providing the full RNA tertiary structure (RNAfold) of the EMCV IRES (nucleotides 280-905) would better explain the logic behind the model building and its molecular interpretation.

      We thank the reviewer for appreciating the modelling of the domain I apex in the cryo-EM density. We tried to predict the full tertiary structure of the IRES; however, including the full-length sequence (nucleotides 280-905) gave models of extremely low confidence, and a few domains did not conform to the secondary structure of the EMCV IRES reported in Duke et al 1992. Hence, we predicted the tertiary structure of each EMCV IRES domain independently of the other domains. Furthermore, 3D models of FMDV IRES domains 2, 3, and 4 (corresponding to EMCV IRES domains H, I, and J-K) were predicted from SHAPE reactivity values using the RNAComposer server (Figure 3 in Lozano et al 2018). The predicted architecture of the domain 3 apex (FMDV IRES) coincides with our domain I apex model (EMCV IRES).

      c)  Although the authors compare their findings with other types of IRESs (Types 1, 3, and 4), there is no experimental validation of the functional importance of regions like the GNRA and RAAA loops. Including luciferase-based assays or mutational studies of these regions for validation of structural interpretations is strongly recommended.

      We have discussed how other IRESs, such as type 1 and type 5 (Aichi virus), might use strategies similar to that of the EMCV IRES to assemble the 48S PIC, given the similarity in motif sequence and position across the viral IRESs. Like the EMCV IRES, the type 1 IRESs (e.g. Poliovirus, Coxsackie virus) also harbour a GNRA loop, preceded by a C-rich loop, in their longest domain, which is known for long-range RNA-RNA interactions. The segment harbouring the GNRA loop is highly conserved across the type 1 family of IRESs (Kim et al 2015). The Aichi virus IRES (type 5) harbours a GNRA loop in its longest domain, which is domain J. Deletion of the GNRA loop compromised IRES activity; however, substitution mutations in this region either elevated IRES activity or left it unaltered (Yu et al 2011). We have hypothesized that these IRESs (type 1 and type 5) might use the GNRA motifs in their longest domain (domain IV in type 1, and domain J in type 5) in a manner similar to the EMCV IRES, where GNRA is present in the longest domain (I) and preceded by a C-rich loop. Thus, GNRA can potentially mediate long-range interactions with tRNA<sub>i</sub>, as all these IRESs require the eIF2-ternary complex for the formation of the 48S PIC. In parallel, like the EMCV IRES, type 1 and type 5 IRESs place the GNRA motif-containing domain before the eIF4G-binding domain (domain J-K in EMCV IRES, domain V in poliovirus, domain K in Aichi virus). Hence, we suggest that these IRESs may use a similar strategy to interact with tRNA<sub>i</sub> during the formation of the 48S PIC.

      Reviewer #2 (Public review):

      Summary:

      The field of protein translation has long sought the structure of a Type 2 Internal Ribosome Entry Site (IRES). In this work, Das and Hussain pair cryo-EM with algorithmic RNA structure prediction to present a structure of the Type 2 IRES found in Encephalomyocarditis virus (EMCV). Using medium to low resolution cryo-EM maps, they resolve the overall shape of a critical domain of this Type 2 IRES. They use algorithmic RNA prediction to model this domain onto their maps and attempt to explain previous results using this model.

      Strengths:

      (1) This study reveals a previously unknown/unseen binding modality used by IRESes: a direct interaction of the IRES with the initiator tRNA.

      (2) Use of an IRES-associated factor to assemble and pull down an IRES bound to the small subunit of the ribosome from cellular extracts is innovative.

      (3) Algorithmic modeling of RNA structure to complement medium to low resolution cryoEM maps, as employed here, can be implemented for other RNA structures.

      We thank Reviewer 2 for positive and encouraging comments on our work, appreciating our ‘innovative’ approach of using IRES-associated factor to assemble and pull down IRES-bound ribosomal complex.  

      Weaknesses:

      (1) Maps at the resolution presented prevent unambiguous modelling of the EMCV-IRES. This, combined with the lack of any biochemical data, calls into question any inferences made at the level of individual nucleotides, such as the GNRA loop and CAAA loop (Figure 4).

      We understand the concerns raised by the reviewer related to the resolution of the EMCV IRES-48S PIC map. However, we would like to mention that we refrained from commenting on individual nucleotides or molecular interactions in the manuscript. Instead, we discuss loops, RNA stretches, or motifs that could be inferred with more confidence, as shown in Manuscript Figure 4. The EMCV IRES can directly interact with the 40S ribosome using its domains H and I (Chamond et al 2014); however, the details of this interaction were unknown. We observe that the CAAA loop of the domain I apex interacts with the 40S ribosome, based on the placement of a portion of domain I in the cryo-EM map. This is also reflected in the earlier reported SHAPE data (Supplementary figures 2 and 8 in Chamond et al 2014), where a decrease in reactivity is evident in the presence of the 40S ribosome. In addition, incubation of the EMCV IRES with rabbit reticulocyte lysate (RRL) offered protection to domain I apex regions, which included the CAAA loop (Figure 4b in Maloney and Joseph, 2024).

      Furthermore, this decrease in SHAPE reactivity pattern is also evident for FMDV IRES domain 3 apex (like domain I in EMCV IRES) in the presence of 40S ribosome (Lozano et al 2018).

      Thus, these studies are consistent with the placement of IRES model in the cryo-EM map.

      We aim to improve the resolution of the maps for better clarity and add biochemical experiments to justify the possible interactions.

      (2) The EMCV IRES contains an upstream AUG at position 826, where the PIC can assemble (Pestova et al 1996; PMID 8943341). It is unclear if this start codon was mutated in this study. If it were not mutated, placement of AUG-834 over AUG-826 in the P-site is unexplained.

      We thank the reviewer for bringing up this point, as we missed mentioning this in the manuscript. The EMCV IRES does not require scanning and directly positions AUG-834 at the P site (Pestova et al 1996). In Pestova et al 1996, the toeprint at AUG-834 is much more intense than that at AUG-826. Further, AUG-834 lies in a good Kozak context, whereas AUG-826 has a poor Kozak context. Furthermore, the synthesis of the polypeptide requires placement of AUG-834 at the P site. In our cryo-EM map, we observed that the tRNA<sub>i</sub> is in a P<sub>IN</sub> state, which indicates recognition of the start codon, and we reasoned that AUG-834 is more likely than AUG-826 to be placed at the P site. We will mention this in the revised manuscript, as we had NOT mutated AUG-826.

      (3) The claims the authors make about (i) the general overall shape and binding site of the IRES, (ii) its gross interaction with the two ribosomal proteins, (iii) the P-in state of the 48S, (iv) the rearrangement of the ternary complex are all warranted. Their claims about individual nucleotides or smaller stretches of the IRES, without any supporting biochemical data, are not warranted by the data.

      We thank the reviewer for finding our major claims warranted, and we aim to make further improvements to support our assessment of small stretches and individual nucleotides.

      Reviewer #3 (Public review):

      Summary:

      Type II IRES, such as those from encephalomyocarditis virus (EMCV) and foot-and-mouth disease virus (FMDV), mediate cap-independent translation initiation by using the full complement of eukaryotic initiation factors (eIFs), except the cap-binding protein eIF4E. The molecular details of how IRES type II interacts with the ribosome and initiation factors to promote recruitment have remained unclear. Das and Hussain used cryo-electron microscopy to determine the structure of a translation initiation complex assembled on the EMCV IRES. The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Strengths:

      The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Weaknesses:

      While this reviewer acknowledges the technical challenges inherent in determining the structure of such a highly flexible complex, the overall resolution remains insufficient to fully support the authors' conclusions, particularly given that cryo-EM is the sole experimental approach presented in the manuscript.

      The study is biologically significant; however, the authors should improve the resolution or include complementary biochemical validation.

      We thank Reviewer 3 for acknowledging the technical challenges in this study and finding our study biologically significant. We understand the concerns related to low resolution and the requirement of complementary biochemical validation for our reported observations and interpretations in the manuscript. We are attempting to improve the resolution and complement the interpretations with biochemical experiments.

    1. Author response:

      The following is the authors’ response to the original reviews

      Joint Public Review:

      Summary:

      In this study, Daniel et al. used three cognitive tasks to investigate behavioral signatures of cerebellar degeneration. In the first two tasks, the authors found that if an equation was incorrect, reaction times slowed significantly more for cerebellar patients than for healthy controls. In comparison, the slowing in the reaction times when the task required more operations was comparable to normal controls. In the third task, the authors show increased errors in cerebellar patients when they had to judge whether a letter string corresponded to an artificial grammar.

      Strengths:

      Overall, the work is methodologically sound and the manuscript well written. The data do show some evidence for specific cognitive deficits in cerebellar degeneration patients.

      Thank you for the thoughtful summary and constructive feedback. We are pleased that the methodological rigor and clarity of the manuscript were appreciated, and that the data were recognized as providing meaningful evidence regarding cognitive deficits in cerebellar degeneration.

      Weaknesses:

      The current version has some weaknesses in the visual presentation of results. Overall, the study lacks a more precise discussion on how the patterns of deficits relate to the hypothesized cerebellar function. The reviewers and the editor agreed that the data are interesting and point to a specific cognitive deficit in cerebellar patients. However, in the discussion, we were somewhat confused about the interpretation of the result: If the cerebellum (as proposed in the introduction) is involved in forming expectations in a cognitive task, should they not show problems both in the expected (1+3 =4) and unexpected (1+3=2) conditions? Without having formed the correct expectation, how can you correctly say "yes" in the expected condition? No increase in error rate is observed - just slowing in the unexpected condition. But this increase in error rate was not observed. If the patients make up for the lack of prediction by using some other strategy, why are they only slowing in the unexpected case? If the cerebellum is NOT involved in making the prediction, but only involved in detecting the mismatch between predicted and real outcome, why would the patients not show specifically more errors in the unexpected condition?

      Thank you for asking these important questions and initiating an interesting discussion. While decision errors and processing efficiency are not fully orthogonal and are likely related, they are not necessarily the same internal construct. The data from Experiments 1 and 2 suggest impaired processing efficiency rather than increased decision error. Reaction time slowing without increased error rates suggests that the CA group can form expectations but respond more slowly, possibly due to reduced processing efficiency. Thus, this analysis of our data suggests that the cerebellum is not essential for forming expectations, but it plays a critical role in processing their violations.

      Relatedly, a few important questions remain open in the literature concerning the cerebellum’s role in expectation-related processes. The first is whether the cerebellum contributes to the formation of expectations or the processing of their violations. In Experiments 1 and 2, the CA group did not show impairments in the complexity manipulation. Solving these problems requires the formation of expectations during the reasoning process. Given the intact performance of the CA group, these results suggest that they are not impaired in forming expectations. However, in both Experiments 1 and 2, patients exhibited selective impairments in solving incorrect problems compared to correct problems. Since expectation formation is required in both conditions, but only incorrect problems involve a VE, we hypothesize that the cerebellum is involved in VE processes. We suggest that the CA group can form expectations in familiar tasks, but are impaired in processing unexpected compared to expected outcomes. This supports the notion that the cerebellum contributes to VE, rather than to forming expectations.

      In Experiment 3, during training, the participant is learning a novel rule (grammar), forming new expectations on how strings of letters should be. Afterwards, during testing, the participant is requested to identify if a novel string is following the rule or not. We examined sensitivity to distinguish between grammatical and non‐grammatical strings of letters, thus taking into account a baseline ability to identify expected strings. Additionally, both in the low‐similarity and high‐similarity conditions, there are expectations regarding whether the strings are following the rule or not. However, in the high‐similarity condition, there is more uncertainty regarding which strings are following the grammatical rule, as demonstrated in a lower sensitivity (d prime). Given the group differences only in the low‐similarity condition, these results suggest the CA group is impaired only when the rules are more certain. Given these results, we suggest that forming cognitive expectations is not necessarily dependent on the cerebellum. Rather, we propose that the cerebellum is critical for processing rule-based VE (detection or processing of detected errors) under conditions of more certainty. One remaining question for future studies is whether the cerebellum contributes to detection of a mismatch between the expectation and sensory evidence, or the processing of a detected VE.
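
      For readers less familiar with the sensitivity index, d prime is the standard signal-detection measure: the z-transformed hit rate minus the z-transformed false-alarm rate. The Python sketch below illustrates that textbook formula only. Treating a "grammatical" response to a grammatical string as a hit is our reading of the task, and the log-linear correction for extreme rates is an assumption; the correction actually used in the study is not stated here.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int,
            false_alarms: int, correct_rejections: int) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Assumed convention: a hit is a "grammatical" response to a grammatical
    string; a false alarm is a "grammatical" response to a non-grammatical one.
    """
    # Log-linear correction (assumption): add 0.5 to each count so that
    # rates of exactly 0 or 1 do not give infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts, for illustration only (not data from the study).
print(d_prime(hits=18, misses=2, false_alarms=4, correct_rejections=16))
print(d_prime(hits=12, misses=8, false_alarms=8, correct_rejections=12))
```

      A participant who responds "grammatical" equally often to both string types has a d prime of zero, which is why the measure separates discrimination ability from an overall tendency to say "yes".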

      We suggest that these key questions are relevant to both motor and non-motor domains and were not fully addressed even in the previous, well-studied motor domain. Importantly, while previous experimental manipulations[17,19,40,94–96] have provided important insights regarding the cerebellar role in these processes, some may have confounded these internal constructs due to task design limitations (e.g., lack of baseline conditions). Notably, some of these previous studies did not include control conditions, such as correct trials, where there was no VE. In addition, other studies did not include a control measure (e.g., complexity effect), which limits their ability to infer the specific cerebellar role in expectation manipulation.

      Thus, the current experimental design used in three different experiments provides a valuable novel experimental perspective, allowing us to distinguish between some, but not all, of the processes involved in the formation of expectations and their violations. For instance, to our knowledge, this is the first study to demonstrate a selective impairment in rule-based VE processing in cerebellar patients across both numerical reasoning and artificial grammar tasks. If feasible, we propose that future studies should disentangle different forms of VE by operationalizing them in experimental tasks in an orthogonal manner. This will allow us to achieve a more detailed and well-defined cerebellar motor and non-motor mechanistic account.

      Recommendations for the authors:

      Editors comments:

      The Figures are somewhat sub-standard and should be improved before the paper is made the VOR. Ensure consistent ordering of the group factor (CA, NT) and experimental factor across Figure 3,4, and 6 (panels A). Having the patient group as columns in Figure 4a and in rows in Figure 6a is very confusing.

      We have standardized the layout across Figures 2, 4, and 6 so that the group factor (CA, NT) and experimental conditions are consistently ordered. In all panels, the group factor now appears as a column.

      Subpanels should be numbered A,B,C... not A, B1, B2.

      Subpanel labels have been updated to follow the standard A, B, C format across all figures.

      Fonts should have a 100% aspect ratio - they should not be stretched (Figure 6B).

      We have corrected the font aspect ratios in all figures (e.g., Figure 6B) to ensure proper proportions and readability. 

      Colors should be more suitable to print - use a CMYK color scheme (i.e. avoid neon colors such as the neon green for the CA).

      The color scheme across all figures has been revised to be print-friendly using CMYK-compatible, colorblind-accessible palettes. Neon green for the CA group was replaced with a more muted, distinguishable color.

      Abstract: "The CA group exhibited a disproportionate cost when comparing expected problems compared to unexpected problems" - I recommend switching unexpected and expected, as the disproportional cost is on the former.

      We have changed the wording of the sentence accordingly. 

      Upon re-reading, the details for the AGL task were not clear to us. Please do not rely on the reference (78) for the details - your paper should contain enough information to have the reader understand the experimental details. For you to appreciate the depth of our not-understanding, here is a simple question: The test strings either followed the grammar in Fig 5 or they did not. If they did not, how exactly was similarity to the grammar measured? If they did, what was the difference between the “Grammatical-high” and “Grammatical-low” trials? If the string was grammatical, there should not be a notion of similarity, no? Or were these trials arbitrarily split in half?

      We have clarified that 50% of the test strings followed the grammar of the training strings. We also elaborated on the calculation of chunk strength as a measure of similarity between the training and testing strings, similar to the previous papers. The differences between low and high similarity are explained in the paper. Specifically, for each test string, we calculated chunk strength by summing the frequencies of all relevant substrings (e.g., bigrams and trigrams) that appeared in the training set. The test strings whose chunk‐strength values fell above the median for grammatical items were classified as “high similarity,” while those falling below the median were classified as “low similarity.” Also, grammatical strings can be of both low and high similarity; this is precisely the beautiful aspect of this experimental manipulation, showing the importance of uncertainty. We have utilized a 2 × 2 fully orthogonal design (grammaticality × similarity).
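
      The chunk-strength measure and the median split described above can be sketched in a few lines of Python. This is a minimal illustration of the description in this response (summed training frequencies of bigrams and trigrams), not the analysis code used in the study. The function names, the example strings, and the handling of items that fall exactly on the median (labelled "low" here) are assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def chunks(string: str, sizes=(2, 3)) -> list[str]:
    """All contiguous substrings of the given sizes (bigrams and trigrams)."""
    return [string[i:i + n]
            for n in sizes
            for i in range(len(string) - n + 1)]


def chunk_strength(test_string: str, training_counts: Counter) -> int:
    """Sum of training-set frequencies over every chunk of the test string."""
    # Counter returns 0 for chunks that never appeared during training.
    return sum(training_counts[c] for c in chunks(test_string))


def split_by_similarity(test_strings, training_counts) -> dict[str, str]:
    """Median split: above the median is "high", otherwise "low"."""
    strengths = {s: chunk_strength(s, training_counts) for s in test_strings}
    cutoff = median(strengths.values())
    return {s: ("high" if v > cutoff else "low")
            for s, v in strengths.items()}


# Hypothetical training and test strings, for illustration only.
training = ["XVXT", "XVT", "VXVT"]
training_counts = Counter(c for s in training for c in chunks(s))
print(split_by_similarity(["XVT", "TTX", "VXT", "XXV"], training_counts))
```

      Because chunk strength depends only on surface overlap with the training strings, a string can be grammatical yet low in similarity, or non-grammatical yet high in similarity, which is what makes the 2 × 2 design orthogonal.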

      Experimental details of the task should be added to the Method section. In the results you should only mention the experimental details that are necessary for understanding the experiments, but details such as the number of trials, etc, can be moved to the methods. 

      We have now moved the experimental task details to the Method sections.

      Reviewer #1 (Recommendations for the author):

      Studies have been done online and not in the lab. Could that have affected the results?

      We addressed this in the Methods section, referring to established protocols for online neuropsychological testing[9–12]. Our results align with similar in-lab findings in both the subtraction and AGL tasks, supporting the online approach's robustness. 

      Figure 2, B1; Figure 4, B1; Figure 6B: How many patients performed worse than the (worst-performing) controls? There appears to be quite some overlap between patients and controls. In the patients who performed worse, was there any difference from the other patients (e.g. disease severity as assessed by SARA score, repeat length, data of attention probes)?

      We appreciate the reviewer’s thoughtful comment. We considered conducting individual-level comparisons to identify patients who performed worse than the lowest-performing controls. However, defining "worse" based on the performance of the lowest control is only one possible criterion. Other definitions, such as one, two, or three standard deviations below the control mean, are also commonly used in the literature, and each may yield different conclusions. This variability highlights the lack of a standardized threshold for what constitutes “worse” or "impaired" performance at the individual level. Given this ambiguity, and in line with prior studies that focus on average group differences rather than “impairment” prevalence, we chose not to include these individual-level comparisons. We believe this approach better aligns with the goals and design of the current study. That said, we agree that examining individual variability is important and may be more appropriate in future studies with larger samples, where the percentage of impaired patients would be a more robust measure. However, given the rarity of the disease, this would also be a challenge for future studies.

      SARA ataxia scale does not include oculomotor function. In SCA6 oculomotor deficits are frequent, eg, downbeat nystagmus. Please include information on oculomotor dysfunction.

      We thank the reviewer for this important observation. While it is true that the SARA scale does not explicitly assess oculomotor function, our experimental design – in all three experiments – has control conditions that help account for general processing differences, including those that could arise from oculomotor deficits. These conditions, such as the correct trials and the complexity effects, allow us to isolate effects specifically related to the violation of expectation while minimizing the influence of broader performance factors, such as eye movement abnormalities. We also note that, while some patients can experience oculomotor symptoms such as downbeat nystagmus, none of our tasks required precise visual tracking or gaze shifts. In our experimental tasks, stimuli were centrally presented, and no visual tracking or saccadic responses were required. Moreover, the response time windows and stimulus durations (>2–5 s) were sufficient to mitigate the effects of delayed visual processing due to oculomotor impairment.

      Why was MoCA used and not the CCAS-Schmahmann scale to assess cognitive function?

      We selected the MoCA due to its broad clinical utility, time efficiency, and ability to detect mild cognitive impairment specifically in CA[101,102].  

      Were there any signs of depression in the patient group that could have affected the results?

      None of the patients had a clinical diagnosis of depression or were undergoing psychiatric treatment.  

      Additionally, the interaction between group and expectancy was insignificant when RT was the depended vaibale .." = variable

      This has been corrected to "variable" in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      The terms 'unexpected' and 'expected' conditions are confusing. [...] Terming this 'violation of expectation' seems unnecessarily complicated to me. 

      We thank the reviewer for raising this important concern. We recognize that the terms "expected" and "unexpected" can be ambiguous without clarification, and that "violation of expectation" (VE) may initially appear unnecessary. Our choice to use VE terminology is grounded in an established theoretical framework that distinguishes between mere stimulus correctness and prediction mechanisms. Specifically, VE captures the internal processing of mismatches between anticipated and observed outcomes, which we believe is central to the cerebellar function under investigation. While simpler, technical alternatives (e.g., "correct" vs. "incorrect") could describe the stimuli, we find that VE more accurately reflects the mental constructs under study and is consistent with previous literature in both motor and cognitive domains. 

      Both tasks provide an error (or violation of expectation) that is non-informative and therefore unlikely to be used to update a forward model. The authors draw on motor literature to formulate a cognitive task where the presence of an error would engage the cerebellum and lead to longer reaction times in cerebellar patients. But in the motor domain, mismatch of sensory feedback and expectations would lead to an updating of the internal forward model. It seems unlikely to me in the arithmetic and alphabetic addition tasks that patients would update their internal model of addition according to an error presented at the end of each trial. If the error processed in these tasks will not lead to the updating of the internal forward model, can the authors discuss to what extent the cerebellum will be engaged similarly in these tasks, and what exactly connects cerebellar processing in these motor and cognitive tasks.

      We thank the reviewer for this thoughtful and important comment. We fully agree that the current tasks do not directly probe learning-related updating of internal models. As stated in the paper, the goal of the present study was not to support or refute a specific claim regarding the cerebellum’s role in learning processes. Rather, our focus was on examining cerebellar involvement in the processing of VE. While we were inspired by models from the motor domain, our design was not intended to induce learning or adaptation per se, but to isolate the processing of unexpected outcomes. We agree that the tasks in their current form are unlikely to engage forward model updating in the same way as in sensorimotor adaptation paradigms. That said, we believe the current findings can serve as a basis for future research exploring the relationship between cerebellar prediction error processing and learning over time. As we also noted in the paper, this is a direction we propose and are actively pursuing in ongoing research.

      The colour scheme is difficult for anyone with colour blindness or red-green visual impairment. Please adjust.

      All figures have been revised to use CMYK-compatible, colorblind-safe palettes, and neon colors have been removed.

      The introduction is a bit difficult to understand, because the authors draw on a number of different theories about cerebellar functioning, without clearly delineating how these relate to each other. For example: a) In the paragraph beginning with 'notably': If the cerebellum is required for sequential operations, why does it show the impairment with the rotation of the letters?

      We understand the concern that if the cerebellum is involved in sequential operations, its involvement in mental letter rotation, which can be assumed as “continuous transformation,” may appear contradictory. We note that the boundary between continuous and stepwise, procedural operations is not always clear-cut and may vary depending on the participant's strategy or previous knowledge, which is not fully known to the researchers. Furthermore, to our knowledge, prior work on mental rotation has not directly investigated the impact of VE during this task. However, these are two debatable considerations. 

      More importantly, a careful reading of our paper suggests that our experiments were designed to examine VE within tasks that involve sequential processing. Notably, we are not claiming that the cerebellum is involved in sequential or procedural processing per se. Rather, our findings point to a more specific role for the cerebellum in processing VE that arises during the construction of multistep procedural tasks. In fact, the results indicate that while the cerebellum may not be directly involved in the procedural process itself, it is critical when expectations are violated within such a context. This distinction is made possible in our study by the inclusion of a control condition (the complexity effect), which allows for a unique dissociation in our experimental design—one that, to our knowledge, has not been sufficiently addressed in previous studies.

      Additionally, in the case of arithmetic problem solving, such as the tasks used in prior studies cited in our manuscript[21], there is substantial evidence that these problems are typically solved through stepwise, procedural operations. Arithmetic reasoning, used in Experiments 1 and 2, has been robustly associated with procedural, multi-step strategies, which may be more clearly aligned with traditional views of cerebellar involvement in sequential operations. Thus, we propose that the role of the cerebellum in continuous transformations should be further examined.

      We suggest a more parsimonious theory: that the cerebellum contributes to VE, a topic that has been studied extensively. To reconcile our findings with previous ones, we propose that the cerebellum’s contribution may not be limited to either continuous or stepwise operations per se, but rather to a domain-general process: the processing of VE. This theoretical framework can explain performance patterns across both mental rotation tasks and stepwise, procedural arithmetic.

      The authors mention generation prediction as a function of the cerebellum, processing of prediction errors (or violations of expectations), sequentially, and continuous transformations - but it is unclear whether the authors are trying to dissociate these from each other or whether ALL of these functions have informed task design.

      We propose that the cerebellum’s contribution may not be limited to either continuous transformations or stepwise, procedural operations per se, but rather to a domain-general process: the processing of VE. We would like to clarify that we do not claim the cerebellum contributes to continuous transformations only, as suggested in some earlier work[21]. Rather, it could be that the cerebellum may contribute to continuous transformations, but we propose that it also supports multi-step, procedural processes. Given that framework, in the current study, across three separate experiments, we demonstrated that the cerebellum can also contribute to procedural, multi-step reasoning tasks.  

      Minor Comments

      Typo under paragraph beginning with 'notably' - cerebellum role should be cerebellar role.

      Corrected as suggested.

      When mentioning sequences as a recruiting feature for the cerebellum in the introduction, Van Overwalle's extensive work in the social domain should be referenced for completeness.

      Thank you for the suggestion. We have now cited Van Overwalle’s work on cerebellar involvement in sequence processing within the social domain in the revised Introduction.

    1. Author response:

      Reviewer #1 (Public review): 

      The manuscript by Bru et al. focuses on the role of vacuoles as a phosphate buffering system for yeast cells. The authors describe here the crosstalk between the vacuole and the cytosol using a combination of in vitro analyses of vacuoles and in vivo assays. They show that the luminal polyphosphatases of the vacuole can hydrolyse polyphosphates to generate inorganic phosphate, yet they are inhibited by high concentrations. This balances the synthesis of polyphosphates against the inorganic phosphate pool. Their data further show that the Pho91 transporter provides a valve for the cytosol as it gets activated by a decline in inositol pyrophosphate levels. The authors thus demonstrate how the vacuole functions as a phosphate buffering system to maintain a constant cytosolic inorganic phosphate pool. 

      This is a very consistent and well-written manuscript with a number of convincing experiments, where the authors use isolated vacuoles and cellular read-out systems to demonstrate the interplay of polyphosphate synthesis, hydrolysis, and release. The beauty of this system the authors present is the clear correlation between product inhibition and the role of Pho91 as a valve to release Pi to the cytosol to replenish the cytosolic pool. I find the paper overall an excellent fit and only have a few issues, including: 

      (1) Figure 3: The authors use in their assays 1 mM ZnCl2 or 1mM MgCl2. Is this concentration in the range of the vacuolar luminal ion concentration? Did they also test the effect of Ca2+, as this ion is also highly concentrated in the lumen? 

      The concentrations inside vacuoles can reach those values. However, given that polyP is a potent chelator of divalent metal ions, what would matter are the concentrations of free Zn<sup>2+</sup> or Mg<sup>2+</sup> inside the organelle. These are not known. This is not critical since we use those two conditions only as a convenient tool to differentiate Ppn1 and Ppn2 activity in vitro. In our initial characterisation of Ppn2 (10.1242/jcs.201061), we had also tested Mn, Co, Ca, Ni, Cu. Only Zn and Co supported activity. Ca did not. Andreeva et al. (10.1016/j.biochi.2019.06.001) reached similar conclusions and extended our results.

      (2) Regarding the concentration of 30 mM K-PI, did the authors also use higher and lower concentrations? I agree that there is inhibition by 30 mM, but they cannot derive conclusions on the luminal concentration if they use just one in their assay. A titration is necessary here. 

      The concentration of 30 mM was not arbitrarily chosen. It is the luminal P<sub>i</sub> concentration that the vacuoles reached when they entered a plateau of luminal P<sub>i</sub>. We consider this an upper limit because polyP kept increasing while luminal P<sub>i</sub> did not. Thus, there is in principle no physiological motivation for trying higher values. But we will probably add a titration to the revised version.

      (3) What are the consequences on vacuole morphology if the cells lack Pho91? 

      We had not observed significant abnormalities during a screen of the genome-wide deletion collection of yeast (10.1371/journal.pone.0054160).

      (4) Discussion: The authors do not refer to the effect of calcium, even though I would expect that the levels of the counterion should affect the phosphate metabolism. I would appreciate it if they would extend their discussion accordingly. 

      We will pick this up in the discussion. However, the situation is much more complex because major pools of counterions (up to hundreds of mM) are constituted by vacuolar lysine, arginine, polyamines, Mg, Zn etc. Their interplay with polyP is probably complex and worth treating in a dedicated project.

      (5) I would appreciate a brief discussion on how phosphate sensing and control are done in human cells. Do they use a similar lysosomal buffer system? 

      Mammalian cells have their Pi exporter XPR1 mainly on a lysosome-like compartment (10.1016/j.celrep.2024.114316). Whether and how it functions there for Pi export from the cytosol is not entirely clear. We will address this situation in the revision.

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript presents a well-conceived and concise study that significantly advances our understanding of polyphosphate (polyP) metabolism and its role in cytosolic phosphate (Pi) homeostasis in a model unicellular eukaryote. The authors provide evidence that yeast vacuoles function as dynamic regulatory buffers for Pi homeostasis, integrating polyP synthesis, storage, and hydrolysis in response to cellular metabolic demands. The work is methodologically sound and offers valuable insights into the conserved mechanisms of phosphate regulation across eukaryotes. 

      Strengths: 

      The results demonstrate that the vacuolar transporter chaperone (VTC) complex, in conjunction with luminal polyphosphatases (Ppn1/Ppn2) and the Pi exporter Pho91, establishes a finely tuned feedback system that balances cytosolic Pi levels. Under Pi-replete conditions, inositol pyrophosphates (InsPPs) promote polyP synthesis and storage while inhibiting polyP hydrolysis, leading to vacuolar Pi accumulation. 

      Conversely, Pi scarcity triggers InsPP depletion, activating Pho91-mediated Pi export and polyP mobilization to sustain cytosolic phosphate levels. This regulatory circuit ensures metabolic flexibility, particularly during critical processes such as glycolysis, nucleotide synthesis, and cell cycle progression, where phosphate demand fluctuates dramatically. 

      From my viewpoint, one of the most important findings is the demonstration that vacuoles act as a rapidly accessible Pi reservoir, capable of switching between storage (as polyP) and release (as free Pi) in response to metabolic cues. The energetic cost of polyP synthesis, driven by ATP and the vacuolar proton gradient, highlights the evolutionary importance of this buffering system. The study also draws parallels between yeast vacuoles and acidocalcisomes in other eukaryotes, such as Trypanosoma and Chlamydomonas, suggesting a conserved role for these organelles in phosphate homeostasis. 

      Weaknesses: 

      While the manuscript is highly insightful, referring to yeast vacuoles as "acidocalcisome-like" may warrant further discussion. Canonical acidocalcisomes are structurally and chemically distinct (e.g., electron-dense, in most cases spherical, and not routinely subjected to morphological changes, and enriched with specific ions), whereas yeast vacuoles have well-established roles beyond phosphate storage. A comment on this terminology could strengthen the comparative analysis and avoid potential confusion in the field. 

      Yeast vacuoles show all major chemical features of acidocalcisomes. They are acidified and contain high concentrations of Ca, polyP (which makes them electron-dense, too), other divalent ions such as Mg, Zn, Mn etc., and high concentrations of basic amino acids. Thus, they clearly have an acidocalcisome-like character. In addition, they have hydrolytic, lysosome-like functions and, depending on the strain background, they can be larger than acidocalcisomes described e.g. in protists. We will elaborate on this point, which is obvious to us but probably not to most readers, in the revised version.

      Reviewer #3 (Public review): 

      Bru et al. investigated how inorganic phosphate (Pi) is buffered in cells using S. cerevisiae as a model. Pi is stored in cells in the form of polyphosphates in acidocalcisomes. In S. cerevisiae, the vacuole, which is the yeast lysosome, also fulfills the function of Pi storage organelle. Therefore, yeast is an ideal system to study Pi storage and mobilization. 

      They can recapitulate in their previously established system, using isolated yeast vacuoles, findings from their own and other groups. They integrate the available data and propose a working model of feedback loops to control the level of Pi on the cellular level. 

      This is a solid study, in which the biological significance of their findings is not entirely clear. The data analysis and statistical significance need to be improved and included, respectively. The manuscript would have benefited from rigorously testing the model, which would also have increased the impact of the study.

      It is not clear to us what the reviewer would see as a more rigorous test of the model.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      Praegel et al. explore the differences in learning an auditory discrimination task between adolescent and adult mice. Using freely-moving (Educage) and head-fixed paradigms, they compare behavioral performance and neuronal responses over the course of learning. The mice were initially trained for seven days on an easy pure frequency tone Go/No-go task (frequency difference of one octave), followed by seven days of a harder version (frequency difference of 0.25 octave). While adolescents and adults showed similar performance on the easy task, adults performed significantly better on the harder task. Quantifying the lick bias of both groups, the authors then argue that the difference in performance is not due to a difference in perception, but rather to a difference in cognitive control. The authors then used Neuropixels recordings across 4 auditory cortical regions to quantify the neuronal activity related to the behavior. At the single cell level, the data show earlier stimulus-related discrimination for adults compared to adolescents in both the easy and hard tasks. At the neuronal population level, adults displayed a higher decoding accuracy and lower onset latency in the hard task as compared to adolescents. Such differences were not only due to learning, but also to age as concluded from recordings in novice mice. After learning, neuronal tuning properties had changed in adults but not in adolescents. Overall, the differences between adolescent and adult neuronal data correlate with the behavioral results in showing that learning a difficult task is more challenging for younger mice.

      Strengths:

      The behavioral task is well designed, with the comparison of easy and difficult tasks allowing for a refined conclusion regarding learning across age. The experiments with optogenetics and novice mice are completing the research question in a convincing way.

      The analysis, including the systematic comparison of task performance across the two age groups, is most interesting, and reveals differences in learning (or learning strategies?) that are compelling.

      Neuronal recording during both behavioral training and passive sound exposure is particularly powerful, and allows interesting conclusions.

      Weaknesses:

      The presentation of the paper must be strengthened. Inconsistencies, missing information or confusing descriptions should be fixed.

      We have carefully re-read the manuscript and reviewed it for inconsistencies. We made several corrections in the figures. For example, we removed redundant lines from violin plots and statistics, applied consistent labels, matched y- and x-limits of graphics, and adjusted labels. We also clarified descriptions of some experiments by adding explanations to the text.

      The recording electrodes cover regions in the primary and secondary cortices. It is well known that these two regions process sounds quite differently (for example, one has tonotopy, the other not), and separating recordings from both regions is important to conclude anything about sound representations. The authors show that the conclusions are the same across regions for Figure 4, but is it also the case for the subsequent analysis? Compared to the original manuscript, the authors have now done the analysis for AUDp and AUDv separately, and say that the differences are similar in both regions. The data, however, show that this is not the case (Fig S7). And even if it were the case, how would it be compatible with the published literature?

      To address this and previous concerns about regional differences, the manuscript now includes 4 figures (4-1, 4-3, 6-2, 7-1) and 5 supplemental tables (3,4, 5, 6, 8) that explicitly compare results across brain regions.

      Following the reviewer’s request for subsequent analysis, we now added a new supplemental figure (Fig. S6-2) and two new supplementary tables (Tables S5, S6). We show that similar to expert mice (supplementary Table 3, and supplementary Table 4), the firing properties of adolescent and adult novice mice differ across auditory subregions (supplementary Table 5). We also show that the different auditory subregions have different firing properties (supplementary Table 6). With respect to task engagement, we show that (similar to Fig. S4-2) the neuronal discriminability in different auditory subregions is similar in both novice and expert mice (Fig. S6-2).

      Following the comment on Fig. S7-1, we made three changes to the revised manuscript. First, we now highlight that the differences in firing properties between adolescent and adult neurons in AUDp and AUDv were distinct, but not significantly different within age-group comparisons. Second, we clearly state that the learning-related changes in the measured parameters are different between AUDp and AUDv. Note, however, that the greater changes in adult neurons after learning remain consistent between AUDp and AUDv. Third, we softened our original claim but still highlighted the stronger learning-induced plasticity in adults.

      Regarding the concern that different regions should show different patterns due to their known differences (e.g. tonotopy), we of course agree that different areas differ functionally (as shown in our own previous work and here as well). However, it is still plausible, and biologically reasonable, that developmental changes may proceed in a similar direction across different areas, even if their baseline coding properties differ.

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to find out how and how well adult and adolescent mice discriminate tones of different frequencies and whether there are differences in processing at the level of the auditory cortex that might explain differences in behavior between the two groups. Adolescent mice were found to be worse at sound frequency discrimination than adult mice. The performance difference between the groups was most pronounced when the sounds were close in frequency and thus difficult to distinguish and could, at least in part, be attributed to the younger mice's inability to withhold licking in no-go trials. By recording the activity of individual neurons in the auditory cortex when mice performed the task or were passively listening, as well as in untrained mice, the authors identified differences in the way that the adult and adolescent brains encode sounds and the animals' choice that could potentially contribute to the differences in behavior.

      Strengths:

      The study combines behavioural testing in freely-moving and head-fixed mice, optogenetic manipulation and high density electrophysiological recordings in behaving mice to address important open questions about age differences in sound-guided behavior and sound representation in the auditory cortex.

      Weaknesses:

      For some of the analyses that the authors conducted it is unclear what the rationale behind them is and, consequently, what conclusion we can draw from them.

      We have carefully re-read the manuscript and reviewed it for analyses that lacked a clear rationale or conclusion. To address this, we have made several changes to clarify the reasoning and strengthen the interpretation of the results.

      Reviewer #1 (Recommendations for the authors):

      It would have helped if the authors had highlighted the changes they made to the manuscript compared to the original version - especially since many replies to the reviewers' comments were as vague as "...we fixed some of the wording so it adheres to the data shown", or "we refined our interpretation", without further details.

      The revised version has improved substantially, and the main claims have been discussed in a more objective way. Important new analyses have been added to allow for a refined interpretation of the results. However, the presentation of the data could still be strengthened significantly (in response to comment A from last review).

      We apologize for the lack of detail in some of our previous responses. Our intention was to keep the replies concise, assuming that the side-by-side version with tracked changes would make the edits sufficiently clear. However, we understand the need for greater transparency. Thus, below we provide the following five lists describing the major changes: (1) List of specific reviewer recommendations, (2) list of corrections in figures, (3) list of clarity issues, (4) list of fixed mistakes, (5) list of new figures. We hope this breakdown makes the revisions clearer and more accessible.

      List of specific reviewer recommendations:

      l.108 mentions a significant change in the vertical line of Fig 1F - Could this significance be indicated and quantified in the figure?

      We quantified and indicated the significance of the vertical line in Fig. 1f and Fig. 1i.

      Fig.1G - the thick and thin lines should be defined, as well as the grey and white dots (same values for adolescents, not for adults).

      (a) We removed the thin inner lines from the violin plot. We define the bar (thick line) of the violin plot in an additional sentence in the methods section under data analysis (LL820-823). (b) We adjusted the marker outlines in the adult data (Fig. 1G).

      The figure axis legends should be consistent (trials in Fig 1D vs # trials in Fig 1F).

      We adjusted the axis legend to # trials in Fig. 1D.

      l.110: is d' always calculated based on the 100 last trials of a session, or is it just for Figure 1F? -etc...

      d’ is always calculated based on the last 100 trials. To clarify this, we added a description in the methods section (L830).

      List of corrections in the figures:

      (1) We removed the internal lines from violin plots throughout Fig. 1-7.

      (2) We removed the underline of the statistics throughout Fig. 1-7.

      (3) We consistently applied ‘adolescent’ and ‘adult’ figure labels and titles with lowercase letters throughout Fig. 1-7.

      (4) We applied consistent labelling of ‘time (ms)’ throughout Fig. 1-7.

      (5) We matched the size of dashed lines throughout Fig. 1-6.

      (6) We adjusted the x-label of Fig. 1d, Fig. S1-1a, Fig. 3c, Fig. 3h-i, Fig. 4d to ‘# trials’.

      (7) We removed the x-label of ‘Experimental Group’ from Fig. 1 to enhance consistency with other figures.

      (8) We removed misaligned dots from the violin plots in Fig. 1g, Fig. 2f, Fig. 3f,g.

      (9) We corrected the plot in Fig. S1-1b.

      (10) We adjusted the y-limits of Fig. S1-1c to be consistent with Fig. S1-1d,e.

      (11) We adjusted the x-labels and y-labels of Fig. 2, Fig. S3-1, Fig. S3-2 and Fig. 3b to ‘freq. (kHz)’.

      (12) We added the age of adolescent and adult mice to the schematic timeline in Fig. 2a.

      (13) We added a label of the reinforcement delay to the schematic trial structure in Fig. 3b.

      (14) We added within-group statistics to Fig. 3e and the figure legend.

      (15) We adjusted the x-label of Fig. 3d to ‘# sessions’.

      (16) We adjusted the x-label of Fig. 3d and Fig. S3-1b to ‘# licks’.

      (17) We changed the y-label in Fig. S3-1a, and Fig. S3-2d, e to ‘lick ratio’ to avoid confusion with the lick rate (Hz) that was calculated in Fig. 4 and Fig. 6.

      (18) We replaced the titles ‘CAMKII’ with ‘dTomato’ in Fig. S3-2 to correctly highlight that both the experimental and control injection were CAMKII injections.

      (19) We adjusted the x-labels and y-labels of Fig. 2, Fig. S3-1, Fig. S3-2 and Fig. 3b to ‘freq. (kHz)’.

      (20) We adjusted the y-label of Fig. S4-1c to ‘# neurons’.

      (21) We matched the x-ticks in Fig. 4e,f.

      (22) We matched the x-ticks in Fig. 6d-g.

      (23) We changed the x-label in Fig. 4g, S4-2 and S6-2 to ‘duration (ms)’ to match the figure label with the manuscript.

      (24) We consistently label ‘Hit’, ‘Miss’, ‘FA’ and ‘CR’ with capital letters in Fig. 4d-e.

      (25) We replaced the double figure label ‘C.’ in Fig. S4-2 with ‘D.’.

      (26) We adjusted the dot-size in Fig. 5 to be equal for all graphs.

      (27) We added ticks to the experimental timeline in Fig. 6a.

      (28) We corrected the y-label in Fig.7c. Now it correctly reflects 5 attenuations from 72-32 dB SPL.

      (29) We matched the y-label of Fig. 7e-h and Fig. S7-1.

      List of clarity issues:

      (1) We replaced the term ‘lower response bias’ with ‘higher lick bias’ (L24) to accurately describe the more negative (lower) criterion-bias, which highlights a higher tendency to lick.

      (2) We replaced the term ‘response bias’ with ‘lick bias’ to consistently describe the calculated criterion-bias (L24, L149, L164, L455, L456, L468).

      (3) We clarify that the age-related differences were ‘more pronounced’ instead of simply ‘higher’ to accurately reflect not simply the increase in adolescent lick-bias, but also the decrease in adult lick-bias (L31).

      (4) We clarified that adolescent sound representations are not merely ’distinct’, but ‘not fully mature’ in L83.

      (5) We clarified in L180 that the impulsive responses we observed in adolescent mice could be related to being ‘less impacted by punishments’.

      (6) We clarified the differences in firing properties of auditory sub-regions analyzed in Supplementary Table 3 (L287-295).

      (7) We explained and clarified the reference to Fig. 3j (LL252-253).

      (8) We added statistics to Fig.S4-2 to support our claim that there are no differences in the onset-latency, duration of discriminability and maximal discriminability between different sub-regions within age-groups (LL 314-315).

      (9) We expanded our explanation of the results in Table 3 (LL370-379).

      (10) We separated the reference to Fig. 6b and Fig. 6c to clarify their meaning (LL358-361).

      (11) We clarified the differences in basic firing properties during the FRA protocol in Fig. 7 (LL409-418).

      (12) We expanded our explanation of the differences of the learning related firing properties in AUDp and AUDv of Fig. S7-1 (LL426-433).

      (13) We changed the term ‘plasticity profiles’ to ‘learning related plasticity’ to further clarify our limitation that L5/6 and L2/3 may exhibit distinct learning related changes (L496).

      (14) We changed the term ‘sluggish’ (L481) to ‘delayed’ to more precisely explain differences between adolescent and adult tuning properties.

      (15) We clarified that the running d’ was calculated in bins of 25 trials, instead of ‘the last 25 trials’ (LL845-846).
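
      The sensitivity and bias measures referred to in this list (d’, the criterion-based lick bias, and the running d’ in bins of 25 trials) can be illustrated with a minimal sketch. It uses the standard signal-detection definitions and is not the authors' analysis code: the function names, the trial format (pairs of booleans), the log-linear correction for extreme rates, and the use of consecutive non-overlapping bins are all assumptions made for illustration.

```python
from statistics import NormalDist

# Inverse of the standard normal CDF (the z-transform used in signal detection).
_Z = NormalDist().inv_cdf


def dprime_and_criterion(trials):
    """Return (d', c) for a list of (is_go, licked) boolean pairs.

    d' = z(hit rate) - z(false-alarm rate)
    c  = -0.5 * (z(hit rate) + z(false-alarm rate))
    A more negative c means a stronger tendency to lick.
    """
    n_go = sum(1 for is_go, _ in trials if is_go)
    n_nogo = len(trials) - n_go
    hits = sum(1 for is_go, licked in trials if is_go and licked)
    false_alarms = sum(1 for is_go, licked in trials if not is_go and licked)
    # Log-linear correction (an assumption here) keeps both rates strictly
    # between 0 and 1 so that the z-transform stays finite.
    hit_rate = (hits + 0.5) / (n_go + 1)
    fa_rate = (false_alarms + 0.5) / (n_nogo + 1)
    z_hit, z_fa = _Z(hit_rate), _Z(fa_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)


def running_dprime(trials, bin_size=25):
    """d' in consecutive, non-overlapping bins; a trailing partial bin is dropped."""
    n_bins = len(trials) // bin_size
    return [
        dprime_and_criterion(trials[i * bin_size:(i + 1) * bin_size])[0]
        for i in range(n_bins)
    ]
```

      Under these definitions, a session-level d’ over the last 100 trials would correspond to calling `dprime_and_criterion(trials[-100:])`. An animal that licks on every trial yields a d’ of zero and a negative criterion, which matches the description above of a more negative criterion as a higher lick bias.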

      List of fixed mistakes:

      (1) We corrected and matched the age to more accurately reflect the age mice were recorded (P37-42 and P77-82).

      (2) We corrected the attenuation range from 72-42 to 72-32 dB SPL to correctly reflect the 5 attenuations used in the protocol.

      (3) We corrected the number of channels shown in the voltage trace from 10 to 11 (Fig. S4-1a)

      (4) We corrected the number of neurons recorded in novice adolescent mice in the legend of Fig. 6 from 140 to 130 (Fig. 6b).

      (5) We removed redundant, or double brackets, commas, dots, and semi-colons in the figure legends.

      (6) We corrected the LME statistics Table 2.

      List of new figures and tables:

      (1) We added a new supplementary figure to accompany Figure 6. Specifically, Fig. S6-2 shows the interaction of the three measured discriminability properties (onset delay, duration of discriminability, and maximal discriminability) in novice compared to expert mice in the easy and hard task (Go compared to No Go). The figure compares the different auditory sub-regions (similar to Fig. S4-2). We show that the discriminability properties within different groups are not significantly different among the four different sub-regions.

      (2) Supplementary Table 5: We compared the firing properties in different auditory subregions in novice mice, and found (similar to expert mice) that the firing properties differ between adult and adolescent mice across the four different sub-regions.

      (3) Supplementary Table 6: We compared the firing properties between different subregions, separately for adolescent and adult novice mice. Similar to expert mice, we found that different auditory subregions differ in their auditory firing properties.

      Reviewer #2 (Recommendations for the authors):

      The authors largely addressed my suggestions.

      Comparing hit vs correct rejection trials in the population decoding analysis (L313-314): The authors acknowledge that comparing these two trial types conflates choice and stimulus decoding but I am not convinced that the changes to the manuscript text make this clear enough to the reader.

      Thank you for pointing this out. We have made additional revisions to clarify this, and other issues more explicitly, as follows:

      (1) We have expanded the explanation of how our population decoding analysis conflates stimulus and choice, and we acknowledge the limitations of this approach in the Abstract (L28), the Results section (L324-326, LL367-370) and the Discussion (LL516-519).

      (2) We replaced the analysis of impulsivity on the head-fixed task. Instead of analyzing all ITIs, we focus only on ITIs following FA trials (Fig. S3-1c,d). This is more consistent with the analysis in the Educage (Fig. S2-1), where we show that adolescents exhibit increased impulsivity after FA trials. We found a similar result for ITIs following FA trials in the head-fixed task.

      (3) To provide complementary insight, we now further justify our use of the Fisher separation metric alongside decoding accuracy in Figure 5, with a clearer rationale provided in LL343-345.

      (4) We also clarified our reasoning for focusing on 62 dB SPL in the FRA-based analysis in LL400-403.

    1. Author response:

      We thank the reviewers and editors for their careful and constructive assessment of our manuscript. We have provided a provisional response to the eLife assessment and the reviewers’ public comments below, addressing their main concerns and outlining the planned revisions that we believe will substantially strengthen our paper.

      eLife Assessment

      This study presents a valuable finding on the representational structure of task encoding in the prefrontal cortex. The evidence supporting the claims of the authors is solid, representing an impressive data collection effort and best-practice fMRI analyses. However, at least including visual regions as a control and controlling for behavioral differences in the task in representation analyses would have strengthened the study. The work will be of interest to cognitive neuroscientists interested in the neural basis of cognitive control.

      We plan to address both specific methodological weaknesses mentioned in the assessment in our forthcoming revision. First, the revision will include analyses of an early visual cortex ROI as an additional control region, allowing us to test whether the primary auditory cortex findings generalize to the sensory cortex across input modalities. Preliminary results indicate that the early visual cortex ROI exhibits a similar pattern of results, with evidence for coding both task-relevant and task-irrelevant visual dimensions across both tasks, as well as the context dimension specifically in the hierarchy task. Second, we will include behavioral performance as a covariate for the relevant statistical comparison across tasks to mitigate concerns over performance-related confounds. In addition, we will include a set of control analyses that demonstrate that equating the amount of data for pattern analyses across the two tasks by subsampling from the hierarchy task, while reducing our overall power, does not appreciably alter our results. We note that our analyses of representational geometries relied only on neural data from correct trials and, in the first-level modelling of the fMRI data, already controlled for differences in trial-by-trial response times. Therefore, our analyses of decoding and representation similarity are not directly affected by differences in performance across the two tasks. Finally, we have provided clarifications regarding Reviewer 2’s questions about the size and construction of the regions of interest employed in the study, as well as about the language employed to discuss null results.  

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Bhandari and colleagues present tour-de-force analyses that compare the representational geometry in the lateral prefrontal cortex and primary auditory cortex between two complex cognitive control tasks, with one having a "flat" structure where subjects are asked to form rote memory of all the stimulus-action mappings in the task and one having a "hierarchical" task structure that allows clustering of task conditions and that renders certain stimulus dimensions irrelevant for choices. They discovered that the lPFC geometry is high-dimensional in nature in that it allows above-chance separation between different dichotomies of task conditions. The separability is significantly higher for task-relevant features than task-irrelevant ones. They also found task features that are represented in an "abstract" format (e.g., audio features), i.e., the neural representation generalizes across specific task conditions that share this variable. The neural patterns in lPFC are highly relevant for behaviors as they are correlated with subjects' reaction times and choices.

      Strengths:

      Typically, geometry in coding patterns is reflected in single-unit firings; this manuscript demonstrates that such geometry can be recovered using fMRI BOLD signals, which is both surprising and important. The tasks are well designed and powerful in revealing the differences in neural geometry, and analyses are all done in a rigorous way. I am thus very enthusiastic about this paper and identify no major issues.

      I am curious about the consequence of dimensionality collapse in lPFC. The authors propose a very interesting idea that separability is critical for cognitive control; indeed, separability is high for task-relevant information. What happens when task-relevant separation is low or task-irrelevant separation is high, and will this lead to behavioral errors? Maybe a difference score between the separability of task-relevant and task-irrelevant features is a signature of the strength of cognitive control?

      We appreciate the reviewers’ positive evaluation of our paper.

      Weaknesses:

      The authors show a difference between flat and hierarchical tasks, but the two tasks are different in accuracy, with the flat task having more errors. Will this difference in task difficulty/errors contribute to the task differences in results reported?

      To address the Reviewer’s concern about the difference in behavioural performance between the two tasks influencing our results, we will take several approaches. First, we will include behavioral performance as a covariate for the relevant statistical comparison across tasks. This should ensure that any differences we observe across tasks are over and above those that can be explained by the difference in behavioral performance. Second, we will include a set of decoding analyses that control for differences in performance across the tasks. We note that all our analyses of representational geometries relied on neural data from correct trials only. In addition, the first-level modelling of the fMRI data already controlled for trial-by-trial variability in response times. Therefore, our decoding and representation similarity analyses should not be directly affected by differences in performance across the two tasks. However, one possible issue with this approach is that the larger number of errors in the flat task means that less data was available for estimating multivoxel patterns in the flat task compared to the hierarchy task, resulting in differential power to detect decoding effects across the two tasks. We note that this difference was not substantial: on average, 21.7 runs were available per participant for the flat task, while 23.8 runs per participant were available for the hierarchy task. Moreover, rerunning our analyses with the number of runs equated for each participant does not meaningfully alter the pattern of results. These additional analyses will be included in the supplement in the forthcoming revised manuscript.
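
      The run-equating control described in this response can be sketched as follows. This is only an illustration of the general idea, not the authors' pipeline: the function name, the representation of runs as integer identifiers, the random (seeded) selection, and the choice to trim whichever task has more runs are all assumptions.

```python
import random


def equate_runs(flat_runs, hier_runs, seed=0):
    """Subsample run identifiers so both tasks contribute equally many runs.

    Both arguments are lists of run identifiers for one participant.
    The task with more usable runs is randomly subsampled down to the
    size of the other; the smaller set is returned in full.
    """
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    n = min(len(flat_runs), len(hier_runs))
    # sorted() restores acquisition order after the random selection.
    flat_sub = sorted(rng.sample(flat_runs, n))
    hier_sub = sorted(rng.sample(hier_runs, n))
    return flat_sub, hier_sub
```

      For a hypothetical participant with 21 usable flat-task runs and 24 hierarchy-task runs, `equate_runs(list(range(1, 22)), list(range(1, 25)))` keeps all 21 flat-task runs and 21 of the hierarchy-task runs. In practice, one might repeat the subsampling with several seeds and average the resulting decoding estimates, so that the outcome does not depend on a single draw.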

      Reviewer #2 (Public review):

      Summary:

      The authors study the influence of tasks on the representational geometry of the lPFC and auditory cortex (AC). In particular, they use two context-dependent tasks: a task with a hierarchical structure and a task with a flat structure, in which each context/stimulus maps to a specific response. Their primary finding is that the representational geometry in the lPFC, in contrast to AC, aligns with the optimal organization of the task. They conclude that the geometry of representations adapts, or is tailored, to the task in the lPFC, therefore supporting control processes.

      Strengths:

      (1) Dataset:

      The dataset is impressive and well-sampled. Having data from both tasks collected in the same subjects is a great property. If it is publicly available, it will be a significant contribution to the community.

      (2) Choice of methods:

      The choice of analyses is largely well-suited towards the questions at hand - cross-condition generalization, RSA + regression, in combination with ANOVAs, are well-suited to characterizing task representations.

      (3) I found some of their results, in particular, those presented in Figures 4 and 5, to be particularly compelling.

      (4) The correlation analysis with behavior is also a nice result.

      We thank the reviewer for noting the strengths of the paper. We respond to the weaknesses noted below. 

      Weaknesses:

      (1) Choice of ROIs:

      A strength of fMRI is its spatial coverage of the whole brain. In this study, however, the authors focus on only two ROIs: the lPFC and auditory cortex. Though I understand the justification for choosing lPFC from decades of research, the choice of AC as a control feels somewhat arbitrary - AC is known to have worse SNR in fMRI data, and limiting a 'control' to a single region seems arbitrary. For example, why not also include visual regions, given that the task also involves two visual features?

      We agree with the reviewer that the whole-brain fMRI data certainly provide ample opportunities to explore the nature of these representations across the brain. Our focus in this paper is squarely on the principles of coding and flexibility in the lPFC. We believe that a whole-brain exploration addresses a separate question that would be out of the scope of this study. To clarify, we are not arguing that the lPFC is the only region in the brain that employs the coding principles that our study brings to light. Our contention is only that lPFC employs these principles, and it differs at least from the primary sensory cortex. The questions of whether these principles generalize beyond lPFC (quite likely) and, if so, how broadly, are distinct from the ones addressed in the manuscript. We intend to follow up with another manuscript that addresses these questions.

      Nevertheless, given the focus of this paper, we agree that a second control region, which allows one to test if the primary auditory cortex findings generalize to the sensory cortex more broadly, would strengthen our claims. We will include an early visual cortex ROI in our forthcoming revision. Preliminary results indicate that the early visual cortex ROI shows a similar set of findings – with evidence for coding of task-relevant and task-irrelevant visual dimensions across both tasks, but also specifically the context dimension in the hierarchy task. These results will be detailed in the forthcoming revision.

      (2) Construction of ROIs:

      The choice and construction of the ROIs feel a bit arbitrary, as the lPFC region was constructed out of 10 parcels from Schaefer, while the AC was constructed from a different methodology (neurosynth). Did both parcels have the same number of voxels/vertices? It would be helpful to include a visualization of these masks as a figure.

      We defined the lPFC ROIs by selecting Schaefer parcels in the frontal lobe that were previously mapped onto the Control A resting state network identified by Yeo et al. (2011). This network aligns with the multiple-demand network, which has also been identified in the macaque, where it includes the lPFC regions that abut the principal sulcus. Prior results from these regions in the monkey brain provide the scientific premise for our hypotheses. The two lPFC ROIs in each hemisphere were constructed out of 5 Schaefer parcels in each hemisphere. These parcels cluster into the same functional network and tend to behave similarly in univariate analyses. Given that our hypotheses do not distinguish between the different parcels, we elected to improve power by merging them into left and right dlPFC ROIs. 

      On the other hand, the same approach could not be used to identify the primary auditory cortex. As Yeo et al. noted in their paper, the 17 resting state networks they identify did not adequately parcellate somatomotor and auditory cortices into distinct networks, likely due to their proximity (see Fig 14 and related text in Yeo et al. (2011)). We therefore relied on a different approach to define the primary auditory cortex, using an association test in Neurosynth to obtain a map of regions associated with the term “primary auditory”. In the revised manuscript, we will also include an early visual cortex ROI, defined again using a term-based association test in Neurosynth.

      Our lPFC ROIs and pAC ROIs are of similar size. In the left hemisphere, the lPFC ROI (constructed from merging Schaefer parcels 128-thru-132) has, on average, 624.55 voxels. The left pAC ROI (defined with Neurosynth) has, on average, 628 voxels. In the right hemisphere, the lPFC ROI (constructed from merging Schaefer parcels 330-thru-334) has 470.8 voxels on average. The right pAC ROI has, on average, 568 voxels. A table reporting the size of our parcels and ROIs was included in the supplement. In our forthcoming revision, we will additionally include a supplementary figure visualizing the ROI masks.

      (3) Task dimensionality:

      In some ways, the main findings - that representation dimensionality is tailored to the task - seem to obviously follow from the choice of two tasks, particularly from a normative modeling perspective. For example, the flat task is effectively a memorization task, and is incompressible in the sense that there are no heuristics to solve it. In contrast, the hierarchical task can have several strategies, an uncompressed (memorized) strategy, and a compressed strategy. This is analogous to other studies evaluating representations during 'rich' vs. 'lazy'/kernel learning in ANNs. However, it seems unlikely (if not impossible) to form a 'rich' representation in the flat task. Posed another way, the flat task will always necessarily have a higher dimensionality than the hierarchical task. Thus, is their hypothesis - that representational geometry is tailored to the task - actually falsifiable? I understand the authors posit alternative hypotheses, e.g., "a fully compressed global axis with no separation among individual stimulus inputs could support responding [in the flat task]" (p. 36). But is this a realistic outcome, for example, in the space of all possible computational models performing this task? I understand that directly addressing this comment is challenging (without additional data collection or modeling work), but perhaps some additional discussion around this would be helpful.

      We thank the reviewer for this comment, which gives us a chance to clarify our argument.

      As noted by the reviewer, whether a network takes advantage of the compressibility of a task depends on its learning regime (i.e. rich vs lazy). One way to frame our question regarding the lPFC’s coding strategy, then, is to ask whether it operates in a rich or a lazy learning regime (which would predict, respectively, task-tailored vs task-agnostic representations). The reviewer’s concern is that the two task structures we employed are differentially compressible, so that observing tailored representations is inevitable and our hypotheses are therefore not falsifiable.

      First, it is important to clarify the theoretical premise behind our design and how it relates logically to our hypotheses. Under a lazy learning regime, a network would encode high-dimensional representations of both tasks, regardless of their compressibility. On the other hand, under a rich learning regime, representational dimensionality will likely be shaped by the tasks’ structure. If the two tasks differ in their compressibility, only in the rich learning regime would the network learn representations of different dimensionality. Therefore, observing representations with dimensionality tailored to the task structure rules out the possibility that the lPFC is operating in a lazy regime. The hypotheses are thus testable.

      The second point of clarification is that, contrary to the reviewer’s assertion, the flat task is, in fact, compressible – the task can be solved with a categorical representation of the response categories, with no sensitivity to the different specific stimuli within each category. Indeed, it is possible to train a simple, three-layer feedforward artificial neural network to perform the flat task perfectly with only 2 units in the hidden layer, demonstrating this compressibility. We agree with the reviewer that, in the space of all possible architectures one might consider, the two tasks may differ in compressibility, particularly at the local level. As we noted above, however, this does not imply that our hypotheses are not testable.
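
      To make the compressibility point concrete, the following is a minimal sketch and not the simulation planned for the revision. It assumes a hypothetical flat task with 8 one-hot-coded trial types assigned arbitrarily to 2 response categories, and it uses hand-set weights instead of training, simply to show that a solution with 2 hidden units exists. The names `forward`, `category`, `w_hidden` and `w_out` are illustrative.

```python
def relu(x):
    return max(0.0, x)

def forward(one_hot, w_hidden, w_out):
    """Input layer -> 2 ReLU hidden units -> 2 linear outputs.

    Returns the hidden activations and the index of the winning output.
    """
    hidden = [relu(sum(w * x for w, x in zip(row, one_hot))) for row in w_hidden]
    outputs = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return hidden, outputs.index(max(outputs))

# Hypothetical flat task: 8 trial types, arbitrarily mapped to 2 response categories.
category = [0, 1, 1, 0, 1, 0, 0, 1]
n_types = len(category)

# Hand-set weights (assumption): hidden unit k sums the inputs that belong to category k.
w_hidden = [[1.0 if category[i] == k else 0.0 for i in range(n_types)] for k in range(2)]
w_out = [[1.0, 0.0], [0.0, 1.0]]  # identity readout from hidden units to responses

results = []
for i in range(n_types):
    x = [1.0 if j == i else 0.0 for j in range(n_types)]
    results.append(forward(x, w_hidden, w_out))

accuracy = sum(pred == category[i] for i, (_, pred) in enumerate(results)) / n_types
print(accuracy)
```

      In this toy network, all trial types within a response category produce the same hidden-layer pattern, which is the fully abstracted, low-separability solution discussed above. Whether a trained network or the lPFC adopts such a solution is a separate, empirical matter.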

      Finally, as a third point of clarification, our focus in this paper is on understanding the nature of coding in the lPFC in particular. Arguments based on a normative modelling perspective properly apply to the representations learned by an agent (such as an ANN or a human) as a whole. In a minimal feedforward ANN with a single hidden layer trained in a regime which encourages compression (i.e. a rich learning regime), it would indeed be the case that the representational dimensionality in that hidden layer would be higher for less compressible tasks. However, when applied to humans, such an argument applies to the brain as a whole rather than to an individual region of the brain like the lPFC. As such, it is less straightforward to predict how a single region might represent a task without additional information about the region’s inputs, outputs and broader position in a network. Even for a highly compressible task, a particular brain region may nevertheless be sensitive to all task dimensions. Conversely, even when a task is not compressible, a particular population within the brain may be invariant to some task features. For example, the primary auditory cortex is expected to be invariant to visual task dimensions.

      Therefore, how a task is represented in the lPFC in particular (as opposed to the whole brain) depends on its computational function and coding principles, which remain debated. For instance, if, as some accounts (such as the guided activation theory) posit, the primary function of the lPFC is to encode ‘context’ and shape downstream processing based on context, we might expect to see only the abstract coding of the auditory context in the hierarchy task (and, perhaps, the response categories across both tasks, as they encode the ‘context’ for the lower-level response decision), with the lPFC remaining invariant to lower-level features of the input. In our paper, we specifically contrast two accounts of lPFC coding that have emerged in the literature: one positing that the lPFC learns a representation tailored to the structure of the task, and another positing that the lPFC encodes a high-dimensional representation that privileges sensitivity to many task features and their non-linear mixture at the cost of generalization. Regardless of the compressibility of the tasks in question, how the lPFC encodes the two tasks is an empirical question.

      In our forthcoming revision, we will clarify these points in the discussion. We will also include the results of neural network simulations alluded to above.

      (4) Related to the above:

      The authors have a section on p. 27: "Local structure of lPFC representational geometry of the flat task shows high separability with no evidence for abstraction" - I understand a generalization analysis can be done in the feature space, but in practice, the fact that the flat task doubles as a memorization task implies that there are no useful abstractions, so it seems to trivially follow that there would be no abstract representations. In fact, the use of task abstractions in the stimulus space would be detrimental to task performance here. I could understand the use of this analysis as a control, but the phrasing of this section seems to indicate that this is a surprising result.

      As explained above, there is no need for high local separability in the flat task. The lPFC could have completely abstracted over the individual trial-types that contributed to each response category, encoding only the response categories. Indeed, as also noted above, it is possible to train a simple, three-layer feedforward artificial neural network to perform the flat task perfectly with only 2 units in the hidden layer. The two hidden layer units code for each of the two response categories. 

      (5) Statistical inferences:

      Throughout the manuscript, the authors appear to conflate failure to reject the null with acceptance of the null. For example, p. 24: "However, unlike left lPFC, paired t-tests showed no reliable difference in the separability of the task-relevant features vs the orthogonal, task-irrelevant features... Therefore, the overall separability of pAC representations is not shaped by either task-relevance of task structure."

      We thank the reviewer for pointing these out. These sentences will be corrected in the revision. For instance, the sentence above will be modified to “Therefore, we find no evidence that the overall separability of pAC representations is shaped by either task-relevance or task structure.”

      Reviewer #3 (Public review):

      Summary:

      In this paper, Bhandari, Keglovits, et al. explore the representational structure of task encoding in the lateral prefrontal cortex. Through an impressive fMRI data-collection effort, they compare and contrast neural representations across tasks with different high-level stimulus-response structures. They find that the lateral prefrontal cortex shows enhanced encoding of task-relevant information, but that most of these representations do not generalize across conditions (i.e., have low abstraction). This appears to be driven in part by the representation of task conditions being clustered by the higher-order task properties ('global' representations), with poor generalization across these clusters ('local' representations). Overall, this paper provides an interesting account of how task representations are encoded in the PFC.

      Strengths:

      (1) Impressive dataset, which may provide further opportunities for investigating prefrontal representations.

      (2) Clever task design, allowing the authors to confound several features within a complex paradigm.

      (3) Best-practice analysis for decoding, similarity analyses, and assessments of representational geometry.

      (4) Extensive analyses to quantify the structure of PFC task representations.

      Weaknesses:

      (1) The paper would benefit from improved presentational clarity: more scaffolding of design and analysis decisions, clearer grounding to understand the high-level interpretations of the analyses (e.g., context, cluster, abstraction), and better visualizations of the key findings.

      (2) The paper would benefit from stronger theoretical motivation for the experimental design, as well as a refined discussion on the implications of these findings for theories of cognitive control.

      We thank the reviewer for highlighting the strengths of our paper and for their feedback on the writing. We have reviewed these helpful suggestions with an eye to which ones we can implement in our revision to improve clarity. Our forthcoming revision will 1) provide clearer scaffolding to aid the reader in understanding our design, analyses and our interpretation of the results, 2) incorporate the MDS-based visualization of the representational geometries, which is currently presented in the Supplement, as a figure panel in the main text, 3) provide a justification in the introduction for the particular task structures we picked, and 4) incorporate a new paragraph in the Discussion section to highlight the implications of our findings for cognitive control.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      This paper describes technically-impressive measurements of calcium signals near synaptic ribbons in goldfish bipolar cells. The data presented provides high spatial and temporal resolution information about calcium concentrations along the ribbon at various distances from the site of entry at the plasma membrane. This is important information. Important gaps in the data presented mean that the evidence for the main conclusions is currently inadequate. 

      Strengths 

      The technical aspects of the measurements are impressive. The authors use calcium indicators bound to the ribbon and high speed line scans to resolve changes with a spatial resolution of ~250 nm and temporal resolution of less than 10 ms. These spatial and temporal scales are much closer to those relevant for vesicle release than previous measurements. 

      The use of calcium indicators with very different affinities and of different intracellular calcium buffers helps provide confirmation of key results. 

      Thank you very much for this positive evaluation of our work.

      Weaknesses 

      Multiple key points of the paper lack a statistical test or summary data from populations of cells. For example, the text states that the proximal and distal calcium kinetics in Figure 2A differ. This is not clear from the inset to Figure 2A - where the traces look like scaled versions of each other. Values for time to half-maximal peak fluorescence are given for one example cell but no statistics or summary are provided. Figure 8 shows examples from one cell with no summary data. This issue comes up in other places as well. 

      Thank you for this fair and valuable feedback. Following also the suggestion by the Editor, we have now removed the rise-time kinetic fitting results from the manuscript and only retain the bi-exponential decay time constant values. Further, we explicitly detail the issues with kinetic fitting, and state that precise quantitative conclusions should not be drawn from the differences in kinetic parameters (pages 7 and 27-28).

      We have included the results of paired t-tests to compare the amplitudes of proximal vs. distal calcium signals shown in Fig. 2A & B, Fig. 3C & D, Fig. 4C & D, Fig. 5A-D, and Fig. 8E & F. Because proximal and distal calcium signals were obtained from the same ribbons within 500-nm distances, it is expected that, as the Reviewer pointed out, “the traces look like scaled versions of each other”. For experiments where we make comparisons across cells or different calcium indicators, as shown in Fig. 3E & F, Fig. 5E, and Fig. 8B & C, we have included the results of an unpaired t-test. We have also included the t-test statistics information in the respective figure legends in the revised version.
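
      To illustrate why the pairing matters when proximal and distal signals come from the same ribbons, here is a minimal sketch with made-up numbers. It is not the authors' analysis: the amplitudes are hypothetical, the helper names `paired_t` and `welch_t` are illustrative, and the unpaired statistic is assumed to be Welch's form, which the source does not specify.

```python
import math
import statistics

def paired_t(a, b):
    """t statistic for paired samples: mean difference over its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

def welch_t(a, b):
    """t statistic for independent samples with unequal variances (Welch)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

# Hypothetical amplitudes (arbitrary units), one proximal/distal pair per ribbon.
proximal = [1.0, 2.0, 3.0, 4.0]
distal = [0.5, 1.4, 2.6, 3.5]

t_paired = paired_t(proximal, distal)
t_unpaired = welch_t(proximal, distal)
print(round(t_paired, 2), round(t_unpaired, 2))
```

      With these illustrative values the paired statistic is far larger than the unpaired one, because pairing removes the large ribbon-to-ribbon variability that is shared by the proximal and distal measurements.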

      In Figure 8, we have shown example fluorescence traces from two different cells at the bottom of panel A, and example traces from different ribbons of RBC-a in panel D. The summary data are presented in panels B-C and E-F, with statistics provided in the figure legends.

      The rise time measurements in Figure 2 are very different for low and high affinity indicators, but no explanation is given for this difference. Similarly, the measurements of peak calcium concentration in Figure 4 are very different with the two indicators. That might suggest that the high affinity indicator is strongly saturated, which raises concerns about whether that is impacting the kinetic measurements. 

      Yes, we do believe that the high-affinity indicator is partially saturated, and therefore, the measurement with the low-affinity indicator dye is a more accurate reflection of the measured Ca2+ signal. We now state this more explicitly in the text. Further, we note that the rise time values are no longer listed due to lack of statistical significance for such comparisons, as noted above.
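
      The saturation argument can be illustrated with a minimal sketch that assumes simple 1:1 equilibrium binding, where the bound fraction of indicator is [Ca2+] / ([Ca2+] + Kd). The Kd values and calcium concentrations below are hypothetical and are not taken from the manuscript; `bound_fraction` is an illustrative name.

```python
def bound_fraction(ca_um, kd_um):
    """Fraction of indicator bound to Ca2+ under 1:1 equilibrium binding."""
    return ca_um / (ca_um + kd_um)

# Hypothetical dissociation constants (micromolar), chosen only for illustration.
KD_HIGH_AFFINITY_UM = 0.2
KD_LOW_AFFINITY_UM = 20.0

# Change in bound fraction when local Ca2+ doubles from 10 to 20 micromolar.
delta_high = bound_fraction(20.0, KD_HIGH_AFFINITY_UM) - bound_fraction(10.0, KD_HIGH_AFFINITY_UM)
delta_low = bound_fraction(20.0, KD_LOW_AFFINITY_UM) - bound_fraction(10.0, KD_LOW_AFFINITY_UM)
print(round(delta_high, 4), round(delta_low, 4))
```

      Under these assumptions the high-affinity indicator is already about 98% bound at 10 micromolar, so a doubling of Ca2+ changes its signal by less than one percentage point, whereas the low-affinity indicator still responds strongly.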

      Reviewer #2 (Public review): 

      Summary: 

      The study introduces new tools for measuring intracellular Ca2+ concentration gradients around retinal rod bipolar cell (rbc) synaptic ribbons. This is done by comparing the Ca2+ profiles measured with mobile Ca2+ indicator dyes versus ribbon-tethered (immobile) Ca2+ indicator dyes. The Ca2+ imaging results provide a straightforward demonstration of Ca2+ gradients around the ribbon and validate their experimental strategy. This experimental work is complemented by a coherent, open-source, computational model that successfully describes changes in Ca2+ domains as a function of Ca2+ buffering. In addition, the authors try to demonstrate that there is heterogeneity among synaptic ribbons within an individual rbc terminal. 

      Strengths: 

      The study introduces a new set of tools for estimating Ca2+ concentration gradients at ribbon AZs, and the experimental results are accompanied by an open-source, computational model that nicely describes Ca2+ buffering at the rbc synaptic ribbon. In addition, the dissociated retinal preparation remains a valuable approach for studying ribbon synapses. Lastly, excellent EM. 

      Thank you very much for this positive evaluation of our work.

      Comments on revisions: 

      Specific minor comments: 

      (1) Rewrite the final sentence of the Abstract. It is difficult to understand. 

      Thank you for pointing that out. We have updated the final sentence of the Abstract.

      (2) Add a definition in the Introduction (and revisit in the Discussion) that delineates between micro- and nano-domain. A practical approach would be to round up and round down. If you round up from 0.6 um, then it is microdomain which means ~ 1 um or higher. Likewise, round down from 0.3 um to nanodomain? If you are using confocal, or even STED, the resolution for Ca imaging will be in the 100 to 300 nm range. The point of your study is that your new immobile Ca2+ ribbon indicator may actually be operating on a tens of nm scale: nanophysiology. The Results are clearly written in a way that acknowledges this point but maybe make such a "definition" comment in the intro/discussion in order to: 1) demonstrate the power of the new Ca2+ indicator to resolve signals at the base of the ribbon (effectively nano), and 2) (Discussion) to acknowledge that some are achieving nanoscopic resolution (50 to 100nm?) with light microscopy (as you ref'd Neef et al., 2018 Nat Comm).

      Thank you for the valuable comments. We have now provided this information in the introduction and discussion.  

      (3) Suggested reference: Grabner et al. 2022 (Sci Adv, Supp video 13, and Fig S5). Here rod Cav channels are shown to be expressed on both sides of the ribbon, at its base, and they are within nanometers from other AZ proteins. This agrees with the conclusions from your imaging work.

      Thank you for the valuable suggestion. We have now provided this information in the introduction and discussion.

      (4) In the Discussion, add a little more context to what is known about synaptic transmission in the outer and inner retina. First, state that the postsynaptic receptors (for example: mGluR6-OnBCs vs KARs-OffBCs, vs. AMPAR-HCs), and possibly the synaptic cleft (ground squirrel), are known to have a significant impact on signaling in the outer retina. In the inner retina, there are many more unknowns. For example, when I think of the pioneering Palmer JPhysio study, which you cite, I think of NMDAR vs AMPAR, and uncertainty in what type of postsynaptic cell was patched (GC or AC....). Once you have informed the reader that the postsynapse is known to have a significant impact on signaling, then promote your experimental work that addresses presynaptic processes: "...the new tool and results allow us to explore release heterogeneity, ribbon by ribbon in dissociated preps, which we eventually plan to use at ribbon synapses within slices......to better understand how the presynapse shapes signaling......".

      Thank you for the valuable comments. We have now provided this information in the introduction and discussion.

      Reviewer #3 (Public review): 

      Summary: 

      In this study, the authors have developed a new Ca indicator conjugated to the peptide, which likely recognizes synaptic ribbons and have measured microdomain Ca near synaptic ribbons at retinal bipolar cells. This interesting approach allows one to measure Ca close to transmitter release sites, which may be relevant for synaptic vesicle fusion and replenishment. Though microdomain Ca at the active zone of ribbon synapses has been measured by Hudspeth and Moser, the new study uses the peptide recognizing synaptic ribbons, potentially measuring the Ca concentration relatively proximal to the release sites. 

      Strengths: 

      The study is, in principle, technically well done, and the peptide approach is technically interesting, which allows one to image Ca near the particular protein complexes. The approach is potentially applicable to other types of imaging. 

      Thank you very much for this appreciation.

      Weaknesses: 

      Peptides may not be entirely specific, and genetic approach tagging particular active zone proteins with fluorescent Ca indicator proteins may well be more specific. Although the authors are aware of this and the peptide approach is generally used for ribbon synapses, the authors should be aware of this, when interpreting the results. 

      We acknowledge the reviewer’s point and believe the peptides and genetic approaches to measure local calcium signals have their merits, each with separate advantages and disadvantages.  

      Reviewer #1 (Recommendations for the authors): 

      The revisions helped with some concerns about the original paper, but some issues were not adequately addressed. I have left two primary concerns in my public review. To summarize those: 

      The difference in kinetics of proximal and distal locations is emphasized and quantified in the paper, but the quantification consists of a fit to the average responses. This does not give an idea of whether the difference observed is significant or not. Without an estimate of the error across measurements the difference in kinetics quoted is not interpretable.

      Thank you for this feedback. Since the kinetics information is a minor part of the manuscript, we have followed the Editor’s advice to significantly tone down the comparison of kinetic fit parameters (completely removing the rise-time comparisons), in order to put more focus on the better-documented conclusions. We also note that we did establish statistical significance of the differences in fluorescence signal amplitudes. 

      Somewhat relatedly, the difference in amplitude and kinetics of the calcium signals measured with low and high affinity indicators is quite concerning. The authors added one sentence stating that the high affinity indicator might be saturated. This is not adequate. Should we distrust the measurements using the high affinity indicator? The differences between the results using the low and high affinity indicators are in some cases large - e.g. larger than the differences cited as a key result between distal and proximal locations. This issue needs to be dealt with directly in the paper.

      Thank you for this feedback. Yes, the measurements from high-affinity indicators cannot report the Ca2+ as accurately as low-affinity indicators. However, the value of HA indicators is in their ability to detect low-amplitude signals that lower-affinity indicators may miss due to lower signal-to-noise resolution. We added a sentence on page 12 to further stress this point.

      Related to the point about statistics, it is not clear how to relate the horizontal lines in Figure 8 to the actual measurements. It is critical for the evaluation of the conclusions from that figure to understand what is plotted and what the error bars are on the plotted data.

      We apologize for the earlier ambiguity in Fig. 8. In this figure, we first compare proximal (panel B) and distal (panel C) calcium signals across several RBCs, labeled RBC-a through RBC-d. Each RBC contains multiple ribbons, and for each cell, we present the average calcium signals from multiple ribbons using box plots in panels B and C. In these box plots, the horizontal lines represent the average calcium signal for each cell, while the size of the error bars reflects the variability in proximal and distal calcium signals among the ribbons within that RBC.

      For example, RBC-a had five identifiable ribbons. In panels D–F, we use RBC-a to illustrate the variability in calcium signals across individual ribbons. Specifically, we distinguished proximal and distal calcium signals from five ribbons (ribbons 1–5) within RBC-a. When feasible, we acquired multiple x–t line scans at a single ribbon, shown now as individual data points, to assess variability in calcium signals recorded from the same ribbon.

      The box plots in panels E and F display the average calcium signal (horizontal lines) for each ribbon, based on multiple recordings. These plots demonstrate considerable variability between ribbons of RBC-a. Importantly, the absent or minimal error bars for repeated measurements at the same ribbon indicate that the proximal and distal calcium signals are consistent within a ribbon. These findings emphasize that the observed variability among ribbons and among cells reflects true biological heterogeneity in local calcium domains, rather than experimental noise.
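
      The comparison between within-ribbon and between-ribbon variability described above can be sketched as follows. This is an illustration only: the amplitudes are made-up numbers, not data from Fig. 8, and the helper name `summarize` and the use of the standard deviation as the spread measure are assumptions.

```python
import statistics

def summarize(scans_by_ribbon):
    """Return per-ribbon means, within-ribbon SDs, and the SD of the ribbon means."""
    ribbon_means = {r: statistics.mean(v) for r, v in scans_by_ribbon.items()}
    # Within-ribbon spread is only defined when a ribbon was scanned more than once.
    within_sd = {r: statistics.stdev(v) for r, v in scans_by_ribbon.items() if len(v) > 1}
    between_sd = statistics.stdev(list(ribbon_means.values()))
    return ribbon_means, within_sd, between_sd

# Hypothetical proximal amplitudes (arbitrary units) from repeated line scans.
scans = {
    "ribbon 1": [1.0, 1.1],
    "ribbon 2": [2.0, 2.1],
    "ribbon 3": [3.0],  # single scan, so no within-ribbon estimate
}

ribbon_means, within_sd, between_sd = summarize(scans)
print(ribbon_means, within_sd, round(between_sd, 3))
```

      In this toy example the spread of the ribbon means is more than ten times the spread of repeated scans at any one ribbon, which is the pattern one would expect if the variability reflects differences between ribbons and not measurement noise.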

    1. Author response:

      We thank all three anonymous reviewers for their thoughtful evaluations of our manuscript and for recognizing the conceptual advance in combining agent-based behavioral simulations with systems neuroscience models. We are especially encouraged by the acknowledgement of the framework’s potential to support simulation of neural control of individual animal behavior in realistic sensory environments.

      Below, we respond to each reviewer’s public comments in turn. Throughout, we have aimed to clarify our rationale for modeling choices, acknowledge limitations, and outline concrete steps for improvement in the revised manuscript.

      Furthermore, the call for a better description of the model implementation, voiced by all three reviewers, together with additional requests from community members, has prompted us to write a separate, technically detailed description of the publicly available larvaworld software package and of the models already implemented in it, in the form of a preprint (Sakagiannis et al., 2025, bioRxiv, DOI: https://doi.org/10.1101/2025.06.15.659765).

      Reviewer #1:

      We are happy to read that this reviewer considers the proposed behavioral architecture ‘a significant step forward in the field’, and that she/he recognizes the strengths of our work in the modular and hierarchical approach that provides connections to influential theories of motor control in the brain, in the experimental evidence it is based on, and in the valuable abstractions that we have chosen for the larval behavioral modeling.

      The reviewer raises important points about the simplifications we have made, both conceptually and in the specific implementation of larval behaviors. Our main goal in this study is to introduce a conceptual framework that integrates agent-based modeling with systems neuroscience models in a modular fashion. To serve this purpose, we aimed for a minimal yet representative implementation at the motor layer of the architecture, calibrated to larval locomotion kinematics. This choice enables efficient simulation while allowing us to test top-down modulation and adaptive mechanisms in higher layers without the computational overhead of a full neuromechanical model. In addition to chemotaxis, we have recently used this simplified approach to model thermotaxis in larvae (Kafle et al., 2025, iScience, DOI: https://doi.org/10.1016/j.isci.2025.112809).

      The reviewer notes the absence of explicit segmental neuromuscular control or central pattern generators (CPGs). We deliberately abstracted from these mechanisms, representing the larval body as two segments with basic kinematic control, to focus on reproducing overall locomotor patterns. This bisegmental simplification, which we illustrate in Supplemental Video “Bisegmental larva-body simplification”, retains the behavioral features relevant to our current aims. However, the modular structure of the framework means that more detailed neuromechanical models—incorporating CPG dynamics or connectome-derived circuit models—can be integrated in future work without altering the architecture as a whole.

      We fully agree that real neural circuits are more complex than a strict subsumption architecture implies. In the Drosophila larva, there is clear evidence for ascending sensory feedback from the motor periphery to premotor and higher brain circuits, as well as neuromodulatory influences. These add layers of complexity beyond the predominantly descending control in our present model. At the same time, both larval and adult connectome data show that across-level descending and ascending connections are sparse compared to the dense within-layer connectivity. We see value in casting our model as a hierarchical control system precisely to make the strengths and limitations of such an abstraction explicit. The revised manuscript will include further discussion of these points.

      In summary, our design choices reflect a trade-off: by limiting the biological detail in the lower layers, we gain computational efficiency and maintain a clear modular structure that can host models at different levels of abstraction. This ensures that the architecture remains both a tool for immediate behavioral simulation and a scaffold for integrating richer neural and biomechanical models as they become available.

      Reviewer #2:

      We thank the reviewer for recognizing the novelty of our locomotory model, particularly the implementation of peristaltic strides based on our new analyses of empirical larval tracks, and for providing constructive feedback that will help us improve the manuscript.

      The reviewer highlights the need for clearer explanations of the chemotaxis and odor preference modules. We expand these sections in the revised manuscript with more explicit descriptions of model structure, parameterization, and calibration. As mentioned above, we have also prepared a separate preprint dedicated to the larvaworld Python package, which contains detailed implementation notes and hands-on tutorials that allow users to adapt or extend individual modules.

      Regarding the comparison to empirical behavior in chemotaxis, our present analysis is indeed primarily qualitative. However, we would like to emphasize that the temporal profile of odor concentration at the larval head in our simulations matches that measured in Gomez-Marin et al. (Nature Comm., 2011, DOI: https://doi.org/10.1038/ncomms1455) using only one additional free parameter, while all parameters of the basic locomotory model had been fitted to a separate exploration dataset before and were kept fixed in the chemotaxis experiments. In addition to the simulation of chemotaxis in the present paper, we recently used larvaworld in a practical model application to estimate a species-specific parameter of thermotaxis from experiments across different drosophilids (Kafle et al., 2025, iScience, DOI: https://doi.org/10.1016/j.isci.2025.112809).

      The preference index in our simulations was computed using the same definition as in the established experimental group assay for larval memory retention, enabling a direct quantitative comparison between simulated and empirical results. Variability in the simulated outcomes arose naturally from inter-individual differences in body length and locomotory parameters, derived from real larval measurements, as well as from the random initial orientation of each individual in the arena. These factors contributed to variation in individual tracks and ultimately produced preference index values that closely matched those observed experimentally. In the revised manuscript, we also discuss handedness, as highlighted by the reviewer, as another meaningful expression of inter-individual variability in Drosophila larvae and insects more generally.

      Finally, we acknowledge the reviewer’s concern about the scalability and broader applicability of the model. While the present paper focuses on three specific behavioral paradigms (exploration, chemotaxis, odor preference), the modular structure of the architecture is designed for flexibility: modules at any layer can be exchanged for more detailed or alternative implementations, and new sensory modalities or behaviors can be integrated without redesigning the system. The larvaworld package, associated codebase, and documentation are openly available to encourage adoption and adaptation by the larval research community.

      Reviewer #3:

      This public review provides an excellent account of our central aim to build an easily configurable, well-documented platform for organism-scale behavioral simulation and we are happy to read that the reviewer considers this an excellent goal.

      We thank the reviewer for her/his account of our well-organized code using contemporary Python tooling. We are currently further improving code readability and code documentation, and we will release a new version of the larvaworld Python package. We further agree with the reviewer’s assessment that understanding the model calibration currently requires reading the appendix. For the revised manuscript we thus aim to improve our description of all calibration and modeling steps along the way. We will also make sure to improve the description of the experimental datasets used for calibration.

      We recognize that our description of the paper’s scientific contribution could be clearer. In revision, we will sharpen the Introduction and Discussion to highlight our main contributions:

      (1) Promoting a shift from isolated neural circuit modeling to integrated agent-based simulations in realistic environments.

      (2) Proposing the layered behavioral architecture, adopting the subsumption paradigm for modular integration.

      (3) Providing the larvaworld software as a ready-to-use, extensible modeling platform.

      (4) Implementing an empirically calibrated locomotory model and demonstrating its integration with navigation and learning modules in replicated behavioral paradigms.

      We agree with the reviewer that the next challenge is to integrate the empirically based behavioral simulations presented here with functional brain models capable of reproducing or predicting experimental findings at the level of cellular neurophysiology, including the effects of cell-type-specific manipulations such as gene knock-down or optogenetic activation/inhibition. However, based on our experience with systems-level modeling, we deliberately invested in behavioral simulation because functional models of the nervous system—including our own—often lack translation into simulated agent behavior. In many cases, model output is limited to one or more variables that can at best be interpreted as a behavioral bias, and most often represents an “average animal” that fails to capture inter-individual differences. By linking our spiking mushroom body model to behavioral simulations in a group of individual agents during memory retention tests (Figure 6C,D), we were able to achieve a first successful direct comparison between simulated and experimental behavior metrics—in this case, the behavioral preference index reported in Jürgensen et al. (iScience, 2024, DOI: https://doi.org/10.1016/j.isci.2023.108640).

      Finally, we reiterate that the layered behavioral architecture is designed to promote a modular modeling paradigm. Our adoption of a subsumption architecture does not conflict with the concept of behavioral primitives; on the contrary, the notion that such primitives follow (semi-)autonomous motor programs and can be combined into more complex behaviors was the starting point for our implementation of the architecture in the fly larva. In our view, a genuinely contradictory paradigm for neural control of behavior would require a non-modular, strictly non-hierarchical organization of the nervous system and, by extension, of behavioral control.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review)

      Comment 

      Koonce et al. have generated a web-based visualization tool for exploring C. elegans neuronal morphology, contact area between neurons, and synaptic connectivity data. Here, the authors integrate volumetric segmentation of neurons and visualization of contact area patterns of individual neurons generated from Diffusion Condensation and C-PHATE embedding based on previous work from adult volumetric electron microscopy (vEM) data, extended to available vEM data for earlier developmental stages, which effectively summarizes modularity within the collated C. elegans contactomes to date. Overall, NeuroSC's relative ease of use for generating visualizations, its ability to quickly toggle between developmental stages, and its integration of a concise visualization of individual neurons' contact patterns strengthen its utility.

      We thank the reviewer for this positive assessment of our work.

      Comment

      NeuroSC provides an accessible and convenient platform. However, many of the characteristics of NeuroSC overlap with that of an existing tool for visualizing connectomics data, Neuroglancer, which is a widely-used and shared platform with data from other organisms. The authors do not make clear their motivation for generating this new tool rather than building on a system that has already collated previous connectomics data. Although the field will benefit from any tool that collates connectomics data and makes it more accessible and user-friendly, such a tool is only useful if it is kept up-to-date, and if data formatting for submitting electron microscopy data to be added to the tool is made clear. It is unclear from this manuscript whether NeuroSC will be updated with recently published and future C. elegans connectomes, or how additional datasets can be submitted to be added in the future.

      We have added new language to more explicitly state the motivations for developing NeuroSC (Introduction, lines 98-111, and Discussion, lines 375-384). In a new discussion section, we also include comparisons of the features of NeuroSC with other existing tools, like Neuroglancer and Webknossos (lines 393-417).

      Briefly, the functional features of NeuroSC are substantially different from those of other web-based tools for navigating EM datasets, including Neuroglancer, and do not exist in them. This is because the intended use of NeuroSC is substantially different from (and purposefully synergistic with) the intended use of, and the tools available in, Neuroglancer.

      Neuroglancer is a versatile tool designed primarily for web-based visualization and sharing of large EM datasets. NeuroSC was not designed to enable this type of access to the primary EM data (purposefully so, because these features were already available through tools like Neuroglancer).

      Instead, the explicit goal of NeuroSC is to provide a platform specifically optimized for examining neuronal relationships across connectomic datasets. NeuroSC builds on the segmentations emerging from programs like Neuroglancer, but the tools are tailored to explore relationships such as contact profiles in the context of neuronal morphologies and synaptic positions, and across datasets that represent different animals or different developmental stages.

      To achieve this, all datasets in NeuroSC were optimized to facilitate comparisons across different connectomes of segmented neuronal features, including: 1) alignment of the neurons that are compared upon the display of the segmentations; 2) synchronization of the 3D windows; 3) implementation of a ‘universal color code’ across datasets for each neuron and relationship for easy visual comparisons; 4) use of the specific neuronal names to label instances of the same cells across all available datasets. The use of precise neuronal names among separate data sets allows integration of these objects with other catalogued datasets, including genomic and neuronal activity profiles.

      The formatting and display of the datasets used in NeuroSC was accompanied by the development of new tools including: 1) Rendering of the contact profiles of all neurons in the context of the morphology of the cell and the synapses and 2) C-PHATE diagrams to inspect multidimensional relationship hierarchies based on these contact profiles. In NeuroSC, C-PHATEs can be navigated and compared across multiple stages of development while visualizing neuronal reconstructions, allowing users to compare neuronal relationships across individual datasets.

      We agree with the reviewer that these tools are most useful when integrated. With that intention in mind, we designed NeuroSC as a series of modular, open-source tools that could be integrated into other programs, including Neuroglancer. In that sense our intent was not to produce another free-standing tool, but a set of tools that, if useful, could be integrated into other existing web-based connectomic resources to enhance the user experience of navigating complex EM datasets and to draw biological meaning from the relationships between the neurons. Additionally, we intentionally designed NeuroSC so that new methods of understanding neuron relationships can be integrated as they arise. We have dedicated a more detailed section of the discussion (lines 369-417) to better convey this intention and directly address the unique abilities of NeuroSC as a complementary tool to the powerful existing tools, including Neuroglancer.

      Comment

      The interface for visualizing contacts and synapses would be improved with better user access to the quantitative underlying data. When contact areas or synapses are added to the viewer, adding statistics on the magnitude of the contact area, the number of synapses, and the rank of these values among the neuron's top connections, would make the viewer more useful for hypothesis generation. Furthermore, synapses are currently listed individually, with names that are not very legible to the web user. Grouping them by pre- and postsynaptic neurons and linking these groups across developmental stages would also be an improvement.

      [what do they even mean by linking?]

      We thank the reviewer for this insightful comment and have implemented several improvements to address these suggestions. Specifically, we have added new features to enhance user access to quantitative data within the NeuroDevSCAN viewer:

      Cell, Patch, and Synapse Statistics: Users can now see a statistics panel when clicking on a rendered neuron, contact patch, or synapse. These panels provide the following information, respectively (highlighted in lines 303-315):

      Cell Stats: Click on a cell rendering to show cell stats which displays the total volume and surface area of the selected neuron within the defined neuropil area of our datasets (see Methods). 

      Contact Stats: Click on a patch rendering to show ‘contact stats’. This pop-up displays quantifications of the selected contact relationship. Rank compares the summed surface area of contacts ("patches") between these two neurons with all other contact relationships of the primary neuron, both for the cell and for the whole nerve ring. A rank of 1, for example, means this neuron pair shares the largest contact surface area of the examined relationship. “Total surface area” is displayed in nanometers, and is the summed surface area of all patches of this identity. Contact percentages are presented in two ways: (1) as the proportion of the primary cell's total surface area occupied by the contact in question, and (2) as the proportion of the total surface area of the nerve ring occupied by that same contact. (Showcased in figure S5).

      Synapse Stats: A click on a synapse rendering now shows ‘synapse stats’, which displays the number of synapses of the selected identity within the primary neuron, including any polyadic synapse combinations involving the primary neurons. (Showcased in figure S7).
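
      As an illustration of the contact statistics described above, the rank and the two contact percentages can be sketched as follows. This is a minimal sketch under stated assumptions, not the NeuroSC implementation: the function name `contact_stats`, its arguments, and all numeric values are hypothetical, and only the rank among the primary neuron's own contacts is shown.

```python
def contact_stats(partner_areas, partner, cell_surface_area, nerve_ring_surface_area):
    """Summarize one contact relationship of a primary neuron.

    partner_areas: dict mapping partner name -> summed patch surface area
                   shared with the primary neuron (hypothetical input format).
    """
    area = partner_areas[partner]
    # Rank 1 means the largest summed contact area among the primary neuron's contacts.
    rank = 1 + sum(1 for other in partner_areas.values() if other > area)
    return {
        "total_surface_area": area,
        "rank": rank,
        # Share of the primary cell's total surface occupied by this contact.
        "percent_of_cell": 100.0 * area / cell_surface_area,
        # Share of the whole nerve ring's surface occupied by this contact.
        "percent_of_nerve_ring": 100.0 * area / nerve_ring_surface_area,
    }


# Made-up example values, for illustration only.
example_areas = {"PVQ": 400.0, "AVF": 250.0, "RIA": 100.0}
stats = contact_stats(
    example_areas,
    "AVF",
    cell_surface_area=1000.0,
    nerve_ring_surface_area=50000.0,
)
print(stats)
```

      Ties are given the same rank in this sketch; the viewer may resolve them differently.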

      (1) Grouping and Readability Improvements: While individual synapses are still visualized, their display has been improved for legibility. We have condensed the lengthy naming scheme to improve clarity and codified the synapse type by using superscript letters C, E, U to represent chemical, electrical and undefined synapses, respectively. This is explained and shown in figure S7. We also added arrows to indicate the directionality of presumed information flow at each synapse.

      (2) Developmental Linkage: We can link objects across datasets via cellular identity, but each synapse in the dataset does not yet have an identity attributed to its spatial coordinates. This prevents us from linking specific synapses across development beyond their connectivity (i.e., that a given synapse connects cell X to cell Y), as also addressed in R1.11.

      Together, these improvements substantially enhance the utility of the viewer for hypothesis generation by making key quantitative data readily accessible.

      Comment

      While the DC/C-PHATE visualizations are a useful tool for the user, it is difficult to understand when grouping or splitting of cell contact patterns is biologically significant. DC is a deterministic algorithm applied to a contactome from a single organism, and the authors do not provide quantitative metrics of distances between individual neurons or a number of DC iterations on the C-PHATE plot, nor is the selection process for the threshold for DC described in this manuscript. In the application of DC/C-PHATE to larval stage nerve ring strata organization shown by the authors, qualitative observations of C-PHATE plots colored based on adult data seem to be the only evidence shown for persistent strata during development (Figure 3) or changing architectural motifs across stages (Figure 4). Quantitation of differences in neuron position within the DC hierarchy, or differences in modularity across stages, is needed to support these conclusions. Furthermore, illustrating the quantitative differences in C-PHATE plots used to make these conclusions will provide a more instructive guide for users of NeuroSC in generating future hypotheses.

      There are several ways to visualize DC outputs, and one way to quantitatively compare DC clustering events of neurons is via Sankey diagrams. To make the inclusion of these resources more clear, we have highlighted them in lines 175-178 (Supplemental Tables 3-6). ‘DC outputs for each strata across animals can also be inspected using Sankey diagrams (Supplemental Tables 3-6). These spreadsheets detail the neuron members at each iteration of DC, allowing the user to derive quantitative comparisons of clustering events.’

      As the reviewer points out, DC is a deterministic algorithm that will iteratively cluster neurons based on the similarity of their contact profiles. To better explain the selection process for the threshold, the number of DC iterations and the quantitative metrics between the neurons, we have added new text in the Diffusion Condensation methods section.  Briefly:

      Number of DC iterations: During Diffusion Condensation (DC) we track the modularity of the resulting clusters at each iteration and select the iteration with the highest modularity to define the clusters that represent the strata (Moyle et al., 2021; Brugnone et al., 2019). Mathematically, modularity is calculated by comparing the actual number of edges within clusters to the expected number of such edges in a randomized network with the same degree distribution (Newman et al., 2006). A higher modularity value implies that nodes within the same cluster are more densely connected to each other than to nodes in other clusters. We now better explain this in lines 562-567.
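
      The modularity criterion described above can be sketched in a few lines. This is a minimal illustration of the standard Newman modularity for an undirected weighted graph without self-loops, not the code used in the manuscript; the function name, the toy graph, and the three toy partitions (standing in for successive DC iterations) are assumptions made for the example.

```python
def modularity(weights, cluster_of):
    """Newman modularity Q of a partition.

    weights: dict {(u, v): w} with each undirected edge listed once (no self-loops).
    cluster_of: dict mapping node -> cluster label.
    """
    two_m = 2.0 * sum(weights.values())  # total degree of the graph
    degree = {}
    for (u, v), w in weights.items():
        degree[u] = degree.get(u, 0.0) + w
        degree[v] = degree.get(v, 0.0) + w
    # Observed edge weight that falls inside clusters.
    within = sum(w for (u, v), w in weights.items() if cluster_of[u] == cluster_of[v])
    # Expected within-cluster fraction for a random graph with the same degrees.
    cluster_degree = {}
    for node, k in degree.items():
        label = cluster_of[node]
        cluster_degree[label] = cluster_degree.get(label, 0.0) + k
    expected = sum((d / two_m) ** 2 for d in cluster_degree.values())
    return 2.0 * within / two_m - expected


# Toy graph: two triangles joined by a single bridge edge (c, d).
edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
         ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
         ("c", "d"): 1}

# Hypothetical partitions from three successive condensation iterations.
iterations = [
    {n: n for n in "abcdef"},                                        # every node alone
    {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1},                # two triangles
    {n: 0 for n in "abcdef"},                                        # everything merged
]
scores = [modularity(edges, partition) for partition in iterations]
best_iteration = max(range(len(scores)), key=scores.__getitem__)
print(scores, best_iteration)
```

      In this toy case the two-triangle partition has the highest modularity, so that iteration would be the one selected to define the clusters.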

      Threshold for merging points: The threshold (epsilon) used to merge data points in each iteration is set as a small fraction of the spatial extent of the data: for each coordinate dimension (x, y, z), we compute the range (maximum minus minimum), take the maximum of these three values, and divide it by 10,000. This process is performed iteratively for each round of clustering until all data points cluster into a single point. We have updated the manuscript to clarify this threshold selection and included this information in the revised algorithm description and pseudocode. We now better explain this in lines 556-559.
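
      The threshold rule described above amounts to a one-line computation. The following is a minimal sketch assuming the data points are given as (x, y, z) tuples; the function name `merge_threshold` and the example coordinates are hypothetical.

```python
def merge_threshold(points, divisor=10_000):
    """Epsilon = largest per-axis range of the data divided by `divisor`."""
    ranges = [
        max(p[axis] for p in points) - min(p[axis] for p in points)
        for axis in range(3)  # x, y, z
    ]
    return max(ranges) / divisor


# Made-up coordinates, for illustration only.
points = [(0.0, 0.0, 0.0), (5000.0, 200.0, 30.0), (2500.0, 100.0, 10.0)]
epsilon = merge_threshold(points)
print(epsilon)
```

      Here the x-axis has the largest range (5000), so the sketch yields an epsilon of 5000 / 10,000.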

      Distances between neurons in DC C-PHATE: In our previous description in Box 1 algorithm 1, we had provided a general algorithm for DC for any high dimensional dataset. We have now revised the algorithm to indicate how we used DC for these EM datasets. 

      Distances between neurons are determined by the pixel overlap between their segmented shapes in the EM dataset. We use these distances to build a graph with weighted edges, in which the weight of the edge represents the pixel overlap (the adjacency in the actual EM segmentation). Affinities between neurons, which are a proxy for their distance in the graph, are then computed as now revised in Box 1, Algorithm 1. This process is done iteratively as neurons cluster. To better communicate this, we have changed the text in lines 533-538.  

      Comment

      R1.5. While the case studies presented by the authors help to highlight the utility of the different visualizations offered by the NeuroSC platform, the authors need to be more careful with the claims they make from these correlative observations. For example, in Figure 4, the authors use C-PHATE clustering patterns to make conclusions about changes in clustering patterns of individual neurons across development based on single animal datasets. In this and many other cases presented in this study with the limited existing datasets, it is difficult to differentiate between developmental changes and individual variability between the neurite positions, contacts, and synapse differences within these data. This caveat needs to be clearly addressed.

      We now better explain in the manuscript that the selected case study, the outgrowth of the AVF neuron, is not merely a correlation based on a single EM dataset. Instead, the case study represents the NeuroSC-driven exploration of a biologically significant event supported by several independent datasets, as now explained in lines 257-276.

      Briefly, we agree with the reviewer that examining differences across individual EM datasets is insufficient evidence to make conclusions about developmental changes. But the strength of NeuroSC is in its ability to combine and compare multiple datasets, bolstering observations that are not possible by looking at just one dataset, and providing new insights on the way to new hypotheses. We now better explain that we are not looking at single connectomes in isolation and then deriving conclusions, but instead using NeuroSC to compare across 9 EM datasets. We better explain how the tools in NeuroSC, including C-PHATE, enabled comparisons across these multiple connectomes to identify apparent differences in neuronal relationships. We then explain that by using NeuroSC, we could examine these variations in neuronal relationships at the level of individual, cell biological differences of neuronal morphologies between the developmental datasets. These variations could reflect, as pointed out by the reviewer, either developmental changes or simply differences between individual animals. In the case of AVF, the features are absent in all early specimens, then arise and persist in all specimens after a certain time point, which led us to hypothesize that they result from a developmental event. Because the segmented objects in NeuroSC are linked to neuronal identities, we are also able to cross-reference our observations from the EM datasets with information in other datasets and the literature. In the specific case of the postembryonic outgrowth of AVF, we can now tie the knowledge, from developmental lineage information and molecular profiles, that AVF is a postembryonically born neuron (Sulston et al. 1977, Sun et al 2022, Poole et al 2024, wormatlas.org) to the outgrowth dynamics of its neurites using the postembryonic EM datasets. Our findings using NeuroSC provide a proof of concept of the utility of the resource and extend our understanding of how the outgrowth of this neuron affects the relationships between the neural circuits in the nerve ring.

      Comment

      R1.6. Given that recent studies have also quantified contact area between neurons across multiple connectomes (Cook et al., Current Biology, 2023; Yim et al., Nature Communications, 2024), and that the authors use a slightly different approach to quantify contact area, a direct comparison between contact area values obtained in this study with prior studies seems appropriate.

      We acknowledge that there are multiple different approaches to calculate adjacencies. In the papers cited above, there are 3 different algorithms used:

      (1) Brittin 2019 (Python parsing of TrakEM2 segmentations, boundary thresholds), used in Cook et al 2023, Moyle 2021, and this study.

      (2) Witvliet 2021 (Matlab 2D masks), used in Cook et al 2023.

      (3) Yim 2024 (3D masks), used in Yim et al 2024.

      To briefly describe the different approaches, and the methods we chose for this paper:

      Algorithm 1 (used in this study) defines adjacency based on distances between boundary points in TrakEM2 segmentations, allowing threshold tuning to accommodate differences in resolution and image quality across datasets—an important feature for consistent cross-dataset comparisons.

      Algorithm 2 infers contact via morphological dilation of VAST segmentations, identifying adjacency through overlapping expanded boundaries. 

      Algorithm 3 uses voxelwise contact detection with directional surface area measurements and normalization to account for dataset size differences. 
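
      To make the idea behind Algorithm 1 concrete, a boundary-distance criterion with a tunable threshold can be sketched as below. This is a simplified illustration under stated assumptions, not the published code of Brittin et al.: boundaries are represented as lists of 2D points from a single section, and the function name, the point coordinates, and the thresholds are all hypothetical.

```python
import math


def boundary_contact(boundary_a, boundary_b, threshold):
    """Count points on boundary_a lying within `threshold` of any point on boundary_b.

    A nonzero count is treated as adjacency in this sketch; the count itself
    serves as a crude proxy for the extent of contact in one section.
    """
    return sum(
        1
        for point_a in boundary_a
        if any(math.dist(point_a, point_b) <= threshold for point_b in boundary_b)
    )


# Made-up boundary points for two neighboring profiles in one EM section.
profile_a = [(0, 0), (1, 0), (2, 0), (3, 0)]
profile_b = [(0, 1), (1, 1), (5, 5)]

strict = boundary_contact(profile_a, profile_b, threshold=1.0)
relaxed = boundary_contact(profile_a, profile_b, threshold=1.5)
print(strict, relaxed)
```

      Raising the threshold admits more boundary points as contacting, which is the kind of tuning that can compensate for differences in resolution and image quality between datasets.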

      In NeuroSC, we use Algorithm 1, mostly because we had tested the rigor of this method in Moyle et al. (2021), where we showed that results were robust across a range of thresholds. This flexibility enables tailored application across datasets of varying quality and scale, which is critical for NeuroSC’s mission of curating datasets across differing methodologies to allow for direct relationship comparisons. We detail the methodology for defining thresholds for each dataset in the Methods section (lines 492-521); the thresholds are defined in Supplementary table 1. Another difference between our analysis and the previously cited work is that for our analysis we also chose to include all individually resolved neurons, including post-embryonic cells, without collapsing them into left/right or dorsal/ventral symmetry classes. In this way our approach retains the full cellular resolution of the nervous system.

      Comment

      Neuroglancer is not mentioned at all in the manuscript, despite it being a very similar and widely accepted platform for vEM data visualization across model organisms. An explicit comparison of NeuroSC and Neuroglancer would be appropriate, given the similarity of the tools. Currently, published C. elegans data (Witvliet et al., 2021; Yim et al., 2024) use Neuroglancer-based viewers, and directly comparing NeuroSC and highlighting its strengths relative to Neuroglancer would strengthen the paper.

      In the original manuscript we had not mentioned tools like Neuroglancer because we envisioned them as distinct from NeuroSC in intended use and output. But, as explained in our response to comment R1.2, in the revised version we have included a section in the Introduction (lines 98-108) and in the Discussion (lines 369-417) that compares these types of web-based tools and highlights synergies.

      Comment

      Assigning shorthand names to strata, such as "shallow reflex circuit" (page 4, line 172), may oversimplify this group of neurons. Either more detailed support for shorthand names of C-PHATE modules should be included, or less speculative names for strata should be used.

      We appreciate this comment and understand that the original language used in the manuscript to describe strata categorizations may run the risk of oversimplification. We have now clarified the text to communicate that: 1) Strata are labeled by numbers (Strata 1, Strata 2, Strata 3 and Strata 4), rather than functional features of the neurons forming part of the strata, and that 2) the assignment of ‘strata’ is just one level of classification available via DC/CPHATE (as explained below). 

      To be sure, we have observed and published (Moyle et al., Nature 2021) that within a given stratum, many neurons share the functional identities that we have used as summary descriptors for the strata (e.g., shallow reflex circuits for Stratum 1; sensory and integrative circuits in Strata 3 and Strata 4; command interneurons in Strata 2, etc.). However, those cell types are not the only members of the strata. We have adjusted the language in lines 197-204 to reflect this more clearly. “Stratum 1, which contains most neurons contributing to shallow reflex circuits that control aversive head movements in response to noxious stimuli, displayed the fewest changes among the developmental connectomes (Figure 3B–F; Supplementary Table 3). In contrast, C. elegans exhibit tractable behaviors that adapt to changing environmental conditions (Flavell et al., 2020). Strata 3 and 4 contain most neurons involved in circuits associated with such learned behaviors, including mechano- and thermo-sensation. This is reflected in Strata 3 and 4 showing the most change in neuronal relationships across postembryonic development.”

      Comment

      The authors state that NeuroSC can be applied to other model organisms. Since model organisms with greater neuron numbers include more individual neurons per cell class, the authors should support this by quantitatively demonstrating how DC/C-PHATE relationships correlate with shared functional roles among C. elegans neurons.

      We now clarify in the manuscript that, like in other organisms, C. elegans neurons are also grouped into functional classes with shared characteristics. In the context of the cylindrical nerve ring of the animal, these neuronal classes are sometimes bilaterally symmetric (forming left-right pairs), four-fold symmetric, or six-fold symmetric. We now explain in the discussion that the DC/CPHATE analyses group these neuron classes and their relationships (lines 442-451). In the specific section mentioned by the reviewer, we now also add new text to contextualize this concept and how it might relate to the possible use of these tools in organisms with larger nervous systems: ‘However, our previous work has demonstrated that DC/CPHATE clustering of C. elegans neurons consistently pulls out clusters of shared neuron classes and shared functional roles (Moyle et al., 2021). Building on this foundation, we envision applying similar clustering approaches to larger connectomes, aiming to identify classes and functionally related neuronal groups in more complex nervous systems. We suggest that contact profiles, along with neuron morphologies and synaptic partners, can act as ‘fingerprints’ for individual neurons and neuron classes. These ‘fingerprints’ can be aligned across animals of the same species to create identities for neurons. Frameworks for systematic connectomics analysis in tractable model systems such as C. elegans are critical in laying a foundation for future analyses in other organisms with up to a billion-fold increase in neurons (Toga et al., 2012).’

      Comment

      Lack of surface smoothing in NeuroSC leads to processes sometimes appearing to have gaps, which could be remedied by smoothing with a surface mesh. 

      We thank the reviewer for the suggestion, and understand the visibility of gaps in certain neuron processes can be distracting. But this was an intentional choice, with our main goal being to show the most accurate representation of the available data segmentation and avoid any rendering interpretations. In this way, we render the data with the highest fidelity we can and as close as possible to the ground truth of the EM segmentation. We have added language to describe this in the methods, lines 490-491, and in Figure legend 5b.

      Comment

      Toggling between time points while maintaining the same neurons and contact area in NeuroSC is a really valuable feature. The tool would be improved even more by extending this feature to synapses, specifically by allowing the user to add an entire group of synapses to the viewer at once (e.g. "all synapses between AIM and PVQ"), and to keep this synapse group invariant when toggling between developmental stages.

      We thank the reviewer for this suggestion. In response we have now implemented a new feature to ‘clone’ a rendered scene across time while preserving the original elements to ease comparisons. Once the user has rendered a scene, they can use the in-viewer developmental slider to clone the renderings and assigned colors, but display the renderings of the newly selected timepoint. These renderings populate a new window tab which can be dragged to align developmental stage windows side by side. We have added a sentence to account for this in lines 315-317 and to the legend of supplemental Figure S11. 

      Reviewer #2 (Public review)

      Comment

      The ability to visualize the data from both a connectomics and contactomics perspective across developmental time has significant power. The original C. elegans connectome (White et al., 1986) presented their circuits as line drawings with chemical and electrical synapses indicated through arrows and bars. While these line drawings remain incredibly useful, they were also necessary simplifications for a 2D publication and they lack details of the complex architecture seen within each EM image. Koonce et al take advantage of segmented image data of each neuronal process within the nerve ring to create a web interface where users can visualize 3D models for their neuron of choice. The C-PHATE visualization allows users to explore similarities among different neurons in terms of adjacency and then go directly to the 3D model for these neurons. The 3D models it generates are beautiful and will likely be showing up in many future presentations and publications. The tool doesn't require any additional downloading and is open source.

      We thank the reviewer for this positive assessment of our work.

      Comment

      While it's impossible to create one tool that will satisfy all potential users, I found myself wanting to have numbers associated with the data. For example, knowing the number of connections or the total surface area of contacts between individual neurons wasn't possible through the viewer, which limits the utility of taking deep analytical dives. While connectivity data is readily accessible through other interfaces such as Nemanode and WormWiring, a more thorough integration may be helpful to some users.

      We thank the reviewer for this feedback and in response have now implemented displays with quantitative information in NeuroSC. Now, upon hovering over a contact patch or synapse, the user will see the quantitative data of the relationship. For contact patches, the user will see the total area shared between two neurons in that dataset. On hovering over a synapse, the user will see how many synapses there are in total with the same members throughout the dataset. We agree that this improves user analyses (see also R1.3 response).

      Comment

      There were several issues with the user interface that made it a bit clunky to use. For example, as I added additional neurons to the filter search box, the loading time got longer and longer. I ran an experiment uploading all of the amphid neurons, one pair at a time. Each additional neuron pair added an additional 5-10 seconds to the loading. By the time I got to the last pair, it took over a minute to load. Issues like these, some of which may be unavoidable given the size of the data, could be conveyed through better documentation. I did not find the tutorial very helpful and the supplementary movies lacked any voiceover, so it wasn't always clear what they were trying to show.

      We appreciate that some of the more complex models can take a while to load. One of our core goals is to keep the high resolution of our models to most accurately represent the EM data, so we had to compromise between resolution and loading times. But to address this concern we have now added a ‘loading’ prompt that reassures the user when there is a wait. We also added, as suggested, text guidance throughout all of the supplemental videos (Supplemental Videos 1-4).

      Reviewer #3 (Public review)

      Comment

      A web-based app, NeuroSC, that individual researchers can use to interrogate the structure and organization of the C. elegans nerve ring across development. In the opinion of this reviewer, only minor revisions are required.

      We thank the reviewer for this positive assessment of our work.

      Comment

      Contact is defined by length, why not contact area? How are these normalized for changes in the overall dimensions of neurons during development?

      To clarify our methodology: the adjacency algorithm that we use generates a 2D adjacency profile by summing the number of adjacent boundary points per EM section, which are then summed across all EM z slices.

      Contact area can be derived by multiplying the adjacency length in each slice by pixel resolution and z-thickness. Prompted by the reviewer we have now also calculated and display contact surface areas, along with their ranks among all contact relationships for a given neuron. These can be inspected directly via the interface by clicking on a rendered cell or contact patch (Figure S5 and lines 308-312). We believe these additional surface area metrics enhance the interpretability and utility of the viewer.
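      The relationship between the summed adjacency count and the derived contact area can be sketched as follows. This is a minimal illustration under stated assumptions, not the NeuroSC implementation: the function name `contact_area`, the per-slice counts, and the pixel size and section thickness values are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np


def contact_area(adjacent_points_per_slice, pixel_size_nm, z_thickness_nm):
    """Estimate a contact area (nm^2) from per-slice adjacency counts.

    Each count is the number of adjacent boundary points between two
    neurons in one EM section. Multiplying by the pixel size gives an
    adjacency length per slice; multiplying by the section thickness
    gives the area contributed by that slice.
    """
    counts = np.asarray(adjacent_points_per_slice, dtype=float)
    adjacency_length_nm = counts * pixel_size_nm
    return float(np.sum(adjacency_length_nm * z_thickness_nm))


# Hypothetical adjacency counts for one neuron pair across four z slices.
counts = [12, 15, 0, 9]

# Adjacency profile entry: counts summed across all z slices.
total_adjacency = int(np.sum(counts))

# Hypothetical resolution values (not those of the actual datasets).
area_nm2 = contact_area(counts, pixel_size_nm=2.0, z_thickness_nm=50.0)

print(total_adjacency, area_nm2)
```

      Because the area is a linear rescaling of the adjacency count when resolution is constant within a dataset, ranking contacts by area or by count gives the same order within that dataset.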

      We apply normalization at the level of the adjacency threshold to account for dataset-specific differences such as contrast, boundary definition, and age-related changes in neuropil packing density. This normalization is applied before running the adjacency algorithm. We do not normalize by individual neuron size, as the contact data are intended to reflect relational differences between neurons, rather than absolute morphological scaling. In fact, our addition of a scale-spheroid within each rendered model emphasizes the large increase in spatial scale that the nerve ring experiences during larval growth.  

      Comment

      Figure 1, C&D, explanation unclear for how the adjacency matrix is correlated with C-Phate schematic in D.

      We thank the reviewer for the comment and have clarified this section by adding greater detail to the explanation of how an adjacency matrix is computed (lines 149-155), as well as a description now in the figure legend 1C. Additionally, we revised Figure 1C and D to simplify neuron representations/colors and to simplify the adjacency heat map gradient. We also extended the area of contact between neurons on Figure 1C to better reflect what would be considered a “contact”. Lastly, in the figure, we changed the color and placement for the z plane arrow and label from black to white, to make it more visible, to highlight the method of computing adjacency for each z slice. 

      Comment

      Figure 4, panels F & G, unclear why AVF is shown in panel G (L3) but not panel F (L1). Explanation (see below) should be provided earlier, i.e., AVF is not generated until the end of the L1.

      We have now clarified this important point by adding labels to Figure 4 panels F and G, ‘Pre-AVF outgrowth’ and ‘Post-AVF outgrowth’ respectively. Briefly, the point is that AVF does not grow into the nerve ring until the L2 stage, and that is why it is absent in panel F (L1 stage, now with the label ‘Pre-AVF outgrowth’).

      Comment

      Line 146 What is the justification for the statement: "By end of Larval Stage 1 (L1), neuronal differentiation has concluded...."? This statement is confusing since this sentence also states that "90% of neurons in the neuropil...have entered the nerve ring..." which would suggest that at least 10% additional NR neurons have NOT fully differentiated.

      We have fixed this sentence in the text. Now the sentence reads ‘By Larval stage 1 (L1) 90% of the neurons in the neuropil (161 neurons out of the 181 neurons) have grown into the nerve ring and adopted characteristic morphologies and positions.’

      Lines 171-175 What is meant by the statement that "degree of these changes mapped onto...plasticity"? What are examples of "behavioral plasticity"?

      We have added the following new lines of text (lines 200-204) and now additionally cite a review discussing C. elegans behaviors to clarify and give context to behavioral plasticity. ‘C. elegans exhibit tractable behaviors which can adapt due to changing environmental conditions (Flavell et al., Genetics 2020). Strata 3 and 4 contain most neurons belonging to circuits associated with such learned behaviors, including chemo-, mechano- and thermo-sensation. This is seemingly reflected by strata 3 and 4 harboring the most readily recognized set of changes in neuronal relationships across postembryonic development.’

      Comment

      Lines 189-190 The meaning of this sentence is unclear, "The logic in....merge events."

      This sentence has been deleted and we have instead refocused our descriptions of C-PHATE comparisons on neuronal clustering trajectories and cluster members (rather than iterations).

      Comment

      Lines 193-208 This section reports varying levels of convergence across larval development in C-Phate maps for the interneurons AIML and PVQL. Iterations leading to convergence varied: 16 (L1), 14 (L2), 22 (L3), 20 (L4), 14 (adult). The authors suggest that these differences are biologically significant and reflect the reorganization of AIML and PVQL contact relationships especially between the L4 and adult. Are these differences in iterations significant?

      We agree this could be confusing and instead of focusing on comparing the iteration at which each merging event occurs, we now focus on examining the differences in members of clusters, before and after the merge event. Cluster membership is easier to interpret than the differences in the number of DC iterations (lines 224-229).

      Lines 240-241 States that AVF neurons "terminally differentiate in the embryo" which is not correct. AVF neurons are generated from neuronal precursors (P0 and P1) at the end of the L1 stage which accounts for their outgrowth into the NR during the L2 stage. 

      We thank the reviewer for the correction and have edited the text to read: ‘AVF neurons are generated from neuronal precursors (P0 and P1) at the end of the L1 stage (Sulston et al., 1983; Sun and Hobert, 2023; Poole et al., 2024; Hall and Altun, 2008; Sulston and Horvitz, 1977). AVF neurons do not grow into the nerve ring until the L2 stage, and continue to grow until the Adult stage’ (lines 261-266).

      Comment

      Lines 289-315. A detailed and highly technical description of website architecture would seem more appropriate for the Methods section.

      We agree and have moved this section to the methods as suggested (lines 663-690).

      Comment

      Line 307 "source data is" should be "source data are"

      Thank you; we have fixed this grammatical error.

      Comment

      Line 324 "circuits identities" should be "circuit identity".

      Thank you; we have fixed this grammatical error.

      Comment

      Trademark/copyright conflict with these sites? https://compumedicsneuroscan.com/about/ https://www.neuroscanai.com/

      We thank the reviewer for drawing our attention to this. To avoid potential conflicts, we have proactively altered the name to NeuroSC throughout the paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) Changes in blood volume due to brain activity are indirectly related to neuronal responses. The exact relationship is not clear; however, we do know two things for certain: (a) each measurable unit of blood volume change depends on the response of hundreds or thousands of neurons, and (b) the time course of the volume changes is slow compared to the potential time course of the underlying neuronal responses. Both of these mean that important variability in neuronal responses will be averaged out when measuring blood changes. For example, if two neighbouring neurons have opposite responses to a given stimulus, this will produce opposite changes in blood volume, which will cancel each other out in the blood volume measurement due to (a). This is important in the present study because blood volume changes are implicitly being used as a measure of coding in the underlying neuronal population. The authors need to acknowledge that this is a coarse measure of neuronal responses and that important aspects of neuronal responses may be missing from the blood volume measure.

      The reviewer is correct: we do not measure neuronal firing but use blood volume as a proxy for bulk local neuronal activity, which does not capture the richness of single neuron responses. This is why the paper focuses on large-scale spatial representations as well as cross-species comparison. For this latter purpose, fMRI responses are on par with our fUSI data, with both neuroimaging techniques showing the same weakness. We have now added this point to the discussion: 

      “Second, we used blood volume as a proxy for local neuronal activity. Thus, our signal ignores any heterogeneity that might exist at the level of local neuronal populations. However, our main findings are related to the large-scale organization of cortical responses and how they relate to those of humans. For this purpose, the functional spatial resolution of our signal, driven by the spatial resolution of neurovascular coupling, should be adequate. In addition, using hemodynamic signals provides a much better comparison with human fMRI data, where the same limitations are present.”

      (2) More importantly for the present study, however, the effect of (b) is that any rapid changes in the response of a single neuron will be cancelled out by temporal averaging. Imagine a neuron whose response is transient, consisting of rapid excitation followed by rapid inhibition. Temporal averaging of these two responses will tend to cancel out both of them. As a result, blood volume measurements will tend to smooth out any fast, dynamic responses in the underlying neuronal population. In the present study, this temporal averaging is likely to be particularly important because the authors are comparing responses to dynamic (nonstationary) stimuli with responses to more constant stimuli. To a first approximation, neuronal responses to dynamic stimuli are themselves dynamic, and responses to constant stimuli are themselves constant. Therefore, the averaging will mean that the responses to dynamic stimuli are suppressed relative to the real responses in the underlying neurons, whereas the responses to constant stimuli are more veridical. On top of this, temporal following rates tend to decrease as one ascends the auditory hierarchy, meaning that the comparison between dynamic and stationary responses will be differently affected in different brain areas. As a result, the dynamic/stationary balance is expected to change as you ascend the hierarchy, and I would expect this to directly affect the results observed in this study.

      It is not trivial to extrapolate from what we know about temporal following in the cortex to know exactly what the expected effect would be on the authors' results. As a first-pass control, I would strongly suggest incorporating into the authors' filterbank model a range of realistic temporal following rates (decreasing at higher levels), and spatially and temporally average these responses to get modelled cerebral blood flow measurements. I would want to know whether this model showed similar effects as in Figure 2. From my guess about what this model would show, I think it would not predict the effects shown by the authors in Figure 2. Nevertheless, this is an important issue to address and to provide control for.

      We understand the reviewer’s concern about potential differences in response dynamics in stationary vs non-stationary sounds. It seems that the reviewer is concerned that responses to foregrounds may be suppressed in non-primary fields because foregrounds are not stationary, and non-primary regions could struggle to track and respond to these sounds. Nevertheless, we observed the contrary, with non-primary regions overrepresenting non-stationary (dynamic) sounds relative to stationary ones. For this reason, we are inclined to think that this explanation cannot account for our findings.

      We understand the comment that temporal following rates might differ across regions in the auditory hierarchy and agree. In fact, we do show that tuning to temporal rates differs across regions and partly explains the differences in background invariance we observe. In this regard, we think the reviewer’s suggestion is already implemented by our spectrotemporal model, which incorporates the full range of realistic temporal following rates (up to 128 Hz). The temporal averaging is done as we take the output of the model (which varies continuously through time) and average it in the same window as we used for fUSI data. When we fit this model to the ferret data, we find that voxels in non-primary regions, especially VP (tertiary auditory cortex), tend to be more tuned to low temporal rates (Figure 2F, G), and that background invariance is stronger in voxels tuned to low rates. This is, however, not true in humans, suggesting that background invariance in humans relies on different computational mechanisms. We have added a sentence to clarify this: “The model included a range of realistic temporal rates and this axis was the most informative to discriminate foregrounds from backgrounds.”
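      The temporal-averaging concern discussed in this exchange can be illustrated with a toy example. This is a minimal sketch under assumed values, not the spectrotemporal model used in the study: the two synthetic "neuronal" time courses, their amplitudes, and the one-second averaging window are hypothetical.

```python
import numpy as np

dt = 0.001                       # 1 ms sampling step (assumed)
n_samples = 1000                 # one-second averaging window (assumed)
t = np.arange(n_samples) * dt

# Transient response: 50 ms of excitation followed by 50 ms of inhibition.
transient = np.zeros(n_samples)
transient[:50] = 1.0
transient[50:100] = -1.0

# Sustained response: constant and ten times weaker at its peak.
sustained = np.full(n_samples, 0.1)


def window_average(signal):
    """Average a time course over the whole window, as a slow signal would."""
    return float(np.mean(signal))


avg_transient = window_average(transient)
avg_sustained = window_average(sustained)

print(avg_transient, avg_sustained)
```

      In this toy case the biphasic transient averages to zero while the weaker sustained response survives, which is the cancellation the reviewer describes. Whether real responses behave this way depends on their actual time courses and on neurovascular coupling, neither of which this sketch models.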

      (3) I do not agree with the equivalence that the authors draw between the statistical stationarity of sounds and their classification as foreground or background sounds. It is true that, in a common foreground/background situation - speech against a background of white noise - the foreground is non-stationary and the background is stationary. However, it is easy to come up with examples where this relationship is reversed. For example, a continuous pure tone is perfectly stationary, but will be perceived as a foreground sound if played loudly. Background music may be very non-stationary but still easily ignored as a background sound when listening to overlaid speech. Ultimately, the foreground/background distinction is a perceptual one that is not exclusively determined by physical characteristics of the sounds, and certainly not by a simple measure of stationarity. I understand that the use of foreground/background in the present study increases the likely reach of the paper, but I don't think it is appropriate to use this subjective/imprecise terminology in the results section of the paper.

      We appreciate the reviewer’s comment that the classification of our sounds into foregrounds and backgrounds is not verified by any perceptual experiments. We use those terms to be consistent with the literature (McWalter and McDermott, 2018; McWalter and McDermott, 2019), including the paper we derived this definition from (Kell et al., 2019). These terms are widely used in studies where no perceptual or behavioral experiments are included, and even when animals are anesthetized. We have clarified and justified this choice in the beginning of the Results section:

      “We used three types of stimuli: foregrounds, backgrounds, and combinations of those. We use those terms to refer to sounds differing in their stationarity, under the assumption that stationary sounds carry less information than non-stationary sounds, and are thus typically ignored.”

      We have also added a paragraph in the discussion to emphasize the limits of this definition:

      “First, this study defined foregrounds and backgrounds solely based on their acoustic stationarity, rather than perceptual judgments. This choice allowed us to isolate the contribution of acoustic factors in a simplified setting. Within this controlled framework, we show that acoustic features of foreground and background sounds drive their separation in the brain and the hierarchical extraction of foreground sound features.”

      (4) Related to the above, I think further caveats need to be acknowledged in the study. We do not know what sounds are perceived as foreground or background sounds by ferrets, or indeed whether they make this distinction reliably to the degree that humans do. Furthermore, the individual sounds used here have not been tested for their foreground/background-ness. Thus, the analysis relies on two logical jumps - first, that the stationarity of these sounds predicts their foreground/background perception in humans, and second, that this perceptual distinction is similar in ferrets and humans. I don't think it is known to what degree these jumps are justified. These issues do not directly affect the results, but I think it is essential to address these issues in the Discussion, because they are potentially major caveats to our understanding of the work.

      We agree with the reviewer that the foreground-background distinction might be different in ferrets. In anticipation of that issue, we had enriched the sound set with more ecologically relevant sounds, such as ferret and other animal vocalizations. Nevertheless, we have emphasized this limitation in addition to the limitation of our definition of foregrounds and backgrounds in the discussion: 

      “In addition, most of the sounds included in our study likely have more relevance for humans compared to ferrets (see Table 1). Despite including ferret vocalizations and environmental sounds that are more ecologically relevant for ferrets, it is not clear whether ferrets would behaviorally categorize foregrounds and backgrounds as humans do. Examining how ferrets naturally orient or respond to foreground and background sounds under more ecologically valid conditions, potentially with free exploration or spontaneous listening paradigms, could help address this issue.”

      Reviewer #2 (Public review):

      (1) Interpretation of the cerebral blood volume signal: While the results are compelling, more caution should be exercised by the authors in framing their results, given that they are using an indirect measure of neural activity. This is the difference between stating "CBV in area MEG was less background invariant than in higher areas" and saying "MEG was less background invariant than other areas". Beyond framing, the basic properties of the CBV signal should be better explored:

      a) Cortical vasculature is highly structured (e.g., Kirst et al. (2020) Cell). One potential explanation for the results is simply differences in vasculature and blood flow between primary and secondary areas of auditory cortex, even if fUS is sensitive to changes in blood flow, changes in capillary beds, etc. (Mace et al. (2011) Nat. Methods). This concern could be addressed by either analyzing spontaneous fluctuations in the CBV signal during silent periods or computing a signal-to-noise ratio of voxels across areas across all sound types. This is especially important given the complex 3D geometry of gyri and sulci in the ferret brain.

      We agree with the reviewer that there could be differences in vasculature across subregions of the auditory cortex and note that this point would also be valid for the published human fMRI data. Nevertheless, even if small differences in vasculature were present, it is unlikely that they would affect our analyses and results, which are designed to be independent of local vascular density. First, we normalize the signal in each voxel using the silent periods, so that the absolute strength of the raw signal, or baseline blood volume in each voxel, is accounted for in our analysis. Second, we only focus on reliably responsive voxels in each region and do see comparable sound-evoked responses in all regions (Figure S2). Third, our analysis mostly relies on voxel-based correlation across sounds, which is independent of the mean and variance of the voxel responses. Differences in noise, measured through test-retest reliability, can affect values of correlation, which is why we used a noise-correction procedure. After this procedure, invariance does not depend on test-retest reliability, and differences across regions are still seen when matching for test-retest reliability (new Figure S7). Thus, we believe that differences in vascular architecture across regions are unlikely to affect our results. We added this point in the Methods section when discussing the noise-correction:

      “After this correction, the differences we observed between brain regions were present regardless of voxels' test-retest reliability, or noise level (Figure S7). Thus, potential differences in vasculature across regions are unlikely to affect our results.”

      b) Figure 1 leaves the reader uncertain what exactly is being encoded by the CBV signal, as temporal responses to different stimuli look very similar in the examples shown. One possibility is that the CBV is an acoustic change signal. In that case, sounds that are farther apart in acoustic space from previous sounds would elicit larger responses, which is straightforward to test. Another possibility is that the fUS signal reflects time-varying features in the acoustic signal (e.g. the low-frequency envelope). This could be addressed by cross-correlating the stimulus envelope with fUS waveform. The third possibility, which the authors argue, is that the magnitude of the fUS signal encodes the stimulus ID. A better understanding of the justification for only looking at the fUS magnitude in a short time window (2-4.8 s re: stimulus onset) would increase my confidence in the results.

      We thank the reviewer for raising that point as it highlights that the layout of Figure 1 is misleading. While Figure 1B shows an example snippet of our sound streams, Figure 1D shows the average timecourse of CBV time-locked to a change in sound (foreground or background, isolated or in a mixture). This is the average across all voxels and sounds, aiming at illustrating the dynamics for the three broad categories. In Figure 1E however, we show the cross-validated cross-correlation of CBV across sounds (and different time lags). To obtain this, we compute for each voxel the response to each sound at each time lag, thus obtaining two vectors (size: number of sounds) per lag, one per repeat. Then, we correlate all these vectors across the two repeats, obtaining one cross-correlation matrix per voxel. We finally average these matrices across all voxels. The presence of red squares with high correlations demonstrates that the signal encodes sound identity, since CBV is more similar across two repeats of the same sound (e.g., in the foreground only matrix, 0-5 s vs 0-5 s), than two different sounds (0-5 s vs. 7-12 s). We modified the figure layout as well as the legend to improve clarity.
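      The cross-validated cross-correlation described above can be sketched as follows. This is a minimal illustration on synthetic data, assuming responses are stored as arrays of shape (voxels, sounds, lags) for each repeat; the function name `crossval_lag_correlation`, the array sizes, and the noise level are hypothetical and do not come from the study.

```python
import numpy as np


def crossval_lag_correlation(rep1, rep2):
    """Correlate responses across sounds between two repeats, for every lag pair.

    rep1, rep2: arrays of shape (n_voxels, n_sounds, n_lags).
    Returns an (n_lags, n_lags) matrix averaged over voxels, where entry
    (i, j) is the Pearson correlation, across sounds, between repeat 1 at
    lag i and repeat 2 at lag j.
    """
    n_voxels, _, n_lags = rep1.shape
    total = np.zeros((n_lags, n_lags))
    for v in range(n_voxels):
        # Center each lag's response vector across sounds, then scale to unit norm.
        a = rep1[v] - rep1[v].mean(axis=0)
        b = rep2[v] - rep2[v].mean(axis=0)
        a = a / np.linalg.norm(a, axis=0)
        b = b / np.linalg.norm(b, axis=0)
        total += a.T @ b  # Pearson correlations for all lag pairs
    return total / n_voxels


# Synthetic data: a sound-specific signal shared by both repeats plus noise.
rng = np.random.default_rng(0)
signal = rng.standard_normal((20, 30, 4))        # voxels x sounds x lags
rep1 = signal + 0.1 * rng.standard_normal(signal.shape)
rep2 = signal + 0.1 * rng.standard_normal(signal.shape)

corr = crossval_lag_correlation(rep1, rep2)
print(np.round(corr, 2))
```

      In this synthetic case the signal is independent across lags, so only the diagonal is high. A signal that encodes sound identity over an extended period would instead produce blocks of high correlation, as in the red squares described for Figure 1E.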

      (2) Interpretation of the human data: The authors acknowledge in the discussion that there are several differences between fMRI and fUS. The results would be more compelling if they performed a control analysis where they downsampled the Ferret fUS data spatially and temporally to match the resolution of fMRI and demonstrated that their ferret results hold with lower spatiotemporal resolution.

      We agree with the reviewer that the use of different techniques might get in the way of cross-species comparison. We already control for the temporal aspect by using the average of stimulus-evoked activity across time (note that due to scanner noise, sounds are presented cut into small pieces in the fMRI experiments). Regarding the spatial aspect, there are several things to consider. First, both species have brains of very different sizes, a factor that is conveniently compensated for by the higher spatial resolution of fUSI compared to fMRI (0.1 vs 2 mm). Downsampling to fMRI resolution would lead to having one voxel per region per slice, which is not feasible. We also summarize results with one value per region, which is a form of downsampling that is fairer across species. Furthermore, we believe that we already established in a previous study (Landemard et al., 2021, eLife) that fUSI and fMRI data are comparable signals. Indeed, we could predict human fMRI responses to most sounds from ferret fUSI responses to the same sounds. We clarified these points in the discussion:

      “In addition, fMRI has a worse spatial resolution than fUSI (here, 2 vs. 0.1 mm voxels). However, this difference in resolution compensates for the difference in brain size between humans and ferrets. In our previous work, we showed that a large fraction of cortical responses to natural sounds could be predicted from one species to the other using these methods (Landemard et al., 2021).”

      Reviewer #3 (Public review):

      As mentioned above, interpretation of the invariance analyses using predictions from the spectrotemporal modulation encoding model hinges on the model's ability to accurately predict neural responses. Although Figure S5 suggests the encoding model was generally able to predict voxel responses accurately, the authors note in the introduction that, in human auditory cortex, this kind of tuning can explain responses in primary areas but not in non-primary areas (Norman-Haignere & McDermott, PLOS Biol. 2018). Indeed, the prediction accuracy histograms in Figure S5C suggest a slight difference in the model's ability to predict responses in primary versus non-primary voxels. Additional analyses should be done to a) determine whether the prediction accuracies are meaningfully different across regions and b) examine whether controlling for prediction accuracy across regions (i.e., subselecting voxels across regions with matched prediction accuracy) affects the outcomes of the invariance analyses.

      The reviewer is correct: the spectrotemporal model tends to perform less well in human non-primary cortex. We believe this does not contradict our results but goes in the same direction: while there is a gradient in invariance in both ferrets and humans, this gradient is predicted by the spectrotemporal model in ferrets, but not in humans (possibly indeed because predictions are less good in human non-primary auditory cortex). Regardless of the mechanism, this result points to a difference across species. In ferrets, we found a significantly better prediction accuracy in VP (p=0.001, permutation test) and no differences between MEG and dPEG (p=0.89). In humans, prediction accuracy was slightly higher in primary compared to non-primary auditory cortex, but this effect was not significant (p=0.076). In both species, when matching prediction accuracy between regions, the gradients in invariance were preserved. We have added these analyses to the manuscript (Figure S5).
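      A permutation test of the kind cited above can be sketched as follows. The response does not state which labels were permuted or how many permutations were used, so this is a generic two-sample version under assumptions: it shuffles region labels across voxels and compares mean prediction accuracy. The function name `permutation_test` and the synthetic accuracies are hypothetical.

```python
import numpy as np


def permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of means between x and y.

    Returns the observed difference and a p-value, computed as the fraction
    of label shufflings whose absolute difference is at least as large as
    the observed one (with a +1 correction so p is never exactly zero).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = shuffled[: x.size].mean() - shuffled[x.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return observed, (count + 1) / (n_perm + 1)


# Synthetic prediction accuracies for voxels in two hypothetical regions.
rng = np.random.default_rng(1)
region_a = rng.normal(0.6, 0.05, size=40)
region_b = rng.normal(0.4, 0.05, size=40)

diff_ab, p_ab = permutation_test(region_a, region_b)
diff_aa, p_aa = permutation_test(region_a, region_a)
print(diff_ab, p_ab, p_aa)
```

      Comparing a sample with itself gives an observed difference of zero and therefore a p-value of one, which is a quick sanity check on the implementation.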

      A related concern is the procedure used to train the encoding model. From the methods, it appears that the model may have been fit using responses to both isolated and mixture sounds. If so, this raises questions about the interpretability of the invariance analyses. In particular, fitting the model to all stimuli, including mixtures, may inflate the apparent ability of the model to "explain" invariance, since it is effectively trained on the phenomenon it is later evaluated on. Put another way, if a voxel exhibits invariance, and the model is trained to predict the voxel's responses to all types of stimuli (both isolated sounds and mixtures), then the model must also show invariance to the extent it can accurately predict voxel responses, making the result somewhat circular. A more informative approach would be to train the encoding model only on responses to isolated sounds (or even better, a completely independent set of sounds), as this would help clarify whether any observed invariance is emergent from the model (i.e., truly a result of low-level tuning to spectrotemporal features) or simply reflects what it was trained to reproduce.

      We thank the reviewer for this suggestion. We have run an additional prediction using only the sounds presented in isolation, which replicates our main results (new Figure S6). We have added this control to the manuscript:

      “Results were similar if the model was fit solely on isolated sounds, excluding mixtures from the training set (Figure S6).”

      Finally, the interpretation of the foreground invariance results remains somewhat unclear. In ferrets (Figure 2I), the authors report relatively little foreground invariance, whereas in humans (Figure 5G), most participants appear to show relatively high levels of foreground invariance in primary auditory cortex (around 0.6 or greater). However, the paper does not explicitly address these apparent cross-species differences. Moreover, the findings in ferrets seem at odds with other recent work in ferrets (Hamersky et al. 2025 J. Neurosci.), which shows that background sounds tend to dominate responses to mixtures, suggesting a prevalence of foreground invariance at the neuronal level. Although this comparison comes with the caveat that the methods differ substantially from those used in the current study, given the contrast with the findings of this paper, further discussion would nonetheless be valuable to help contextualize the current findings and clarify how they relate to prior work.

      We thank the reviewer for this point. While we found a trend for higher background invariance than foreground invariance in ferret primary auditory cortex, this difference was not significant and many voxels exhibit similar levels of background and foreground invariance (for example in Figure 2D, G). Thus, we do not think our results are inconsistent with Hamersky et al., 2025, though we agree the bias towards background sounds is not as strong in our data. This might indeed reflect differences in methodology, both in the signal that is measured (blood volume vs spikes), and the sound presentation paradigm. Our timescales are much slower and likely reflect responses post-adaptation, which might not be as true for Hamersky et al. We have added this point to the discussion, as well as a comment on the difference between ferrets and humans in foreground invariance in primary auditory cortex:

      “In ferrets, primary auditory cortex has been found to over-represent backgrounds in mixtures compared to foregrounds (Hamersky et al., 2025). In contrast, we found a slight, non-significant bias towards foregrounds in primary regions. This difference could be driven by a difference in timescales, as we looked at slower timescales in which adaptation might be more present, reducing the strength of background encoding. In humans, we found a much smaller gap between background and foreground invariance in primary auditory cortex, which was not predicted by the spectrotemporal model. Additional, more closely controlled experiments would be needed to confirm and understand this species difference.”

      Reviewer #1 (Recommendations for the authors):

      (1) In the introduction, explain the relationship between background/foreground and stationarity/non-stationarity, and thus why stationary/nonstationary stimuli could be used to probe differences in background/foreground processing.

      We have added a sentence at the beginning of the results section to justify our choice (see public review).  

      (2) Avoid use of the background/foreground terminology in Results (and probably Methods).

      For consistency with previous literature, we decided to keep this terminology, though imperfect. We further justified our choice in the beginning of the Results section (see previous point).

      (3) In the Discussion, explain what the implications of the results are for background/foreground processing, and, importantly, highlight any caveats that result from stationarity not being a direct measure of background/foreground.

      We added a paragraph in the Discussion to highlight this point (see public review).

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1: Showing a silent period in the examples would help in understanding the fUS signal.

      In Figure 1D, we show the average timecourse of CBV time-locked to a change in sound (foreground or background, isolated or in a mixture). This is the average across all voxels and sounds. Thus, it would not be very informative to show an equivalent plot for a silent period, as it would look flat by definition. However, we updated the layout and legend of Figure 1 to make it clearer and avoid confusion.

      (2) "Responses were not homogenous" - would make more sense to say something like "responses were not spatially distributed".

      We removed these words, which were indeed not necessary: “We found that reliable sound-evoked responses were confined to the central part of ventral gyrus of the auditory cortex.”

      (3) Figure 2D: The maps shown in Figure 2D are difficult to understand for the noninitiated in fUS. At a minimum, labels should be added to indicate A-P, M-L, D-V. I cannot see the white square in the primary figure. An additional graphic would be helpful here to understand the geometry of the measurement.

      We thank the reviewer for pointing out that reading these images is indeed an acquired skill. We added an annotated image of anatomy with indications of main features to guide the reader in Figure 1. We also added missing white squares. 

      (4) Figure 2F: Can the authors better justify why the summary statistic is shown for all three areas, but the individual data only compares primary vs. higher order?

      We now show individual data for all three areas.

      (5) More methods information is needed to understand how recordings were stitched across days. Was any statistical modeling used to factor out the influence of day on overall response levels?

      We simply concatenated voxels recorded across different sessions and days. The slices were sampled randomly to avoid any systematic effect. Because different slices were sampled in different sessions, any spatial structure spanning several slices is unlikely to be artefactual. For instance, the map of average responses in Figure 2A shows a high level of continuity of spatial patterns across slices. This indicates that this pattern reflects a true underlying organization rather than session-specific noise. It also shows that the overall response levels are not affected by the day or recording session. We added a section in the Methods (“Combining different recordings”) to clarify this point:

      “The whole dataset consisted of multiple slices, each recorded in a different recording session. Slices to image on a given day were chosen at random to avoid any systematic bias. Responses were consistent across neighboring slices recorded on different sessions, as shown by the maps of average responses (Figure 2A, Figure S2) where any spatial continuity across different slices must reflect a true underlying signal in the absence of common noise.”

      Reviewer #3 (Recommendations for the authors):

      (1) Figures:

      The figures are generally very well done and visually appealing. However, I have a few suggestions and questions.

      a)  In Figure 1G, the delta CBV ranges from 0.5 to 1.5, although in subsequent figures (e.g., Figure 2D), the range is much larger (-15 to 45). Is it possible that the first figure is a proportion rather than a percentage, or is there some other explanation for the massive difference in scale? Not being very familiar with this measure, it was confusing.

      The same scale is used in both figures. The major difference is that in Figure 1D we take the average over all voxels and sounds (for each category), which includes many non-responsive voxels and, for responsive voxels, sounds to which they respond only weakly. On the other hand, Figure 2D shows the response of a single, responsive voxel. Thus, the values it reaches for its preferred sounds (45%) are an extreme that carries little weight in Figure 1D. We have changed the legend of Figure 1D to make this more explicit.

      b)  Similar to the first point, the strength of the correlations in the matrices of Figure 1E is very small (~ 0.05) compared to the test-retest reliabilities plotted in Figure 2B (~0.5). Again, I was confused by this large difference in scale.

      Two main factors explain the difference in values between Figure 1E and Figure 2B. First, in Figure 1E, each correlation is computed on the average activity in a window of 0.3 s, as opposed to 2.4 s in Figure 2B. More averaging leads to better SNR, which inevitably leads to higher test-retest correlations. Second, in Figure 1E, the cross-correlation matrices are averaged across all responsive voxels without any criterion for reliability. In contrast, Figure 2B shows example voxels with good test-retest reliability.

      c)  In Figure 2D, the example voxels are supposed to be shown in white. It appears that this example voxel is only shown for the non-primary voxel. Please be sure to add these voxels throughout the other panels and figures as well. 

      We fixed this mistake and added the example voxel in all panels.

      d)  Why do the invariance results (e.g., Figure 2F) for individual animals combine across dPEG and VP, while the overall results (across all animals) split things across all three regions? The results in Table 2 do, in fact, provide this data. Upon further examination of the data in Table 2, it seems like there is only a significant difference between background invariance between dPEG and VP for one of the two animals, and that this might be what drives the effect when pooling across all animals. This seems important to both show visually in the figure and to potentially discuss. There is still very clearly a difference between primary and non-primary, but whether there is a real difference between dPEG and VP seems more unclear.

      We added the values for single animals in the plot and highlighted this limitation in the text:

      “While background invariance was overall highest in VP, the differences within non-primary areas were more variable across animals (see table 2).”

      e)  Again, as in Figure 2F, the cross symbols seem like a bad choice as markers since the vertical components of the cross are suggestive of the error of the measurement. However, no error is actually plotted in these figures. I recommend using a different marker and including some measure of error in the invariance plots.

      We replaced the crosses with circles to avoid confusion. The measure of error is provided by the representation of values for single animals.

      f) The caption for Figure 4C states that each line corresponds to one animal, but does not precisely state what this line represents. Is this the median or something?

      Each line indeed represents the median across voxels for one animal. We added this information to the legend.

      g)  In Figure 5, the captions for panels D and E are swapped.

      This has now been corrected.

      (2) Discussion:

      (a) In the paragraph on methodological differences, it mentions that the fMRI voxel size is around 2 mm. This may be true in general, but given the comparison to Kell & McDermott 2019, the voxel size should reflect that used in their study (1 mm).

      The reviewer might refer to this sentence from the methods of Kell et al., 2019: “T1-weighted anatomical images were collected in each participant (1-mm isotropic voxels) for alignment and cortical surface reconstruction.” However, this does not correspond to the resolution of the functional data, which is 2 mm, as mentioned a bit further in the Methods: “In-plane resolution was 2 × 2 mm (96 × 96 matrix), and slice thickness was 2.8 mm with a 10% gap, yielding an effective voxel size of 2 × 2 × 3.08 mm.”

      (b) In the next paragraph on the control of attention, it mentions that attentional differences could play a role. However, in Kell & McDermott 2019, they manipulated attention (attend visual versus attend auditory) and found that it did not substantially affect the observed pattern invariance. I suppose it could potentially affect the degree to which an encoding model could explain the invariance. This seems important, and given that the data was already collected, it could be worth it to analyze that data.

      As the reviewer points out, Kell et al. 2019 ran an additional experiment in which they manipulated auditory vs. visual attention. However, the auditory task was just based on loudness and ensured that the participants were awake and paying attention to the stimuli, but not specifically to the foreground or background. This type of attention did not lead to changes in the observed patterns of invariance, which might have been the case for selective attention to backgrounds or foregrounds in the mixture. Given that these manipulations were not done in the ferret experiments, we chose to not include the analysis of this dataset in the scope of this paper. However, future work investigating that topic further would indeed be of interest.

      (c) The mention of "a convolutional neural network trained to recognize digits in noise" should make more obvious that this is visual recognition rather than auditory recognition.

      We clarified this sentence to make clear that the recognition is visual and not auditory: “For instance, in a convolutional neural network trained to visually recognize digits in different types of noise, when local feedback is implemented, early layers encode noise properties, while later layers represent clean signal.”

      (d) Finally, one explanation of the results in the discussion is that "primary auditory areas could be recruited to maintain background representations, enabling downstream cortical regions to use these representations to specifically suppress background information and enhance foreground representations." This "background-related information" being used to "facilitate further extraction of foregrounds" is similar to what is argued in Hicks & McDermott PNAS 2024.

      We thank the reviewer for suggesting this relevant reference and added it in this paragraph of the discussion.

      (3) Methods:

      In the "Cross-correlation matrices" section, it mentions that time-averaged responses from 2.4 to 4.8 s were used. It would be helpful to provide an explanation of why this particular time window was used. Additionally, I wondered whether one could look at adaptation type effects (e.g., that of Khalighinejad et al., 2019) or whether fUSI does not offer this kind of temporal precision?

      The effects shown in Khalighinejad et al., 2019, are indeed likely too fast to be observed with our methods. However, there are still dynamics in the fUSI signal and in its invariance (Figure S1). Each individual combination of foreground and background is presented for 4.8 s (Figure 1B). Therefore, we chose the range 2.4-4.8 s as the largest window we could use (to improve SNR) while minimizing contamination from the previous or next sound (indeed, blood volume typically lags neuronal activity by 1.5-2 s). We added this clarification to the Methods.
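      The time-averaging step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `window_mean` and the sampling period passed to it are assumptions, and the window edges are assumed to fall on sample boundaries.

```python
def window_mean(timecourse, sample_period_s, start_s=2.4, end_s=4.8):
    """Average the samples of one response that fall in [start_s, end_s).

    timecourse: response values for one stimulus, starting at stimulus onset.
    sample_period_s: time between samples in seconds (hypothetical value;
    the response does not state the sampling period used for this analysis).
    """
    # Convert window edges to sample indices; rounding guards against
    # floating-point error when the edges are multiples of the period.
    first = round(start_s / sample_period_s)
    last = round(end_s / sample_period_s)
    window = timecourse[first:last]
    if not window:
        raise ValueError("time course is too short for the requested window")
    return sum(window) / len(window)
```

      Discarding the first 2.4 s drops the samples most affected by the preceding sound, given the 1.5-2 s hemodynamic lag mentioned in the response.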

      In the "Human analyses" section, it is very unclear which set of data was used from Kell & McDermott 2019. For example, that paper contains 4 different experiments, none of which has 7 subjects. Upon closer reading, it seems that only 7 of the 11 participants from Experiment 1 also heard the background sounds in isolation (thus enabling the foreground invariance analyses). However, they stated that there were only 3 female participants in that experiment, while you state that you used data from 7 females. It would be helpful to double-check this and to more clearly state exactly which participants (i.e., from which experiment) were used and why (e.g., why not use data from Experiment 4 in the visual task/attention condition?).

      We added a sentence to clarify which datasets were used: “Specifically, we used data from Experiment 1 which provided the closest match to our experimental conditions, and only considered the last 7 subjects that heard both the foregrounds and the backgrounds in isolation, in addition to the mixtures.” 

      It was a mistake to mention that it was all female, as the original dataset has 3 females and 8 males, of which we used 7 without any indication of their sex. Thus, we removed this mention from the text.

      In the "Statistical testing" section, why were some tests done with 1000 permutations/shuffles while others were done with 2000?

      We homogenized and used 1000 permutations/shuffles for all statistical tests.

      (4) Miscellany:

      (a) The Hamersky et al. 2023 preprint has recently been published (referenced in the public review), and so you could consider updating the reference.

      This reference has now been updated.

      (b) There are a few borderline statistical tests that could use a bit more nuance. For example (on page 4), "In primary auditory cortex (MEG), there was no significant difference between values of foreground invariance and background invariance (p = 0.063, obtained by randomly permuting the sounds' background and foreground labels, 1000 times)." This test is quite close to being significant, and this might be acknowledged.

      We emphasized the trend to nuance the interpretation of these results: “In primary auditory cortex (MEG), foreground invariance was slightly lower than background invariance, although this difference was not significant (p=0.063, obtained by randomly permuting the sounds' background and foreground labels, 1000 times).”
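      The label-permutation test quoted above can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `permutation_p_value`, the use of a difference of means as the test statistic, the two-sided comparison, and the add-one correction are not taken from the manuscript.

```python
import random


def permutation_p_value(fg_values, bg_values, n_permutations=1000, seed=0):
    """Two-sided permutation test on the difference of group means.

    fg_values, bg_values: one invariance value per item in each group
    (hypothetical inputs; the manuscript permutes the sounds' labels).
    """
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)
    observed = abs(mean(fg_values) - mean(bg_values))
    pooled = list(fg_values) + list(bg_values)
    n_fg = len(fg_values)

    n_extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values is equivalent to shuffling the labels.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_fg]) - mean(pooled[n_fg:]))
        if diff >= observed:
            n_extreme += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (n_extreme + 1) / (n_permutations + 1)
```

      With 1000 permutations, the smallest p-value this estimator can return is 1/1001, so a value such as p = 0.063 corresponds to roughly 62 shuffles that were at least as extreme as the observed difference.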

      (5) Potential typos:

      (a)   Should the title be "natural sound mixtures" instead of "natural sounds mixtures"?

      (b) The caption for Figure 1 says "We imaged the whole auditory through successive slices across several days." I believe this should be "the whole auditory [cortex]."

      (c) In the first paragraph of the discussion, there is a sentence ending in "...are segregated in hemody-namic signal." I believe this should be "hemody-namic signal."

      These errors are now all corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The paper is well written and investigates the cross-species insemination of fish eggs with mouse sperm. I have a few major and minor comments.

      Strengths:

      The experiments are well executed and could provide valuable insights into the complex mechanisms of fertilization in both species. I found the information presented to be very interesting.

      Thank you.

      Weaknesses:

      The rationale of some of the experiments is not well defined.

      Thank you. In the revised manuscript, we have clarified and expanded the rationale behind each experiment to better highlight the specific questions being addressed and how each approach contributes to our overall investigation. These clarifications have been integrated throughout the Results and Discussion sections. We provide detailed rationale in our point-by-point responses to both reviewers, outlining how each experimental design was motivated by prior findings, hypotheses, or specific gaps in knowledge. We hope these revisions make the experimental logic and progression better defined and more compelling.

      Major Comments:

      (1) Figure 5

      I do not understand the rationale for performing experiments using CatSper-null sperm and CD9-null oocytes. It is well established that CatSper-null sperm are unable to penetrate the zona pellucida (ZP), so the relevance of this approach is unclear.

      We thank the reviewer for this comment. This experiment was conducted as the basis to then evaluate the contributions of progressive and hyperactivated motility to the ability of mouse sperm to locate and traverse the zebrafish micropyle. In earlier experiments (Figures 1 and 3), we assessed whether sperm-micropyle interaction was robust by comparing it to binding to the mouse zona pellucida and testing whether both interactions persisted after washing, which is a standard approach to distinguish specific binding from non-specific adherence (Avella et al., 2014; Baibakov et al., 2012). Thus, we extended this analysis to CatSper1<sup>Null</sup> sperm; CatSper1<sup>Null</sup> sperm were still capable of binding the zona pellucida comparably to heterozygous controls, though they were unable to cross the zona of Cd9<sup>Null</sup> eggs. These observations served as a validation step for the use of CatSper1<sup>Null</sup> sperm for downstream micropyle interaction assays. Thus, we proceeded to test whether hyperactivated motility, absent in CatSper1<sup>Null</sup> sperm, is required for locating and crossing the micropyle.

      It is indeed well established that CatSper1<sup>Null</sup> sperm are unable to penetrate the zona pellucida, and previous studies have typically used the absence of fertilized eggs as a readout. However, failed fertilization may result from multiple factors, including impaired sperm motility, reduced capacity to bind the zona pellucida, or an inability to penetrate it. To our knowledge, no study has quantitatively assessed the number of CatSper-deficient sperm that successfully bind, cross the zona and reach the perivitelline space. To address this, we first used normal oocytes for sperm binding and Cd9<sup>Null</sup> oocytes (Le Naour et al., 2000), which allow direct quantification of sperm accumulation in the perivitelline space. We have included a detailed explanation in the Results to clarify this point, lines 352-365 and 376-369.

      (2) Micropyle penetration and sperm motility

      CatSper-null sperm are reportedly unable to cross the micropyle, but this could be due to their reduced motility rather than a lack of hyperactivation per se. Were these experiments conducted using capacitated or non-capacitated spermatozoa? What was the observed motility of CatSper-null sperm during these assays? Clarifying these conditions is essential to avoid drawing incorrect conclusions from the results.

      Thank you for raising these points. Under our IVF conditions, qualitative observations confirmed that CatSper1<sup>Null</sup> sperm maintained sufficient progressive motility during the first hour post-insemination and exhibited zona binding efficiency comparable to that of CatSper1<sup>Het</sup> controls (Figure 5A and B). This is consistent with previous reports showing that within the first 90 minutes of sperm incubation in media, approximately 20% of CatSper1<sup>Null</sup> sperm preserve motility (Qi et al., 2007). Given previous studies indicating that 15–35% of sperm undergo hyperactivation within 90 minutes (Goodson et al., 2011), and considering that 100,000 progressively motile sperm were used for insemination, we estimate that approximately 3,000 hyperactivated CatSper1<sup>Null</sup> sperm were present in the cross-species insemination dish (mouse sperm x zebrafish eggs). Based on these numbers, we would have expected at least some sperm to locate the micropyle if hyperactivation were not required for its detection and entry. Nevertheless, no CatSper1<sup>Null</sup> sperm were detected in proximity to the micropyle canal, its opening, or within the inter-chorion space (ICS). These observations support the conclusion that the inability of CatSper1<sup>Null</sup> sperm to locate and enter the micropyle is attributable to their failure to hyperactivate. Also, all sperm used in these assays were exposed to identical capacitating conditions (HTF/HSA, 37 °C, 5% CO2). We now clarify this in the Methods, line 624, and we added more rationale under the Results, lines 361-365 and in the Discussion, lines 470-483.
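      The response does not spell out how the figure of approximately 3,000 sperm was derived. One reading that reproduces it, offered here only as an assumption, multiplies the 100,000 inseminated sperm by the 20% that preserve motility and by the lower bound (15%) of the reported hyperactivation range. The helper name `estimate_hyperactivated` is hypothetical.

```python
def estimate_hyperactivated(n_inseminated, pct_motile, pct_hyperactivated):
    """Estimate the number of motile, hyperactivated sperm.

    Percentages are passed as whole numbers so the arithmetic stays exact.
    Multiplying the two fractions is an assumed reading of the response,
    not a calculation stated by the authors.
    """
    return n_inseminated * pct_motile * pct_hyperactivated // (100 * 100)


low = estimate_hyperactivated(100_000, 20, 15)   # lower bound of the 15-35% range
high = estimate_hyperactivated(100_000, 20, 35)  # upper bound of the 15-35% range
print(low, high)  # prints "3000 7000"
```

      Under this reading, the quoted estimate corresponds to the conservative end of the range, and the upper bound would be about 7,000 sperm.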

      (3) Rheotaxis and micropyle navigation

      Previous studies have shown that CatSper-null sperm fail to undergo rheotaxis. Could this defect be related to their inability to locate and penetrate the micropyle? Exploring a potential shared mechanism could be informative.

      Thank you for raising this interesting point. Indeed, homozygous mutant mice lacking expression of a different component of the CatSper channel, CatSperz, show reduced rheotactic efficiency and severe subfertility (Chung et al., 2017). We cannot exclude that complete lack of CatSper as shown in CatSper1<sup>Null</sup> mice could lead to reduced rheotactic efficiency, hence we include this interpretation in the Discussion (lines 484-486).

      (4) Lines 61-74

      This paragraph omits important information regarding acrosomal exocytosis, which occurs prior to sperm-egg fusion. Including this detail would strengthen the discussion.

      Thank you. We have revised the text in the discussion to describe the process of acrosome exocytosis, and its relevance for fertilization (lines 504-518).

      Reviewer #2 (Public review):

      Summary:

      Garibova et al. investigated the conservation of sperm recognition and interaction with the egg envelope in two groups of distantly related animals: mammals (mouse) and fish (zebrafish). Previous work and key physiological differences between these two animal groups strongly suggest that mouse sperm would be incapable of interaction with the zebrafish egg envelope (chorion) and its constituent proteins, though homologous to the mammalian zona pellucida (ZP). Indeed, the authors showed that mouse sperm do not bind recombinant zebrafish ZP proteins nor the intact chorion. Surprisingly, however, mouse sperm are able to locate and bind to the zebrafish micropyle, a specialized canal within the chorion that serves as the egg's entry point for sperm. This study suggests that sperm attraction to the egg might be highly conserved from fish to mammals and depends on the presence of a still unknown glycosylated protein within the micropyle. The authors further demonstrate that mouse sperm are able to enter the micropyle and accumulate within the intrachorionic space, potentially through a CatSper-dependent mechanism.

      Strengths:

      The authors convincingly demonstrate that mouse sperm do not bind zebrafish ZP proteins or the chorion. Furthermore, they make the interesting observation that mouse sperm are able to locate and enter the zebrafish micropyle in an MP-dependent manner, which is quite unexpected given the large evolutionary distance between these species, the many physiological differences between mouse and zebrafish gametes, and the largely different modes of both fertilization and reproduction in these species. This may indicate that the sperm chemoattractant in the egg is conserved between mammals and fish; however, whether zebrafish sperm are attracted to mouse eggs was not tested.

      Thank you. We performed an additional experiment with fish sperm used to inseminate ovulated mouse eggs, and results are reported in lines 183-187 and in Supplementary Figure 2.

      Weaknesses:

      The key weakness of this study lies in the rationale behind the overall investigation. In mammals, the zona pellucida (ZP) has been implicated in binding sperm in a taxon-specific manner, such that human sperm are incapable of binding the mouse ZP. Indeed, work by the corresponding author showed that this specificity is mediated by the N-terminal region of the ZP protein ZP2 (Avella et al., 2014). The N-termini of human and mouse ZP2 share 48% identity, which is higher than the overall identity between mouse and zebrafish ZP2, with the latter ortholog entirely lacking the N-terminal domain that is essential for sperm binding to the ZP. Given this known specificity for mouse vs. human sperm-ZP binding, it does not follow that mouse sperm would bind ZP proteins from not only a species that is much more distantly related, but also one that is not even a mammal, the zebrafish. Furthermore, the fish chorion does not play a role in sperm binding at all, while the mammalian ZP can bind sperm at any location. On the contrary, the zebrafish chorion prevents polyspermy by limiting sperm entry to the single micropyle.

      We thank the reviewer for this detailed comment. In this study, our goal was precisely to validate the hypothesis that mouse sperm would not bind either recombinant fish ZP proteins or the chorion; in addition, we found it important to examine the observation that mouse sperm could detect the micropyle. We further elaborated this rationale in the Introduction (lines 93-100).

      In addition, though able to provide some information regarding the broad conservation of sperm-egg interaction mechanisms, the biological relevance of these findings is difficult to describe. Fish and mammals are not only two very distinct and distantly related animal groups but also employ opposite modes of fertilization and reproduction (external vs. internal, oviparous vs viviparous). Fish gametes interact in a very different environment compared to mammals and lack many typically mammalian features of fertilization (e.g., sperm capacitation, presence of an acrosome, interaction with the female reproductive tract), making it difficult to make any physiologically relevant claims from this study. While this study may indicate conserved mechanisms of sperm attraction to the egg, the identity of the molecular players involved is not investigated. With this knowledge, the reader is forced to question the motivation behind much of the study.

      We thank the reviewer for their perspective, and we appreciate the opportunity to further elaborate on our rationale. As outlined in our Results and Discussion sections, a growing body of evidence supports the presence of conserved molecular players and signaling pathways involved in gamete interaction across species with diverse reproductive strategies. While zebrafish and mice do differ in their fertilization environments and modes of reproduction, these differences may not necessarily exclude the possibility of conserved molecular mechanisms underlying gamete interaction. For example, the CatSper calcium channel, which plays a key role in regulating sperm motility and hyperactivation, is conserved across a broad range of taxa, from echinoderms such as sea urchins (external fertilizers) (Seifert et al., 2015) to mammals, including mice and humans (internal fertilizers) (Lishko and Mannowetz, 2018). Moreover, sperm from some fish species possess acrosomes that undergo exocytosis prior to fertilization while sperm cross the micropyle (Psenicka et al., 2010). Also, in ovoviviparous species with internal fertilization, such as the black rockfish, sperm do undergo molecular changes while in the female reproductive tract (including immunomodulatory adaptations, glycocalyx remodeling, and interactions with ovarian cells), giving the sperm longer-term survival and a selective persistence that ensures only the fittest sperm can successfully fertilize eggs (Li et al., 2024). Mammalian capacitation is broadly defined as the process during which sperm undergo hyperactivation (Yanagimachi, 1970) and acquire the ability to undergo acrosome exocytosis, making the sperm competent for gamete fusion and fertilization (Bhakta et al., 2019; Puga Molina et al., 2018; Yanagimachi, 1957; Yanagimachi et al., 2017). Of note, acrosome exocytosis or changes in sperm motility are not exclusive to internal fertilizers. For example, as we cite in our manuscript (and as just stated above), acrosome exocytosis has been described to occur as sturgeon sperm cross the micropyle (Psenicka et al., 2010). Regarding changes in flagellar motility, investigations in the Pacific herring (Clupea sp.) demonstrated that sperm remain nearly immotile upon release into seawater and only initiate motility when approaching the micropyle region of the egg (Yanagimachi, 1957; Yanagimachi et al., 2017). In other fish, including bitterling and zebrafish, further enhancement in sperm motility is observed as sperm approach the micropyle area (Suzuki, 1958; Yanagimachi et al., 2017). These studies suggest that functional equivalents of capacitation may exist across taxa.

      We interpret the observation that mouse sperm can locate and enter the micropyle as suggesting that underlying guidance mechanisms may be more broadly conserved across distant species than previously recognized. We have now elaborated on these points in the revised Discussion (lines 531-552), and we hope the motivation behind our study is now more clearly articulated.

      During fertilization in fish, the sperm enters the micropyle and subsequently, the egg, as it is simultaneously activated by exposure to water. During egg activation, the chorion lifts as it separates from the egg and fills with water. This mechanism prevents supernumerary sperm from entering the egg after the successfully fertilizing sperm has bound and fused. In this study, the authors show that mouse sperm enter the micropyle and accumulate in the intrachorionic space. Whether any sperm successfully entered the egg is not addressed, and the status of egg activation is not reported.

      We appreciate the reviewer’s detailed comments and the opportunity to elaborate on this important aspect of our cross-insemination assay. We interpret the reviewer’s reference to “sperm entering the egg” as pertaining to sperm adhesion to the oocyte plasma membrane followed by fusion with the egg cell. These are two separate steps regulated by different molecular players: one set for sperm-egg plasma membrane adhesion (Bianchi et al., 2014; Fujihara et al., 2021; Herberg et al., 2018; Inoue et al., 2005) and another for fusion. It is important to note that proteins mediating gamete fusion are still unidentified in fish and mammals (Bianchi and Wright, 2020; Deneke and Pauli, 2021).

      In our cross-species insemination experiments, zebrafish oocytes were maintained in Hank’s solution to limit spontaneous activation; however, as the reviewer correctly notes, activation likely occurred upon exposure to HTF. While this model does not recapitulate full fertilization events, it serves as a platform to explore whether mammalian sperm can detect (within the scope of our study) and respond (future studies) to putative evolutionarily conserved signals, such as those guiding fish sperm toward the micropyle.

      While investigating cross-species sperm–oocyte fusion was not within the scope of this study and would require a distinct set of experimental approaches, we believe this question is an important one. However, we do not expect our platform to be informative for evaluating sperm adhesion to the fish oolemma or for enabling cross-species gamete fusion. In our assays focused on sperm-micropyle interaction, Hoechst staining of the nuclei of sperm with transgenically tagged acrosomes revealed no evidence of sperm adhesion to or fusion with the fish egg membrane (Figure 4D). Also, molecular incompatibilities may further prevent this interaction: in zebrafish, the Ly6/uPAR family protein Bouncer is expressed exclusively in the egg and is necessary for sperm–egg membrane adhesion (Herberg et al., 2018). Recent studies in zebrafish and mice have shown that a conserved trimeric complex composed of Izumo1, Spaca6, and Tmem81 on the sperm surface is required for mediating adhesion to the oocyte membrane by interacting with the mammalian oocyte receptor Izumo1R (also known as JUNO) or the zebrafish oocyte receptor Bouncer (Deneke et al., 2024). One would hypothesize that for mouse sperm to adhere to the zebrafish egg membrane, the mouse Izumo1-Spaca6-Tmem81 complex would need to establish binding with Bouncer. To explore this possibility, we performed AlphaFold2-Multimer structural predictions and docking analyses to mimic an interaction between mouse Izumo1-Spaca6-Tmem81 and zebrafish Bouncer, using mouse Izumo1-Spaca6-Tmem81 and Juno or zebrafish Izumo1-Spaca6-Tmem81 and Bouncer as positive controls. The predicted interface between zebrafish Bouncer and the mouse trimeric complex (Izumo1, Spaca6, and Tmem81) was of low confidence, as indicated by low ipTM scores and high predicted aligned error (PAE) values. These findings suggest that the mouse complex is unlikely to form an interaction with Bouncer (now shown in Suppl. Figure 7). 
These predictions were consistent with our observations that no sperm were found adhering to or fusing with the egg cell. We describe the methods and results in the supplementary files (Supporting Info, lines 53-66) and in the Results section (lines 335-339).

      In Supplementary Videos 3-4, the egg shown has been activated for some time, as evident by the separation of yolk and cytoplasm, yet the chorion is only partially expanded (likely due to mouse IVF conditions). How multiple sperm were able to enter the micropyle but presumably not the egg is not addressed, yet this suggests that the zebrafish mechanism of blocking polyspermy (fertilization by multiple sperm) is not effective for mouse sperm or is rendered ineffective due to mouse IVF conditions. The authors do not discuss these observations in the context of either species' physiological process of fertilization, highlighting the lack of biological context in interpreting the results.

      Thank you for raising this important point. One model for mammalian gamete recognition at the zona supports the notion that mouse sperm can penetrate extracellular matrices as long as sperm can bind to them, and binding is dependent on the cleavage status of ZP2. Zonae surrounding unfertilized mouse eggs present uncleaved ZP2 and these zonae support sperm binding. After gamete fusion, the cortical granules release ovastacin, which cleaves ZP2 at the N-terminus, and consequently, zonae presenting cleaved ZP2 no longer support sperm binding. This mechanism acts as a block to zona binding and prevents further crossing (Bhakta et al., 2019). Indeed, fertilized mouse eggs or 2-cell embryos surrounded by a zona containing uncleaved ZP2 support de novo sperm binding, and supernumerary sperm cross the zona and accumulate in the perivitelline space, unable to fuse with the fertilized oocyte plasma membrane or blastomere cells (Baibakov et al., 2012, 2007; Burkart et al., 2012; Gahlay et al., 2010). Thus, because mouse sperm could interact with the micropyle opening under our experimental conditions, we interpret these findings to suggest that once interaction occurs at the micropyle opening, mouse sperm are capable of crossing it, even under conditions where the micropyle may be detached from the oocyte due to oocyte activation. Therefore, our data indicate that mouse sperm may be able to bypass the mechanism by which zebrafish oocytes block multiple sperm from passing through the micropyle, even after oocyte activation. This point has now been incorporated into the revised Discussion (lines 425-441).

      The authors further show that the zebrafish micropyle does not trigger the acrosome reaction in mouse sperm. Whether the acrosome reacts is not correlated with a sperm's ability to cross the micropyle opening, as both acrosome-intact and acrosome-reacted sperm were observed within the intrachorionic space. While the acrosome reaction is a key event during mammalian fertilization and is required for sperm to fertilize the egg, zebrafish sperm do not contain an acrosome. Thus, these results are particularly difficult to interpret biologically, bringing into question whether this observation has biological relevance or is a byproduct of egg activation/chorion lifting that indirectly draws sperm into the chorion.

      We thank the reviewer for raising this point and we appreciate the opportunity to elaborate on the biological relevance of this experiment. Our motivation to assess acrosome status in mouse sperm following entry into the zebrafish micropyle stemmed from the following biological considerations. In fish species such as the sturgeon, sperm present an acrosome and undergo acrosome exocytosis while passing through the micropyle, before gamete fusion (Alavi et al., 2012; Psenicka et al., 2010). By contrast, zebrafish sperm lack an acrosome, raising the hypothesis that the zebrafish micropyle may not be able to trigger acrosome exocytosis. However, this possibility has not been experimentally tested. We therefore considered it important to investigate whether passage through the zebrafish micropyle induces acrosome exocytosis in mouse sperm. We have revised the Discussion to better clarify the rationale behind the experiment as well as the interpretation of the findings (lines 504-518). As for the possibility that chorion lifting indirectly draws sperm into the chorion, we have not observed this phenomenon.

      The final experiments regarding CatSper1's role in mediating mouse sperm entry into the micropyle/chorion are not convincing. As no molecular interactions are described or perturbed, the reader cannot be sure whether the sperm's failure to enter is due to signaling via CatSper1 or whether the overall failure to undergo hyperactivation limits sperm motility such that the mutant sperm can no longer find and enter the zebrafish micropyle. Indeed, in Figure 5E, no CatSper1 mutant sperm are visible near any part of the egg, suggesting that overall motility is impaired, and this is not a phenotype specific to interactions with the micropyle.

      We appreciate the comment and the opportunity to further elaborate on the rationale of this experiment. While our data demonstrate a lack of CatSper1<sup>Null</sup> sperm accumulation within the micropyle and ICS, we appreciate that this may be interpreted as the result of general motility defects, rather than a specific failure in undergoing hyperactivation and micropyle recognition. CatSper1<sup>Null</sup> sperm are known to lack hyperactivated motility and exhibit a progressive loss of forward motility over time. After 90 minutes, only ~20% of CatSper1<sup>Null</sup> sperm remain motile, compared to over 70% in fertile sperm (Qi et al., 2007). Of note, under our IVF conditions, CatSper1<sup>Null</sup> sperm retained sufficient progressive motility during the first hour post-insemination to bind the zona pellucida with comparable efficiency to CatSper1<sup>Het</sup> controls. Based on prior reports indicating that 15–35% of sperm exhibit hyperactivation by 90 minutes (Goodson et al., 2011), and considering that we inseminated with 100,000 progressively motile sperm, we estimate that approximately 3,000 hyperactivated CatSper1<sup>Null</sup> sperm were present in the dish. Yet, none were observed near the micropyle canal, its opening, or within the ICS. This led us to conclude that failure to hyperactivate underlies the inability of CatSper1<sup>Null</sup> sperm to reach and traverse the micropyle. Also, we appreciate that identifying the molecular components of the micropyle would allow direct testing of whether the CatSper channel is activated in response to micropyle-associated signals. Indeed, no targeted perturbation of the molecular interactions regulating micropyle recognition was performed in this study, as the molecular identity of the zebrafish micropyle guidance cue remains unknown. Efforts to identify and characterize this factor are ongoing in our lab and lie outside the scope of the current work. 
Therefore, throughout the manuscript, we have clarified that it is the failure to undergo hyperactivation, rather than the absence of CatSper per se, that limits the ability of sperm to locate and traverse the micropyle. The rationale for the experiment, the interpretation of our findings, and relevant future directions have been further elaborated in the revised Abstract, Impact Statement and Discussion (lines 40-41; 46-47; 343-365; 376-379; 389-399; 470-486).
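
      The estimate of roughly 3,000 sperm can be reproduced with simple arithmetic. The sketch below is one possible reading of the figures quoted above, not a calculation stated in the response: it assumes the estimate multiplies the inseminated count by the ~20% motile fraction at 90 minutes and by the 15–35% hyperactivation range. The function name and the integer-percentage arguments are illustrative.

```python
def expected_hyperactivated(inseminated: int, motile_pct: int, hyperactivated_pct: int) -> int:
    """Expected count = inseminated x motile fraction x hyperactivated fraction.

    Percentages are passed as integers so the arithmetic stays exact.
    """
    return inseminated * motile_pct * hyperactivated_pct // 10_000


# Figures quoted in the response (assumed here to combine multiplicatively):
INSEMINATED = 100_000          # progressively motile sperm per dish
MOTILE_PCT = 20                # ~20% still motile at 90 min (Qi et al., 2007)
HYPERACTIVATED_PCT = (15, 35)  # 15-35% hyperactivated (Goodson et al., 2011)

low, high = (
    expected_hyperactivated(INSEMINATED, MOTILE_PCT, pct) for pct in HYPERACTIVATED_PCT
)
print(f"expected hyperactivated sperm: {low} to {high}")
```

      Under these assumptions, the lower bound of the range corresponds to the approximately 3,000 sperm quoted above.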

      Reviewer #1 (Recommendations for the authors):

      Minor Comments

      (1) Figure numbering

      There appear to be inconsistencies in the figure references. For example, what is referred to as Figure 3F in the text is actually Figure 4F. Please review and correct all figure labels for accuracy.

      We thank the reviewer for pointing this out. We have carefully reviewed the manuscript and corrected all figure references throughout the text. Also, for better flow and coherence, we have moved the paragraph describing the videos to the end of the Results section titled "Mouse sperm recognize the micropylar region of fish oocytes." Previously, the callout of panels in Figure 3 was out of order (3A, 3B, 3E, 3C, 3D), and this reorganization also helps maintain logical progression through the figure panels.

      (2) Figure 5 terminology:

      The term "normal" sperm should be replaced with "CatSper heterozygous (Het)" sperm to avoid confusion and improve precision.

      We thank the reviewer for this helpful suggestion. We have revised the terminology in Figure 5 and throughout the manuscript, replacing “normal” sperm with “CatSper1 heterozygous (Het)” sperm.

      Reviewer #2 (Recommendations for the authors):

      In addition to my comments in the public review, I would encourage the authors to consider the following suggestions:

      The authors show that mouse sperm can find and enter the fish micropyle, and that this depends on the presence of MP. To better assess sperm binding to the micropyle region, the number of sperm binding to the micropyle vs. non-micropyle chorion should be clearly quantified, as well as the percentage of sperm that enter the micropyle compared to the total used for insemination. The authors state several times throughout the text that a "subpopulation" of mouse sperm finds and enters the micropyle, but it would be more precise and informative to give a percentage.

      We thank the reviewer for this suggestion. We now also report the number of sperm bound to other regions of the chorion (away; lines 231-233), as well as the percentage of sperm that entered the micropyle relative to the total number used for insemination (lines 276-279).

      To ensure that all sperm are inside the chorion, the egg should be removed from the insemination dish, washed thoroughly, and then the chorion should be torn open to definitively show that the sperm were indeed inside.

      We thank the reviewer for these excellent suggestions. Regarding confirmation that the sperm are inside the ICS (as now shown in Figures 4A, F, G, Supplementary Figure 6 and Supplementary Movies 3–5), the inseminated oocytes were thoroughly washed prior to imaging to ensure that only sperm located inside the chorion were visualized (as described in the Methods, lines 646-648). In addition, to confirm the spatial localization of sperm within the ICS, we are now including additional TEM images showing sperm in the ICS (Figure 4G, right panel). Also, we generated orthogonal views using ZEN Lite software (Zeiss, Germany) from a z-stack encompassing the full volume of the chorion, ICS, and oocyte (added in the supplementary materials, as Supplementary Figure 6). These views display three focal planes: the surface of the WGA-stained chorion, the middle of the ICS, and the oocyte plasma membrane. Sperm nuclei stained with Hoechst are clearly visible below the chorion surface and above the oocyte plasma membrane, confirming their localization within the ICS. Additionally, in a separate set of experiments, as recommended by this reviewer, we mechanically disrupted the chorion and consistently detected sperm within the ICS. This procedure, however, was technically challenging: upon disruption, the chorion often collapsed onto the oocyte, and during the extraction process, sperm were sometimes displaced. As a result, it was not always possible to determine with complete confidence whether the sperm had originally been located inside or outside the chorion. However, we hope that the additional TEM and confocal images (Figure 4G and Supplementary Figure 6) offer further support for the localization of sperm within the ICS.

      I would further suggest that they examine the micropyle opening after the entry of multiple sperm, as well as the dynamics of egg activation during insemination with mouse sperm.

      Thank you. We now include one additional TEM image capturing the full structure of a micropyle that was traversed by multiple mouse sperm (shown in Figure 4G, left panel).

      At what point does the micropyle detach from the egg surface? Live imaging of this process with a confocal microscope would be very informative.

      During live imaging, the interval between placing the oocyte in the imaging dish, replacement of Hank’s solution with HTF and the addition of sperm, followed by the initiation of video acquisition, is approximately 2 to 3 min. By this time, the ICS is already apparent (Supplementary Video 2), although the micropyle appears to remain adherent to the egg cell. Partial detachment of the micropyle from the egg cell begins around 6–7 minutes after imaging starts and continues progressively over time. We provide time-lapse imaging frames to show the micropyle detachment under mouse IVF conditions (Supplementary Figure 5).

      Along the same lines, sperm should be doubly labeled with an acrosome-independent marker, i.e., a live DNA stain or MitoTracker. Then the authors could track if any sperm are actually able to enter the egg itself, which would be highly unlikely but an important detail to confirm.

      Thank you for pointing this out. In our assays designed to study sperm–micropyle interactions, Hoechst staining of nuclei in transgenically labeled acrosome sperm showed no indication of sperm adhesion to, or fusion with, the zebrafish egg cell (Figure 4D).

      Line 242, 282: The text should refer to Figure 4, not 3. Please make sure all figure references correspond to the correct figure and panel.

      Thank you for bringing this to our attention. We have carefully reviewed the manuscript and corrected the reference to Figure 4, along with all other figure and panel citations to ensure they accurately correspond to the correct content. Also, to improve the overall flow, we relocated the paragraph describing the videos to the end of the Results section titled "Mouse sperm recognize the micropylar region of fish oocytes". This change also helped correct the sequence of figure panel references, which were previously cited out of order (i.e., 3A, 3B, 3E, 3C, 3D).

      Line 244: The authors quantify sperm that are "away" from the micropyle, but this is not clearly defined. This should be given as a set radius or distance from the center (e.g., in microns). If the sperm are still motile, can this be accurately measured?

      We thank the reviewer for this valuable suggestion. We have now defined “away from the micropyle” as a distance greater than 160 µm from the center of the micropyle. This measurement was determined using confocal z-stack projections of fixed samples. These details have been added to the revised Methods section (lines 670-674).
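
      The 160 µm criterion can be applied as a simple distance threshold. The following is a minimal sketch, assuming that sperm positions and the micropyle center are available as (x, y) coordinates in microns from the z-stack projection; the function name and the example coordinates are hypothetical and are not taken from the manuscript.

```python
import math

AWAY_THRESHOLD_UM = 160.0  # "away" = farther than this from the micropyle center


def is_away(sperm_xy, micropyle_xy, threshold_um=AWAY_THRESHOLD_UM):
    """Return True if a sperm lies more than threshold_um from the micropyle center."""
    return math.dist(sperm_xy, micropyle_xy) > threshold_um


# Hypothetical coordinates in microns, for illustration only.
micropyle = (0.0, 0.0)
sperm_positions = [(30.0, 40.0), (120.0, 160.0), (100.0, 0.0)]

away_count = sum(is_away(p, micropyle) for p in sperm_positions)
print(f"{away_count} of {len(sperm_positions)} sperm counted as away")
```

      A fixed radius of this kind only gives a stable count on fixed samples, which is consistent with the measurement being taken from projections of fixed material.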

      To strengthen the conclusion that the sperm chemoattractant is indeed conserved from fish to mammals, the authors could show that zebrafish sperm are also able to find/approach mouse eggs. Even more compelling would be to show the same is true for other species combinations. As it stands, the choice of comparing mouse and zebrafish does not seem scientifically motivated but rather due to their availability.

      We thank the reviewer for this important suggestion. To test whether zebrafish sperm are capable of binding to the mammalian zona pellucida, we conducted the suggested experiment: ovulated, cumulus-free mouse oocytes were placed in water and incubated with zebrafish sperm. We did not observe any zebrafish sperm bound to the mouse zona pellucida, consistent with the hypothesis that zebrafish sperm do not recognize or interact with mammalian zonae or ZP proteins. This has now been added in the Results (lines 183-187) and shown in Supplementary Figure 2. We interpret these findings in light of the fact that, in cross-species insemination assays, reciprocity in sperm-egg interaction is not always observed. For example, while human sperm bind only to human zonae and not to mouse zonae, mouse sperm are able to bind both mouse and human zonae (Avella et al., 2014; Baibakov et al., 2012; Bedford, 1977). This asymmetry may reflect species-specific adaptations in sperm-egg recognition. We have now added this point to the revised Discussion to clarify the rationale and context of our approach (lines 416-423).

      As for the choice of experimental models, while we agree that testing additional species combinations would broaden the scope of the findings, the choice to compare mouse and zebrafish was not solely based on availability. Rather, it was motivated by the opportunity to examine sperm guidance across two evolutionarily distant vertebrates. This contrast allows us to look for potential conservation of structural or molecular cues involved in gamete interaction. Additionally, both zebrafish and mouse offer extensive gene editing, blotting and imaging reagents, which are particularly valuable should future studies aim to identify and functionally disrupt genes encoding micropyle-associated proteins and their putative orthologs in mammals.

      For the CatSper experiment, I would suggest that the authors repeat this experiment with another mouse sperm mutant that is known to have reduced/altered motility. With the current data, I do not believe the failure to find/enter the micropyle is necessarily CatSper-specific. Because we do not know what the sperm interacts with in the micropyle or what the MP interacts with on the sperm, the signaling pathway cannot be tested, making other controls necessary for these results to be meaningful.

      Thank you for highlighting this important point. A wide range of mouse models with sperm motility defects exhibit subfertility or infertility due to structural abnormalities in the axoneme or midpiece rigidity (Miyata et al., 2024). These defects often result in impaired progressive motility, failure to reach the zona pellucida, or inability to bind or penetrate it. In contrast, we could test and validate that CatSper1<sup>Null</sup> sperm display preserved early progressive motility but fail to transition into hyperactivated motility, making them particularly well suited for specifically assessing the role of hyperactivation in sperm navigation toward and entry into the micropyle. Taken together, these points, along with those discussed in our response to the public review, led us to conclude that the CatSper1<sup>Null</sup> model provides the most biologically relevant context currently available to assess the role of hyperactivation in guiding sperm to the micropyle.

      The authors could greatly strengthen the discussion by addressing the key points I raised in the public review, particularly in terms of interpreting these results in the context of each species' physiological mode of fertilization.

      We thank the reviewer for this important recommendation. We have carefully revised the Discussion to address the key points raised in the public review, particularly by framing our findings within the context of the distinct physiological modes of fertilization in each species, as indicated in our answers to the public review. We hope these additions have strengthened the manuscript as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study evaluates whether species can shift geographically, temporally, or both ways in response to climate change. It also teases out the relative importance of geographic context, temperature variability, and functional traits in predicting the shifts. The study system is large occurrence datasets for dragonflies and damselflies split between two time periods and two continents. Results indicate that more species exhibited both shifts than one or the other or neither, and that geographic context and temp variability were more influential than traits. The results have implications for future analyses (e.g. incorporating habitat availability) and for choosing winner and loser species under climate change. The methodology would be useful for other taxa and study regions with strong community/citizen science and extensive occurrence data.

      We thank Reviewer 1 for their time and expertise in reviewing our study. The suggestions are very helpful and will improve the quality of our manuscript.

      Strengths:

      This is an organized and well-written paper that builds on a popular topic and moves it forward. It has the right idea and approach, and the results are useful answers to the predictions and for conservation planning (i.e. identifying climate winners and losers). There is technical proficiency and analytical rigor driven by an understanding of the data and its limitations.

      We thank Reviewer 1 for this assessment.

      Weaknesses:

      (1) The habitat classifications (Table S3) are often wrong. "Both" is overused. In North America, for example, Anax junius, Cordulia shurtleffii, Epitheca cynosura, Erythemis simplicicollis, Libellula pulchella, Pachydiplax longipennis, Pantala flavescens, Perithemis tenera, Ischnura posita, the Lestes species, and several Enallagma species are not lotic breeding. These species rarely occur let alone successfully reproduce at lotic sites. Other species are arguably "both", like Rhionaeschna multicolor which is mostly lentic. Not saying this would have altered the conclusions, but it may have exacerbated the weak trait effects.

      We thank the reviewer for their expertise on this topic. We obtained these habitat classifications from field guides and trait databases, and reviewed our primary sources to clarify the trait classifications. We reclassified the species according to the expertise of this reviewer and performed our analysis again; please see details below.

      (2) The conservative spatial resolution (100 x 100 km) limits the analysis to wide-ranging and generalist species. There's no rationale given, so not sure if this was by design or necessity, but it limits the number of analyzable species and potentially changes the inference.

      It is really helpful to have the opportunity to contextualize study design decisions like this one, and we thank the reviewer for the query. Sampling intensity is always a meaningful issue in research conducted at this scale, and we addressed it head-on in this work.

      Very small quadrats covering massive geographical areas will be critically and increasingly afflicted by sampling weaknesses, as well as creating a potentially large problem with pseudoreplication. There is no simple solution to this problem. It would be possible to create interpolated predictions of species’ distributions using Species Distribution Models, Joint Species Distribution Models, or various kinds of Occupancy Models. None of these approaches then leads to analyses that rely on directly observed patterns. Instead, they are extrapolations, and those extrapolations typically fail when tested, although they have still been tested (for example, papers by Lee-Yaw demonstrate that it is rare for SDMs to predict things well; occupancy models often perform less well than SDMs and do not capture how things change over time - Briscoe et al. 2021, Global Change Biology). The result of employing such techniques would certainly be to make all conclusions speculative, rather than directly observable. 

      Rather than employing extrapolative models, we relied on transparent techniques that are used successfully in the core macroecology literature and that address spatial variation in sampling explicitly and simply. Moreover, we constructed extensive null models that show that range and phenology changes, respectively, are contrary to expectations that arise from differences in sampling. Quadrats of 100 km make for a reasonable “middle ground” in terms of the effects of sampling, and we added a reference to the methods section to clarify this (see details below).
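
      To make the quadrat approach concrete, the sketch below bins occurrence records into 100 × 100 km cells and collects the cells each species occupies. It is an illustration only and not the analysis pipeline used in the study: it assumes that records are already in a projected coordinate system measured in metres, and the record layout, function names, and coordinates are hypothetical.

```python
from collections import defaultdict

CELL_SIZE_M = 100_000  # 100 km quadrats


def quadrat_id(x_m, y_m, cell_size_m=CELL_SIZE_M):
    """Map projected coordinates (metres) to integer (column, row) cell indices."""
    return int(x_m // cell_size_m), int(y_m // cell_size_m)


def occupied_quadrats(records):
    """Group (species, x_m, y_m) records into the set of quadrats each species occupies."""
    cells = defaultdict(set)
    for species, x_m, y_m in records:
        cells[species].add(quadrat_id(x_m, y_m))
    return dict(cells)


# Hypothetical records: (species, easting in m, northing in m).
records = [
    ("Anax junius", 250_000.0, 4_310_000.0),
    ("Anax junius", 299_999.0, 4_399_000.0),  # same quadrat as the record above
    ("Anax junius", 305_000.0, 4_405_000.0),
    ("Ischnura posita", 120_000.0, 3_950_000.0),
]

occupancy = occupied_quadrats(records)
for species, cells in occupancy.items():
    print(species, len(cells), "quadrats")
```

      Because repeated records within a cell collapse into a single occupied quadrat, a coarse grid of this kind is less sensitive to uneven sampling effort than a fine one, at the cost of spatial detail.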

      (3) The objective includes a prediction about generalists vs specialists (L99-103) yet there is no further mention of this dichotomy in the abstract, methods, results, or discussion.

      Thank you for pointing this out - it is an editing error that should have been resolved prior to submission. We replaced the terms specialist and generalist with specific predictions based on traits (see details below).

      (4) Key references were overlooked or dismissed, like in the new edition of Dragonflies & Damselflies model organisms book, especially chapters 24 and 27.

      We thank Reviewer 1 for making us aware of this excellent reference. We have reviewed the text and include it as a reference, in addition to other references recommended by Reviewer 1 and other reviewers (see details below).

      Reviewer #2 (Public review):

      Summary:

      This paper explores a highly interesting question regarding how species migration success relates to phenology shifts, and it finds a positive relationship. The findings are significant, and the strength of the evidence is solid. However, there are substantial issues with the writing, presentation, and analyses that need to be addressed. First, I disagree with the conclusion that species that don't migrate are "losers" - some species might not migrate simply because they have broad climatic niches and are less sensitive to climate change. Second, the results concerning species' southern range limits could provide valuable insights. These could be used to assess whether sampling bias has influenced the results. If species are truly migrating, we should observe northward shifts in their southern range limits. However, if this is an artifact of increased sampling over time, we would expect broader distributions both north and south. Finally, Figure 1 is missing panel B, which needs to be addressed.

      We thank Reviewer 2 for their time and expertise in reviewing our study.

      It is possible that some species with broad niches may not need to migrate, although in general failing to move with climate change is considered an indicator of “climate debt”, signaling that a species may be of concern for conservation (ex. Duchenne et al. 2021, Ecology Letters). We revised the discussion to acknowledge potential differences in outcomes (please see details below).

      We used null models to test whether our results regarding range shifts were robust, and if they varied due to increased sampling over time. We found that observed northern range limit shifts are not consistent with expectations derived from changes in sampling intensity (Figure S1, S2). 

      We thank Reviewer 2 for pointing out this error in Figure 1. This conceptual figure was a challenge to construct, as it must illustrate how phenology and range shifts can occur simultaneously or uniquely to enable a hypothetical odonate to track its thermal niche over time. In a previous version of the figure, we had a second panel and we failed to remove the reference to that panel when we simplified the figure. We have updated the figure and figure caption (please see details below).

      Reviewer #3 (Public review):

      Summary:

      In their article "Range geographies, not functional traits, explain convergent range and phenology shifts under climate change," the authors rigorously investigate the temporal shifts in odonate species and their potential predictors. Specifically, they examine whether species shift their geographic ranges poleward or alter their phenology to avoid extreme conditions. Leveraging opportunistic observations of European and North American odonates, they find that species showing significant range shifts also exhibited earlier phenological shifts. Considering a broad range of potential predictors, their results reveal that geographical factors, but not functional traits, are associated with these shifts.

      We thank Reviewer 3 for their expertise and the time they spent reviewing our study. Their suggestions are very helpful and will improve the quality of our manuscript.

      Strengths:

      The article addresses an important topic in ecology and conservation that is particularly timely in the face of reports of substantial insect declines in North America and Europe over the past decades. Through data integration the authors leverage the rich natural history record for odonates, broadening the taxonomic scope of analyses of temporal trends in phenology and distribution to this taxon. The combination of phenological and range shifts in one framework presents an elegant way to reconcile previous findings improving our understanding of the drivers of biodiversity loss.

      We thank Reviewer 3 for this assessment.

      Weaknesses:

      The introduction and discussion of the article would benefit from a stronger contextualization of recent studies on biological responses to climate change and the underpinning mechanism.

      The presentation of the results (particularly in figures) should be improved to address the integrative character of the work and help readers extract the main results. While the writing of the article is generally good, particularly the captions and results contain many inconsistencies and lack important detail. With the multitude of the relationships that were tested (the influence of traits) the article needs more coherence.

      We thank Reviewer 3 for these suggestions. We revised the introduction and discussion to better contextualize species’ responses to climate change and the mechanisms behind them (see details below). We carefully reviewed all figures and captions, and made changes to improve the clarity of the text and the presentation of results (see details below).

      Reviewer #1 (Recommendations for the authors):

      Comment:

      (1) Following weakness #1 in the public review, the authors should review the habitat classifications, consult with an odonatologist, and reclassify many species from Both to Lentic and redo the analysis.

      Thank you for pointing out this disagreement among expert habitat classifications that we cited and other literature. We reclassified species’ habitat preferences based on classifications by Hof et al., a source that was consistent with your suggestions, and identified additional species as Lentic that our other references had identified as Both. We performed our analysis with this new dataset and, as you suspected, our results did not change qualitatively: species habitat preferences did not predict their range shifts.

      Hof, Christian, Martin Brändle, and Roland Brandl. "Lentic odonates have larger and more northern ranges than lotic species." Journal of Biogeography 33.1 (2006): 63-70.

      Comment:

      (2) Following weakness #2, would it be worthwhile or interesting to analyze a smaller ranging group (e.g. cut the quad size in half, 50 x 50 km) to bring in more species and potentially change the inference? Or is the paper too tightly constructed to allow this, even as a secondary piece?

      Thank you for this comment, as it highlights an important consideration for macroecological analyses, and the importance of balancing multiple factors for determining quadrat size. Issues exist with identifying drivers of range boundaries among species with narrow ranges when they are analyzed separately from wide-ranging species, and examining larger quadrats can actually help clarify drivers (Szabo, Algar, and Kerr 2009). The smaller quadrats are, the higher the likelihood that the species is actually there but was never observed, or that the quadrat only covers unsuitable habitat and the species is absent from the entire (or almost entire) quadrat. Too many absences violate model assumptions and create noise that makes it difficult to identify drivers of species’ range and phenology shifts.

      Moreover, we constructed extensive null models that show that range and phenology changes, respectively, are contrary to expectations that arise from differences in sampling. 100 km quadrats make for a reasonable “middle-ground”, and we have included a brief explanation of this in the text: “We assigned species presences to 100×100 km quadrats, a scale that is large enough to maintain adequate sampling intensity but still relevant to conservation and policy (Soroye et al., 2020), to identify the best sampled species.” (Lines 170-172).
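
      As a minimal sketch of what assigning presences to quadrats involves, the following assumes that record coordinates have already been projected to metres in an equal-area projection; the function names and the toy records are hypothetical and do not come from the manuscript.

```python
import math
from collections import defaultdict

def quadrat_id(x_m, y_m, size_km=100):
    """Return the (column, row) index of the square quadrat containing a point.

    Coordinates are assumed to be in metres in an equal-area projection.
    """
    size_m = size_km * 1000
    return (math.floor(x_m / size_m), math.floor(y_m / size_m))

def presences_by_quadrat(records, size_km=100):
    """Map each occupied quadrat to the set of species recorded in it."""
    occupied = defaultdict(set)
    for species, x_m, y_m in records:
        occupied[quadrat_id(x_m, y_m, size_km)].add(species)
    return dict(occupied)

# Toy records: (species, x in metres, y in metres).
records = [
    ("sp_a", 150_000, 20_000),
    ("sp_b", 199_999, 99_999),
    ("sp_a", 250_000, 20_000),
]
fine = presences_by_quadrat(records, size_km=100)
coarse = presences_by_quadrat(records, size_km=50)
```

      Halving the quadrat size spreads the same records over more cells, so each cell holds fewer records, which is the sampling trade-off described above.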

      Szabo, Nora D., Adam C. Algar, and Jeremy T. Kerr. "Reconciling topographic and climatic effects on widespread and range‐restricted species richness." Global Ecology and Biogeography 18.6 (2009): 735-744.

      Comment:

      (3) Following weakness #3, are specialists the ones that "failed to shift" (L18)? If so please specify. The prediction about generalists vs specialists needs to be removed or incorporated in other parts of the paper.

      Thank you for pointing this out, we intended to suggest that species with more generalist habitat requirements might be better able to shift, but ultimately found that traits did not predict species’ shifts. We corrected our prediction regarding habitat generalists as follows: “We predicted that species able to use both lentic and lotic habitats would shift their phenologies and geographies more than those able to use just one habitat type, as generalists outperform specialists as climate and land uses change (Ball-Damerow et al., 2015, 2014; Hassall and Thompson, 2008; Powney et al., 2015; Rapacciuolo et al., 2017).” (Lines 128-132).

      Comment:

      (4) Following weakness #4, cite Pinkert et al at lines 70-73 and Rocha-Ortega et al at lines 73-77 along with https://doi.org/10.1098/rspb.2019.2645. Add Sandall et al https://doi.org/10.1111/jbi.14457 to L69 references.

      Thank you for the excellent reference suggestions, we have added them as suggested (Lines 80, 86, 77).

      Comment:

      Other comments/suggestions:

      (1) Title: consider adding temp variability 'Range geography and temperature variability, not functional traits,...'.

      Thank you for this suggestion, we have added temperature variability to the title: “Range geography and temperature variability explain cross-continental convergence in range and phenology shifts in a model insect taxon”.

      Comment:

      (2) L125: is (northern) Mexico included in North America?

      Yes, we did include observations from Northern Mexico, and have specified this in the text: “We retained ~1,100,000 records from Canada, the United States, and Northern Mexico, comprising 76 species (Figure 2).” (Lines 174-176).

      Comment:

      (3) L128: I'd label this section 'Temperature variability' rather than 'Climate data'.

      Thank you, we agree that this is a more appropriate title for this section, and have replaced ‘Climate data’ with ‘Temperature variability’ (Line 185).

      Comment:

      (4) Table 2: why are there no estimates for the traits?

      We apologise, this information should have been included in the main body of the manuscript, but was only explained in the Table 2 caption. We have added the following explanation: “Non-significant variables, specifically all functional traits, were excluded from the final models.” (Lines 312-323).

      Comment:

      (5) Figure 2: need to identify the A-D panels.

      We apologise for this error and have clarified the differences between panels in the figure caption:

      “Figure 2: Richness of 76 odonate species sampled in North America and Europe in the historic period (1980-2002; panes A and C) and the recent period (2008-2018; panes B and D). Species richness per 100 × 100 km quadrat is shown in panes A and B, while panes C and D show species richness per 200 × 200 km quadrat. Dark red indicates high species richness, while light pink indicates low species richness.” (Lines 1002-1006).

      Comment:

      (6) L163-173: I am not familiar with this analysis but it sounds interesting and promising, I am not sure if this can be clarified further. Why the -25 to 25, and -30 to 30, doesn't the -35 to 35 cover these? And what is meant by "include only phenology shifts that could be biologically meaningful", that larger shifts would not be meaningful or tied to climate change?

      We used different cutoffs for phenology shifts to inspect for outliers that were likely to be errors, potentially due to insufficient sampling to calculate phenology. We clarified in the text as follows:

      “We retained emergence estimates between March 1st and September 1st, as well as species and quadrats that showed a difference in emergence phenology of -25 to 25 days, -30 to 30 days, or -35 to 35 days between both time periods, to include only phenology shifts that could be biologically meaningful to environmental climate change (i.e. exclude errors).” (Lines 169-173).
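
      The cutoff comparison can be read as a simple sensitivity check: the same filter is applied at each threshold and the retained sets are compared. The sketch below is illustrative only; the function name, the data layout, and the toy values are assumptions, not the manuscript's code.

```python
def filter_shifts(shifts, cutoff_days):
    """Keep species-quadrat phenology shifts whose magnitude is within the cutoff."""
    return {key: days for key, days in shifts.items() if abs(days) <= cutoff_days}

# Toy shifts in emergence date (days, recent minus historic) per (species, quadrat).
shifts = {
    ("sp_a", "q1"): -12.0,
    ("sp_b", "q1"): 28.5,
    ("sp_c", "q2"): -33.0,
    ("sp_d", "q3"): 61.0,  # implausibly large, most likely a sampling artefact
}

# Apply each cutoff in turn and record which shifts survive.
retained = {cutoff: filter_shifts(shifts, cutoff) for cutoff in (25, 30, 35)}
```

      If downstream estimates are stable across the three retained sets, the choice of cutoff is unlikely to drive the conclusions.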

      Comment:

      (7) L193-200: I agree but would make a distinction between ecological vs functional traits, as other studies view geographic traits as ecological manifestations of functional biology, e.g. https://doi.org/10.1016/j.biocon.2019.07.001 and https://doi.org/10.1016/j.biocon.2023.110098.

      Thank you for this suggestion, and for making us aware of the thinking around range geographies as ecological traits. We have specified throughout the manuscript that the ‘traits’ we are considering are ‘functional traits’, changed the methods subsection title to “Range geographies and functional traits” (Line 252), and added a brief discussion of ecological traits: “Geographic range and associated climatic characteristics are often considered ecological traits, as they are consequences of functional traits and their interactions with geographic features (Bried and Rocha-Ortega, 2023; Chichorro et al., 2019).” (Lines 256-259).

      Comment:

      (8) L203: What's the rationale for egg-laying habitat as "biologically relevant to spatial and temporal responses to climate change"? That one's not as obvious as the others and needs a sentence more. Also, I am wondering why other traits were not considered here, like color lightness and voltinism. And why not wing size instead of body size, or better yet the two combined (wing loading) as a proxy for dispersal ability?

      We agree that our rationale for using this trait should be better explained, and we have included the following explanation: “Egg laying habitat was assigned according to whether species use exophytic egg-laying habitat (i.e. eggs laid in water or on land, relatively larger in number), or endophytic egg-laying habitat (i.e. eggs laid inside plants, usually fewer in number); species using exophytic habitats are associated with greater northward range limit shifts (Angert et al., 2011).” (Lines 271-275).

      We considered traits that have been found to be important for range and phenology shifts among odonates, as well as being key traits for expectations for species responses to climate change. Flight duration and body size are correlated with dispersal ability (Powney et al. 2015). Body size is also correlated with competitive ability (Powney et al. 2015), potentially making it an important predictor of a species’ ability to establish and maintain populations in expanding range areas. Traits correlated with range shifts also include breeding habitat type (Powney et al. 2015; Bowler et al. 2021) and egg laying habitat (Angert et al. 2011). Ideally, we would have used dispersal data from mark/release/recapture studies, but it was not available for many of the species included in this study. After finding that none of the functional traits we included were related to range shifts, there was no reason to believe that a further investigation of traits would be meaningful.

      Angert AL, Crozier LG, Rissler LJ, Gilman SE, Tewksbury JJ, Chunco AJ. 2011. Do species’ traits predict recent shifts at expanding range edges? Ecology Letters 14:677–689. doi:10.1111/j.1461-0248.2011.01620.x

      Bowler DE, Eichenberg D, Conze K-J, Suhling F, Baumann K, Benken T, Bönsel A, Bittner T, Drews A, Günther A, Isaac NJB, Petzold F, Seyring M, Spengler T, Trockur B, Willigalla C, Bruelheide H, Jansen F, Bonn A. 2021. Winners and losers over 35 years of dragonfly and damselfly distributional change in Germany. Diversity and Distributions 27:1353–1366. doi:10.1111/ddi.13274

      Powney GD, Cham SSA, Smallshire D, Isaac NJB. 2015. Trait correlates of distribution trends in the Odonata of Britain and Ireland. PeerJ 3:e1410. doi:10.7717/peerj.1410

      Comment:

      (9) L210: I count at least 5 migratory species in table S3, so although maybe not enough to analyze it's misleading to say "nearly all" were non-migratory, revise to "most" or "vast majority".

      Thank you for pointing this out, we have made the suggested correction (Line 277).

      Comment:

      (10) L252-254: save this for the Discussion and write a more generalized statement for results to avoid citations in the results.

      Thank you for this suggestion, we have moved this to the discussion (Lines 517-527).

      Comment:

      (11) Figures S5 & S6: these are pretty important, I'd consider elevating them to the main document as one figure with two panels.

      Thank you for this suggestion, we agree these figures should be elevated to the main text, and have made them into a panel figure (Figure 4).

      Comment:

      (12) L305-307: great point and recommendation!

      Thank you very much for this positive feedback!

      Comment:

      (13) L335-336: another place to cite https://doi.org/10.1098/rspb.2019.2645 which includes a thermal sensitivity index and would add an odonate citation behind the statement.

      Thank you for this excellent suggestion, we have added this citation (Rocha-Ortega et al. 2020; Line 480).

      Comment:

      (14) L352-353: again see also https://doi.org/10.1098/rspb.2019.2645.

      Thank you for highlighting this reference, we have added it to Line 505 as suggested.

      Comment:

      (15) L355: revise "populations that coexist" to "species that co-occur" (big difference between population and species levels and between coexistence and co-occurrence).

      Thank you very much for pointing this out, we have made the suggested change (Line 507).

      Comment:

      (16) L359-365: are the winners and losers depicted in Figures S5 & S6? If so reference the figure (which I suggest combining and promoting to the main text), if not create a table listing the analyzed species and their winner/loser status.

      We agree that this is an excellent place to bring up Figures S5 and S6 from the supplemental. We have moved them to the main document as one figure and referenced it at line 510.

      Reviewer #2 (Recommendations for the authors):

      Comment:

      (1) Line 53-55: The claim that "These relationships generalize poorly taxonomically and geographically" is valid, but the study only tests Odonata on two continents.

      Thank you for this comment – the word ‘generalize’ may imply that our study tries to find a general pattern across many groups. We have changed the language to: “However, these relationships are inconsistent across taxa and regions, and cross-continental tests have not been attempted (Angert et al., 2011; Buckley and Kingsolver, 2012; Estrada et al., 2016; MacLean and Beissinger, 2017).” (Lines 57-59).

      Comment:

      (2) Line 58-59: Is this statement only true for Odonata? It does not seem to hold for plants, for example.

      Thank you for this comment – this statement references a meta-analysis of multiple animal and plant taxa, but the evidence for the importance of range location comes from animal taxa. We have specified that we are referring to animal species to clarify (Line 60).

      Comment:

      (3) Line 87-91: This section is difficult to understand and needs clarification.

      We have clarified this section as follows: “While warm-adapted species with more equatorial distributions could expand their ranges poleward following warming (Devictor et al., 2008), they could also increase in abundance in this new range area relative to species that historically occupied those areas and are less heat-tolerant (Powney et al., 2015).” (Lines 95-121).

      Comment:

      (4) Line 99-100: Please define "generalist" and "specialist" more clearly here (e.g., based on climate niche?).

      Thank you for pointing this out, we intended to suggest that species with more generalist habitat requirements might be better able to shift, but ultimately found that traits did not predict species’ shifts. We corrected our prediction regarding habitat generalists as follows: “We predicted that species able to use both lentic and lotic habitats would shift their phenologies and geographies more than those able to use just one habitat type, as generalists outperform specialists as climate and land uses change (Ball-Damerow et al., 2015, 2014; Hassall and Thompson, 2008; Powney et al., 2015; Rapacciuolo et al., 2017).” (Lines 128-132).

      Comment:

      (5) Line 122: Replace the English letter "X" in "100x100 km" with the correct mathematical symbol.

      We have made the suggested replacement throughout the manuscript.

      Comment:

      (6) Line 148: To address sampling effects, you could check the paper: https://onlinelibrary.wiley.com/doi/full/10.1111/gcb.15524. Additionally, maximum and minimum values are sensitive to extreme data points, so using 95% percentiles might be more robust.

      Thank you for sharing this paper, as it offers a valuable perspective on the study of species’ ranges. While our dataset is substantially composed of observations from adult sampling protocols, unlike the suggested paper which compares adults and juveniles, this is an interesting alternative approach.

      For our purposes it is meaningful to include outliers, as otherwise we may have missed individuals at the leading edge of range expansions. Our intent here was to detect range limits, as opposed to finding the central tendency of species distributions. This approach is widely accepted in the macroecology literature (e.g. Devictor et al., 2012, 2008; Kerr et al. 2015).
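
      The trade-off between the two estimators can be illustrated with a small example. This is a sketch under stated assumptions: the percentile uses linear interpolation between order statistics, and the toy latitudes are invented to show a single record at a leading edge.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in 0-100)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)

# Toy latitudes: 20 records in the core of the range plus one leading-edge record.
latitudes = [45.0 + 0.1 * i for i in range(20)] + [52.0]

limit_max = max(latitudes)             # retains the leading-edge record
limit_p95 = percentile(latitudes, 95)  # discounts it as an extreme value
```

      The maximum registers the northernmost record, whereas the 95th percentile stays close to the core of the distribution, which is why the two estimators suit different questions.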

      We have included the following discussion of our approach in the methods section:

      “We followed widely accepted methods to determine species range boundaries (Devictor et al., 2012, 2008; Kerr et al., 2015), although other methods exist that are appropriate for different data types and research questions i.e. (Ni and Vellend, 2021). We assigned species presences to 100×100 km quadrats, a scale that is large enough to maintain adequate sampling intensity but still relevant to conservation and policy (Soroye et al., 2020), to identify the best sampled species.” (Lines 168-173).

      Kerr JT, Pindar A, Galpern P, Packer L, Potts SG, Roberts SM, Rasmont P, Schweiger O, Colla SR, Richardson LL, Wagner DL, Gall LF, Sikes DS, Pantoja A. 2015. Climate change impacts on bumblebees converge across continents. Science 349:177–180. doi:10.1126/science.aaa7031

      Soroye P, Newbold T, Kerr J. 2020. Climate change contributes to widespread declines among bumble bees across continents. Science 367:685–688. doi:10.1126/science.aax8591

      Devictor V, Julliard R, Couvet D, Jiguet F. 2008. Birds are tracking climate warming, but not fast enough. Proceedings of the Royal Society B: Biological Sciences 275:2743–2748. doi:10.1098/rspb.2008.0878

      Devictor V, van Swaay C, Brereton T, Brotons L, Chamberlain D, Heliölä J, Herrando S, Julliard R, Kuussaari M, Lindström Å, Reif J, Roy DB, Schweiger O, Settele J, Stefanescu C, Van Strien A, Van Turnhout C, Vermouzek Z, WallisDeVries M, Wynhoff I, Jiguet F. 2012. Differences in the climatic debts of birds and butterflies at a continental scale. Nature Clim Change 2:121–124. doi:10.1038/nclimate1347

      Comment:

      (7) Line 195: The species' climate niche should also be considered a product of evolution.

      Thank you for this suggestion. To address this comment and a comment from another reviewer, we changed the text to the following: “Geographic range and associated climatic characteristics are often considered ecological traits, as they are consequences of functional traits and their interactions with geographic features (Bried and Rocha-Ortega, 2023; Chichorro et al., 2019).” (Lines 256-259).

      Comment:

      (8) Line 244: This speculative statement belongs in the Discussion section.

      Thank you for this suggestion, we have moved this statement to the discussion (Lines 451-453).

      Comment:

      (9) Line 252-254: The projection of Coenagrion mercuriale's range contraction is not part of your results and should be clarified or removed.

      Following this suggestion and a similar suggestion from another reviewer, we moved this text to the discussion (Line 517-527).

      Comment:

      (10) Line 314-316: If the species can tolerate warmer temperatures better, why would they migrate?

      We apologize for the confusion, and we have reworded the section as follows: “Emerging mean conditions in areas adjacent to the ranges of southern species may offer opportunities for range expansions of these relative climate specialists, which can then tolerate climate warming in areas of range expansion better than more cool-adapted historical occupants (Day et al., 2018).” (Lines 445-448).

      Comment:

      (11) Line 334-335: Species' tolerance to temperature likely depends on their traits, which were not tested in this study. This should be noted.

      We agree, and we have removed the wording “rather than traits” from this sentence (Line 479).

      Reviewer #3 (Recommendations for the authors):

      Comment:

      (1) Title: The title is too general not specifying that your results are on odonates only, but also stressing the implicit role of climate change to a degree the tests do not support.

      Following this comment and a suggestion from another reviewer we changed the title to the following: “Range geography and temperature variability explain cross-continental convergence in range and phenology shifts in a model insect taxon”. We wanted to emphasize our use of Odonates as a model species that we used to ask broad questions, while being more specific about the climatic variable that we examined (temperature variability).

      Comment:

      (2) L32: consider including Novella-Fernandez et al. 2023 (NatCommun) which addresses this topic in Odonates.

      Thank you for suggesting this very interesting paper, we have added it as a citation (Line 31-32).

      Comment:

      (3) L35: consider including Grewe et al. 2013 (GEB) and Engelhardt et al. 2022(GCB).

      Thank you for these excellent suggestions, we have added the citations (Line 35).

      Comment:

      (4) L47: rather write 'result from' instead of 'driven by'.

      We agree this is a better characterization and have corrected the wording (Line 48-49).

      Comment:

      (5) L49-52: There has been a recent study on this topic for birds (Neate-Clegg et al., 2024 NEE). However, specifying this to insects would make it not less relevant. This review for odonates might be helpful in this regard (Pinkert et al. 2022, Chapter: "Odonata as focal taxa for biological responses to climate change" IN Dragonflies & Damselflies: Córdoba-Aguilar et al. (2022) Model Organisms for Ecological and Evolutionary Research).

      Thank you for again suggesting excellent references, we have added them to line 52-53, as well as adding the Pinkert citation to lines 61 and 82.

      Comment:

      (6) L53-66: Combine into one paragraph about drivers. With traits first and the environment second. The natural land cover perspective may be too complicated in this context. Consider focusing on generalities of the impact of changes within species' ranges.

      As suggested we have combined these into one paragraph about drivers (Line 59).

      Comment:

      (7) L67-69: The book from before would be a much stronger reference for this claim. Kalkman et al. (2018) do not address the emphasis of global change research in insects on bees and butterflies. Also, I would highlight that most of the current work is at a national scale, rather than cross-continental.

      Thank you for this suggestion, we have added the suggested reference and included that “…recently assembled databases of odonate observations provide a rare opportunity to investigate species’ spatiotemporal responses at larger taxonomic and spatial scales, particularly as most work has been done at national scales.” (Lines 75-77).

      Comment:

      (8) L68: consider rephrasing this part to '..provide a rare opportunity to investigate spatiotemporal biotic responses at larger taxonomic and spatial scales'

      We appreciate this suggestion and really like the wording. We have changed the phrase to read as follows: “While global change research on insects often emphasizes butterfly and bee taxa, recently assembled databases of odonate observations provide a rare opportunity to investigate species’ spatiotemporal responses at larger taxonomic and spatial scales, particularly as most work has been done at national scales.” (Lines 74-77).

      Comment:

      (9) L69: This characteristic is not unique to odonates and would hamper drawing general conclusions. Honestly, I think the detailed and comprehensive data on them is the selling point.

      Thank you for this suggestion, we have edited the sentence to emphasize their use as an indicator species: “Due to their use of aquatic and terrestrial habitat across different life stages, dragonflies and damselflies are also considered indicator species for both terrestrial and aquatic insect responses to changing climates (Hassall, 2015; Pinkert et al., 2022; Šigutová et al., 2025), giving the study of these species broad relevance for conservation.” (Lines 78-81).

      Comment:

      (10) L73: Indicator for what? The first part of the sentence would suggest lesser surrogacy for responses of other taxa. Reconsider this statement. They are well-established indicators for habitat intactness and freshwater biodiversity. Darwell et al. suggested their diversity can serve as a surrogate for the diversity of both terrestrial and aquatic taxa.

      Thank you for this suggestion, we have edited the sentence to emphasize their use as an indicator species: “Due to their use of aquatic and terrestrial habitat across different life stages, dragonflies and damselflies are also considered indicator species for both terrestrial and aquatic insect responses to changing climates (Hassall, 2015; Pinkert et al., 2022; Šigutová et al., 2025), giving the study of these species broad relevance for conservation.” (Lines 78-81).

      Comment:

      (11) L76: Fritz et al., is a study on mammals, not odonates.

      Thank you for pointing out this error, the reference has been removed (Line 84-85).

      Comment:

      (12) L84: Lotic habitats are generally better connected than lentic ones. Lentic species are considered to have a greater propensity for dispersal DUE to the lower inherent spatiotemporal stability (implying lower connectivity) compared to lotic habitats.

      Thank you for your comment, we have rewritten this section as follows: “For example, differences in habitat connectivity and dispersal ability may constrain range shifts for lentic species (those species that breed in slow-moving water like lakes or ponds) and lotic species (those living in fast-moving water) in different ways (Kalkman et al., 2018). More southerly lentic species may expand their range boundaries more than lotic species, as species accustomed to ephemeral lentic habitats are better dispersers (Grewe et al., 2013), yet lotic species have also been found to expand their ranges more often than lentic species, potentially due to the loss of lentic habitat in some areas (Bowler et al., 2021).” (Lines 88-95).

      Comment:

      (13) L90: I would be cautious with this interpretation. If only part of the range is considered (here a country in the northern Hemisphere) southern species are moving more of their range into and northern species more of their range out of the study area in response to warming (implying northward shifts).

      We have clarified this section as follows: “While warm-adapted species with more equatorial distributions could expand their ranges poleward following warming (Devictor et al., 2008), they could also increase in abundance in this new range area relative to species that historically occupied those areas and are less heat-tolerant (Powney et al., 2015).” (Lines 95-121).

      Comment:

      (14) L117: Odonata Central contains many county centroids as occurrence records. These could be an issue for your use case. I may have overlooked the steps you took to address this, but I think this requires at least more detail and possibly further removal/checks using for instance CoordinateCleaner. The functions implemented in this package allow you to filter records based on political units to avoid exactly this source of error.

      Thank you for this suggestion, we weren’t aware of this issue with Odonata Central. We used the CoordinateCleaner package in R to filter all odonate records that we used in our analyses. Less than 1% of observations in our dataset were identified as having potential problems by the tool, so we would not expect this to affect our inferences. However, in future we will employ this tool when using similar datasets.
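
      The kind of check at issue, flagging records that coincide with the centroid of a political unit, can be sketched generically as below. This is not CoordinateCleaner's implementation or API; the function names, the 1 km radius, and the toy coordinates are assumptions made for illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def flag_centroid_records(records, centroids, radius_km=1.0):
    """Return one boolean per record: True if it lies within radius_km of any centroid."""
    return [
        any(haversine_km(lat, lon, c_lat, c_lon) <= radius_km for c_lat, c_lon in centroids)
        for lat, lon in records
    ]

# Toy data: one hypothetical county centroid and two occurrence records.
centroids = [(44.50, -89.50)]
records = [(44.50, -89.50), (45.50, -89.50)]
flags = flag_centroid_records(records, centroids)
share_flagged = sum(flags) / len(flags)
```

      The share of flagged records is the quantity to compare against a tolerance such as the 1% figure reported above.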

      Comment:

      (15) L119: Please add a brief explanation of why this was necessary. I am ok with something along the lines in the supplement.

      We moved this information from the supplemental to the main text as follows: “If a species was found on both continents, we only retained observations from the continent that was the most densely sampled. If we merged data for one species found on both continents, we could not perform a cross-continental comparison. However, if the same species on different continents was treated as different species, this would lead to uninterpretable outcomes (and the creation of pseudo-replication) in the context of phylogenetic analyses. In addition, species found on both continents did not have sufficient data to meet criteria for the phenology analysis.” (Lines 161-167).

      Comment:

      (16) L132: This is the letters 'X' or 'x' are not multiplier symbols! Please change to the math symbol (×), everywhere.

      Thank you for pointing out this error, we have made the correction throughout the manuscript.

      Comment:

      (17) L133: add 'main' before 'flight period'

      Thank you for this suggestion, we have made the change (Line 190).

      Comment:

      (18) L135: I suggest using the coefficient of variation, as it is controlled for the mean. Otherwise, what you see is partly the signature of temperature and not of its variation. For me, it's very difficult to understand what this variation of the variation means and at least needs more explanation.

      Thank you very much for this suggestion, we agree that using the coefficient of variation is a better fit for the question that we’re asking. We re-ran our analyses with the coefficient of variation as the measure of climate variability: all the results reported in the manuscript are now updated for that analysis (Line 377, Table 2), and we have also updated the methods section (Line 191). The results are qualitatively the same as in our previous analysis, but we agree that they are now easier to interpret.
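
      The reviewer's point about controlling for the mean can be made concrete with a short example. This sketch assumes strictly positive temperature values (the coefficient of variation is only meaningful on such a scale), and the two toy series are invented: they have the same standard deviation but different means.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (values assumed strictly positive)."""
    return statistics.stdev(values) / statistics.mean(values)

# Two toy temperature series with identical spread but different means.
cool_site = [8.0, 10.0, 12.0, 10.0, 10.0]
warm_site = [18.0, 20.0, 22.0, 20.0, 20.0]

sd_cool, sd_warm = statistics.stdev(cool_site), statistics.stdev(warm_site)
cv_cool = coefficient_of_variation(cool_site)
cv_warm = coefficient_of_variation(warm_site)
```

      The standard deviations are equal, while the coefficient of variation is larger at the cooler site, so the two measures rank sites differently whenever mean temperature differs.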

      Comment:

      (19) L155: Please adequately reference all R packages (state the name, and a reference for them including the authors' names, title, and version).

      Thank you for pointing out this omission, we have added reference information for the glm function in base R (Line 298) and ensured all other packages are properly referenced.

      Comment:

      (20) L207: Mention the literature sources here (again).

      We agree that they should be referenced here again, and we have done so (Lines 267-268).

      Comment:

      (21) L209: You could use the number of grid cells as a proxy for range size.

      Following this excellent suggestion, we re-analysed our data using range size, calculated as the number of quadrats occupied by a species in the historical time period, as a predictor. Range size was not significant in our models, but we believe this is the best way to analyze our data, and so have updated our methods (Lines 261-263) and results (375-378).
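
      The range-size predictor described above (number of quadrats occupied in the historical period) can be sketched as follows. This is an illustration only: it assumes occurrence records are already projected to kilometre coordinates on a regular grid, and the function name, the coordinates, and the 100 km default cell size are hypothetical choices for the example.

```python
import math

def range_size(records, cell_km=100.0):
    """Count the distinct grid cells occupied by (x_km, y_km) records."""
    cells = {
        (math.floor(x / cell_km), math.floor(y / cell_km))
        for x, y in records
    }
    return len(cells)

# Hypothetical historical records for one species (projected km coordinates).
records = [(12.0, 40.0), (95.0, 10.0), (130.0, 40.0), (250.0, 310.0)]
n_quadrats = range_size(records)
```

      Counting cells rather than records keeps heavily sampled locations from inflating the estimate, since repeated observations in one quadrat are counted once.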

      Comment:

      (22) L218: It would be preferable to say 'species-level' instead of 'by-species'.

      Thank you for this suggestion, we agree that this is clearer and made the change (Line 298).

      Comment:

      (23) L219-220: this is unclear. Please rephrase.

      We have clarified as follows: “We used both species-level frequentist (GLM; glm function in R) and Bayesian (Markov Chain Monte Carlo generalized linear mixed model, MCMCglmm; Hadfield, 2010) models to improve the robustness of the results.” (Lines 298-300).

      Comment:

      (24) L224: At least for Europe there is a molecular phylogeny available, which you should preferably use (Pinkert et al. 2018, Ecography). Otherwise, I am ok with using what is available

      We apologize that the nature of the phylogeny that we used was not clear; the phylogeny that we used was built similarly to that in Pinkert et al. 2018, Ecography. It created a molecular phylogeny with a morphological/taxonomic tree as the backbone tree, so that species could only move within their named genera or families. We clarified this in the manuscript as follows:

      “We used the molecular phylogenetic tree published by the Odonate Phenotypic Database (Waller et al., 2019), which used a morphological and taxonomic phylogeny as the backbone tree, allowing species to move within their named genera or families according to molecular evidence (Waller and Svensson, 2017).” (Lines 302-305).

      Comment:

      (25) L233: You said so earlier (1st sentence of this paragraph).

      Thank you for pointing this out, we removed the repetitive sentence (Line 323).

      Comment:

      (26) L236-238: To me, it makes more sense to test this prior to fitting the phylogenetic models.

      MCMC-GLMM is considerably less familiar to most researchers than general linear models or their derivatives/descendants, such as PGLS. We report models both with and without phylogenetic relationships included for the sake of transparency, and we are happy to acknowledge that no interpretation here changes substantially relative to these decisions. However, failing to report models that included possible (if small) effects of phylogenetic relatedness might cause some readers to question what those models might have implied. For the moment, we are opting for the most transparent reporting approach here.

      Comment:

      (27) L241: Rather say directly XX of XX species in our data....

      (28) L245: Same here. Provide the actual numbers, please.

      Thank you for this suggestion, we made this change on Line 332 and Line 334.

      Comment:

      (29) L247-249: Then not necessary.

      This issue highlights a challenge in the global biology literature and around the issue of biodiversity monitoring for understanding global change impacts on species. Almost no studies have been able to report simultaneous range and phenology shifts, and the literature addresses these biotic responses to global change predominantly as distinct phenomena. Differences in numbers of species for which these observations exist, even among the extremely widely-observed odonates, seem to us to be a meaningful issue to report on. If the reviewer prefers that we abbreviate or remove this sentence, we are happy to do so.

      Comment:

      (30) L251-261: That is discussion as you interpret your results.

      Following your suggestion and the suggestion of another reviewer, we moved the following lines to the discussion section: “Species that did not shift their ranges northwards or advance their phenology included Coenagrion mercuriale, a European species that is listed as near threatened by the IUCN Red List (IUCN, 2021), and is projected to lose 68% of its range by 2035 (Jaeschke et al., 2013).” (Lines 517-527).

      Comment:

      (31) L252: Good to mention, but why is the discussion limited to C. mercuriale?

      We feel that it is important to link the broad-scale results to the specific biological characteristics of individual species, and C. mercuriale is an IUCN threatened species. We are happy to expand links to natural history of this group and have added the following: “This group also includes Coenagrion resolutum, a common North American damselfly (Swaegers et al., 2014), for which we could not find evidence of decline. This may be due in part to the greater area of intact habitat available in North America compared to Europe, enabling C. resolutum to maintain larger populations that are less vulnerable to stochastic climate events. Still, this and other species failing to shift in range or phenology should be assessed for population health, as this species could be carrying an unobserved extinction debt.” (Lines 527-533).

      Comment:

      (32) L264: Insert 'being' before 'consistently'.

      Thank you for the suggestion, we made this change (Line 373).

      Comment:

      (33) L271: .'. However,'.

      Thank you for pointing out this grammatical error, we have corrected it (Line 382).

      Comment:

      (34) L273: 'affected' instead of 'predicted'

      Thank you for the suggestion, we made this change (Line 383).

      Comment:

      (35) L279: 'despite pronounced recent warming' sounds not relevant in this context.

      Thank you for this suggestion, we removed this portion of the sentence (Line 408).

      Comment:

      (36) L281: Rather 'the model performance did not improve....'

      Thank you for the suggestion, we made this change (Line 409).

      Comment:

      (37) L288: Add 'but' before 'not'.

      Thank you for the suggestion, we made this change (Line 416).

      Comment:

      (38) L311-316: Reconsider the causality here. maybe rather rephrase to are associated instead. Greater dispersal ability and developmental plasticity might well lead to higher growth rates, rather than the other way around.

      We agree that plasticity/evolution at range edges is important to consider and have included it as an alternative explanation: “Adaptive evolution and plasticity may enable higher population growth rates in newly-colonized areas (Angert et al., 2020; Usui et al., 2023), but this possibility can only be directly tested with long term population trend data.” (Lines 449-451).

      Comment:

      (39) L313-316: Maybe delete the second 'should be able to'.

      This phrase has been changed in response to other reviewer comments and now reads as follows:

      “Emerging mean conditions in areas adjacent to the ranges of southern species may offer opportunities for range expansions of these relative climate specialists, which can then tolerate climate warming in areas of range expansion better than more cool-adapted historical occupants (Day et al., 2018).” (Lines 445-448).

      Comment:

      (40) L331: Limit this statement ending with 'in North American and European Odonata'.

      Thank you for this suggestion, we made this addition (Lines 475-476).

      Comment:

      (41) L346-347: There are too many of these more-research-is-needed statements in the discussion (at least three in the last paragraphs). Please consider finishing the paragraphs rather with a significance statement.

      Thank you for this suggestion, we have changed the final sentence here to the following: “The extent to which species’ traits actually determine rates of range and phenological shifts, rather than occasionally correlated with them, is worth considering further, but functional traits do not systematically drive patterns in these shifts among Odonates in North America and Europe.” (Lines 480-483).

      We also made additional changes, removing a ‘more-research is needed’ statement from the following paragraph (Line 443), as well as from line 499.

      Comment:

      (42) L349: See also Franke et al. (2022, Ecology and Evolution).

      Thank you for highlighting this excellent reference! We have added it to Line 501.

      Comment:

      (43) L363: Maybe a bit late in the text, but it is important to note that there is the third dimension 'abundance trends' or rather a common factor related to range and phenology shifts. I feel this fits better with the discussion of population growth.

      Thank you for this suggestion, we have addressed the importance of abundance trends in the following sentences: “Further mechanistic understanding of these processes requires abundance data.” (Lines 442-443); “It remains unclear if range and phenology shifts relate to trends in abundance, but our results suggest that there are clear ‘winners’ and ‘losers’ under climate change.” (Lines 509-510).

      Comment:

      (44) L375-377: This last sentence is very similar to L371-373. Please reduce the redundancy. Focus more on specifically stating the process instead of vaguely saying 'new insights into patterns' and 'suggesting processes'. Rather, deliver a strong concluding message here.

      Thank you for this suggestion, we feel that we now have a much stronger concluding message: “By considering both the seasonal and range dynamics of species, emergent and convergent climate change responses across continents become clear for this well-studied group of predatory insects.” (Lines 545-547).

      Comment:

      (45) Table 1: To me, the few estimates presented here do not justify a table. rather include them in the text. OR combine them with Table 2. Also, why not include the traits as predictors (from the range shift models) in these models as well?

      We have clarified in the text that the results displayed in Table 1 are from the analysis of the relationship between range and phenology shifts: “The effect of species’ range shifts on phenology range shifts was significant in our model investigating the relationship between these responses, indicating that species shifting their northern range limits to higher latitudes also showed stronger advances in their emergence phenology (Figure 3).” (Lines 341-344).

      As there were no significant effects in the model of phenology change drivers, we have not shown results of this model: “Emergence phenology shifts were not affected by species’ traits, range geography, nor climate variability; due to this, model results are not displayed here.” (Lines 383-384).

      Comment:

      (46) Table 2: L712-713: What does this mean? Are phenology shifts not used as a predictor of range shifts? (why then this comment?). Or do you want to say phenological shifts are not related to Southern range etc? Why do you present a phylosig here but not in Table 1? Why not include the traits as predictors (from the range shift models) in these models as well? Consider using the range size as a continuous predictor instead of 'Widespread'.

      We are glad the reviewer pointed this out to us. We did not emphasize this issue sufficiently. We DID evaluate traits as predictors both of geographical range and phenological shifts, and species-specific biological traits did not significantly affect models predicting either of those sets of responses. We state this on Lines 312-323, but we have also noted in the discussion (Lines 473-476) that the most commonly assessed traits, like body size, do not alter observed trends here. Instead, where species are found, rather than the characteristics of species, is the key determinant of their overall responses.

      Following this excellent suggestion, we re-analysed our data using range size, calculated as the number of quadrats occupied by a species in the historical time period, as a predictor. Range size was not significant in our models, but we believe this is the best way to analyze our data, and so have updated our methods (Lines 261-263) and results (375-378).

      Comment:

      (47) Figure 1: I don't see any grey points in the figure. Also, there is no A or B. If you are referring to the symbols then write cross and triangle instead and not use capital letters which usually refer to component plots of composite figures. Also, I highly recommend providing a similar figure based on your data (maybe each species as a dot for T1 and another symbol for T2). Given the small number of species, you could try to connect these points with arrows. For the set with only range shifts maybe play the T2-dots at the center of the 'Emergence' axis.

      Thank you for pointing out this error: a previous version of Figure 1 included grey points and multiple panels. We have removed this text from the figure caption to be consistent with the final version of the figure (Line 989).

      The graphical depictions of the conceptual and empirical discoveries in this paper were challenging to create. The reviewer might be suggesting effectively decomposing Figure 3 (change in range on the y axis vs change in phenology among all species) into two sets of points on the same graph, where each pair of points is a before and after value for each species. This would make for a very busy figure indeed. We have modified the conceptual Figure 1 to illustrate more clearly, we believe, that species can (in principle) remain within tolerable niche spaces by shifting their activity periods in time (phenology) or in space (geographical range) or both.

      Comment:

      (48) Figure 2: Please add a legend. Also black is a poor background color. The maps appear to be stretched. Please check aspect ratios. Now here are capital letters without an explanation in the caption. From the context I assume the upper panel maps are for the data used to calculate range shifts at the bottom panel maps are for data used to calculate the phenological shifts.

      We apologise for the error in the figure caption and have clarified the differences between panels in the text, as well as changing the map background colour and fixing the aspect ratio:

      “Figure 2: Richness of 76 odonate species sampled in North America and Europe in the historic period (1980-2002; panes A and C) and the recent period (2008-2018; panes B and D). Species richness per 100 × 100 km quadrat is shown in panes A and B, while panes C and D show species richness per 200 × 200 km quadrat. Dark red indicates high species richness, while light pink indicates low species richness.” (Lines 1002-1006).

      Comment:

      (49) Figure 3: Why this citation? Of terrestrial taxa? Please explain. Consider adding some stats here, such as the r-squared value for each of the relationships.

      We have better explained the citation in the figure caption, as well as adding r-squared values:

      “Figure 3: Relationship between range shifts and emergence phenology shifts among North American and European odonate species (N = 66; model R2 = 17.08 for glm, 14.9% for MCMCglmm). For reference, the shaded area shows mean latitudinal range shifts of terrestrial taxa as reported by Lenoir et al. (2020; calculated as the yearly mean dispersal rate of 1.11 +/- 0.96 km per year over 38 years).” (Lines 679-682)

      Comment:

      (50) L801: What are these underscored references?

      This was an issue with the reference software and has been resolved.

      Comment:

      (51) Table S1: L848: Consider starting with 'Samples of 76 North American and European odonate species from between ...'. Please use a horizontal line to separate the content from the table header. Add a horizontal line below the last row. Same for all tables.

      Thank you for this suggestion, we have edited the caption for Figure S1 as suggested (Line 1124). We have also made the suggested line additions to Table S1, S2, and S3.

      Comment:

      (52) Table S3: This is confusing. In Table 1 (main text) both 'southern range' and 'widespread' are used as predictors. Please explain.

      We originally included information on species range geography, including southern versus northern range, and widespread versus not, into one categorical variable. Following additional comments we re-analysed our data using range size, calculated as the number of quadrats occupied by a species in the historical time period, as a predictor. Now the methods section text (Lines 261-263) and Table 1 report results of that variable with distribution options northern, southern, or both. 

      Comment:

      (53) Figure S5 and S6: It would be more coherent if the colors refer to the continents and the suborders are indicated by shading. I would love to see a combination of the two figures with species ordered by the phylogenetic relationship and a dot matrix indicating the traits in the main text! This could really be a good starting point for a synthesis figure.

      The reviewer presents an interesting challenge for us. We have a choice, as we understand things, to present a figure showing phylogeny and traits (as requested here), or an ordered list of species relative to effect sizes in the two main responses to global change. The latter choice centers on the discoveries of the paper, while the former would be valuable for dragonfly biology but would depict information that proved to be biologically uninformative relative to our discovery. That is to say, there is no phylogenetic trend and biological traits among species did not affect results. We have gone some way toward illustrating that issue by retaining phylogeny in the MCMC-GLMM models, but we feel that a figure illustrating phylogeny and traits would (for most readers, at least) illustrate noise, rather than signal. For this reason, we have opted to take on the previous reviewer’s suggestion for a modified, main-text Figure 4, which we include below.

      Figure 4: Distribution of Northern range limit shifts (Panel A, kilometers) and emergence phenology shift (Panel B, Julian day) of 76 European and North American odonate species between a recent time period (2008 - 2018) and a historical time period (1980 - 2002). Anisoptera (dragonflies) are shown in pink, Zygoptera (damselflies) are shown in blue.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      (1) The bad equilibria of the model still remain a concern, as well as other features like the transient overshoots that do not match with the data. I think they could achieve more accuracy here by assigning more weight to such specific features, through adding these as separate objectives for the generator explicitly. The traces contain a five-second current step, and one second before and one second after the training step. This means that in the RMSE, the current step amplitude will dominate as a feature, as this is simply the state for which the data trace contains most time-points. Note that this is further exacerbated by using the IV curve as an auxiliary objective. I believe a better exploration of specific response features, incorporated as independently weighted loss terms for the generator, could improve the fit. E.g. an auxiliary term could be the equilibrium before and after the current step, another term could penalise response traces that do not converge back to their initial equilibrium, etc.

      We thank the reviewer for the suggestion. We supplemented the membrane potential regression loss with errors computed over three intervals (pre-, mid-, and post-stimulation), improving the accuracy of EP-GAN for baseline membrane potential responses (Figures 2, 3; Tables S2, S3). We also changed the simulation protocol for generated parameters to allow a longer simulation time of 15 seconds, with no stimulation during t = [0, 5) seconds (pre-stimulation), stimulation applied during t = [5, 10] seconds, and no stimulation during t = (10, 15] seconds (post-stimulation). These time intervals are chosen to ensure sufficient stabilization periods before and after stimulation.
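
      To make the interval-wise error concrete, the sketch below computes a separate RMSE for the pre-, mid-, and post-stimulation windows of a 15-second trace and combines them with weights. It is an illustration under stated assumptions, not the EP-GAN loss itself: the function names, the equal default weights, and the synthetic traces are hypothetical, and only the window boundaries ([0, 5), [5, 10], (10, 15] seconds) come from the response above.

```python
import numpy as np

def interval_rmse(t, v_pred, v_true, stim_start=5.0, stim_end=10.0):
    """Return the RMSE of a voltage trace in the pre, mid and post windows."""
    t = np.asarray(t, dtype=float)
    v_pred = np.asarray(v_pred, dtype=float)
    v_true = np.asarray(v_true, dtype=float)
    masks = {
        "pre": t < stim_start,                        # [0, 5)
        "mid": (t >= stim_start) & (t <= stim_end),   # [5, 10]
        "post": t > stim_end,                         # (10, 15]
    }
    return {
        name: float(np.sqrt(np.mean((v_pred[m] - v_true[m]) ** 2)))
        for name, m in masks.items()
    }

def weighted_loss(errors, weights=None):
    """Combine the per-window errors; equal weights are an assumption."""
    if weights is None:
        weights = {name: 1.0 for name in errors}
    return sum(weights[name] * errors[name] for name in errors)

# Synthetic example: the prediction is off by 3 mV only after stimulation ends.
t = np.linspace(0.0, 15.0, 31)
v_true = np.zeros_like(t)
v_pred = np.where(t > 10.0, 3.0, 0.0)
errors = interval_rmse(t, v_pred, v_true)
loss = weighted_loss(errors)
```

      Because each window contributes its own term, a trace that fails to return to baseline is penalised regardless of how many time-points the stimulation period contains.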

      (2) The explanation of what the authors mean with 'inverse gradient operation' is clear now. However, this term is mathematically imprecise, as the inverse gradient does not exist because the gradient operator is not injective. The method is simply forward integration under the assumption that the derivative of the voltage is known at the grid time-points, and should be described as such.

      We thank the reviewer for the clarification of the ‘inverse gradient operation’ terminology. In the Methods section, we replaced this term with ‘forward integration’, which describes the process more accurately.
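
      Forward integration in the sense the reviewer describes can be sketched with a forward Euler scheme: given an initial voltage and the voltage derivative at each grid time-point, the trace is rebuilt by accumulating increments. This is a minimal illustration; the function name, the uniform time step, and the example values are assumptions, and the manuscript's actual integration scheme may differ.

```python
import numpy as np

def forward_integrate(v0, dv_dt, dt):
    """Rebuild V on a uniform grid via V[k+1] = V[k] + dt * dV/dt[k]."""
    dv_dt = np.asarray(dv_dt, dtype=float)
    increments = dt * dv_dt[:-1]  # the last derivative is not needed
    return v0 + np.concatenate(([0.0], np.cumsum(increments)))

# Hypothetical example: constant slope of 2 mV/s, starting from -40 mV.
v = forward_integrate(v0=-40.0, dv_dt=[2.0] * 5, dt=0.5)
```

      The initial value has to be supplied separately, which is the practical consequence of the gradient operator not being injective: the derivative alone determines the trace only up to a constant offset.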

      (3) I appreciate that the authors' method provides parameters of models at a minimal computational cost compared to running an evolutionary optimization for every new recording. I also believe that with some tweaking of the objective, the method could improve in accuracy. However, I share reviewer 2's concerns that the evolutionary baseline methods are not sufficiently explored, as these methods have been used to successfully fit considerably more complex response patterns. One way out of the dilemma is to show that the EP-GAN estimated parameters provide an initial guess that considerably narrows the search space for the evolutionary algorithm. In this context, the authors should also discuss the recent gradient based methods such as Deistler et al. (https://doi.org/10.1101/2024.08.21.608979) or Jones et al (https://doi.org/10.48550/arXiv.2407.04025).

      We supplemented the optimization setup for existing methods (GDE3, NSDE, DEMO, and NSGA2) by incorporating steady-state response constraints as the initial selection process. The process is similar to the EP-GAN training data generation and the DEMO parameter selection process [16] (see Results section, page 6 for detail). We also expanded the testing scenarios by evaluating all methods on both small and large HH-model estimation. The small HH-model scenario estimates 47 parameters consisting of channel conductances, reversal potentials and initial conditions, with the channel parameters (n = 129) frozen to their default values in [41]. The large HH-model scenario estimates the 129 channel parameters in addition to the 47 parameters, allowing ±50% variations from their default values. For both scenarios, we test total sample sizes of both 32k and 64k for all methods to evaluate how they scale with the number of simulated samples given during optimization. The results show that existing methods perform well in the small HH-model scenario, with performance that scales with sample size, consistent with the literature. EP-GAN, on the other hand, shows overall better performance in predicting membrane potential responses in both small and large HH-model scenarios.
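
      The 50% variation around default values mentioned above corresponds to a simple box constraint per parameter. The sketch below shows one way such search bounds could be built; the function name and the two example parameters are hypothetical and are not drawn from the HH-model in [41].

```python
def variation_bounds(defaults, fraction=0.5):
    """Return (low, high) bounds at +/- fraction around each default value."""
    bounds = {}
    for name, value in defaults.items():
        a = value * (1.0 - fraction)
        b = value * (1.0 + fraction)
        # Sort the pair so that negative defaults still give low <= high.
        bounds[name] = (min(a, b), max(a, b))
    return bounds

# Hypothetical channel parameters: a positive and a negative default value.
bounds = variation_bounds({"g_example": 2.0, "v_half_example": -40.0})
```

      Sorting each pair matters for parameters with negative defaults, where scaling by 1.5 gives the lower rather than the upper bound.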

      Reviewer #2 (Public review):

      Major 1: Models do not faithfully capture empirical responses. While the models generated with EPGAN reproduce the average voltage during current injections reasonably well, the dynamics of the response are generally not well captured. For example, for the neuron labeled RIM (Figure 2), the most depolarized voltage traces show an initial 'overshoot' of depolarization, i.e. they depolarize strongly within the first few hundred milliseconds but then fall back to a less depolarized membrane potential. In contrast, the empirical recording shows no such overshoot. Similarly, for the neuron labeled AFD, all empirically recorded traces slowly ramp up over time. In contrast, the simulated traces are mostly flat. Furthermore, all empirical traces return to the pre-stimulus membrane potential, but many of the simulated voltage traces remain significantly depolarized, far outside of the ranges of empirically observed membrane potentials. The authors trained an additional GAN (EPGAN Extended) to improve the fit to the resting membrane potential. Interestingly, for one neuron (AWB), this improved the response during stimulation, which now reproduced the slowly rising membrane potentials observed empirically; however, the neuron still does not reliably return to its resting membrane potential. For the other two neurons, the authors report a decrease in accuracy in comparison to EP-GAN. While such deviations may appear small in the Root Mean Square Error (RMSE), they likely indicate a large mismatch between the model and the electrophysiological properties of the biological neuron. The authors added a second metric during the revision - percentages of predicted membrane potential trajectories within empirical range. I appreciate this additional analysis. As the empirical ranges across neurons are far larger than the magnitude of dynamical properties of the response ('slow ramps', etc.), this metric doesn't seem to be well suited to quantify to which degree these dynamical properties are captured by the models.

      We made improvements to the training data generation and architecture of EP-GAN to improve the overall accuracy of its predicted membrane potential responses. In particular, we divided training data generation into the three neuron types found among C. elegans non-spiking neurons: 1) Transient outward rectifier, 2) Outward rectifier and 3) Bistable [8, 16]. Each randomly generated training sample is categorized into one of the 3 types by evaluating its steady-state currents with respect to experimental dI/dV bound constraints (see the generating training data section under Methods for more detail). The process is then followed by imposing minimum-maximum constraints on simulated membrane potential responses. This setup generates training samples whose distribution is closer to that of experimentally recorded neurons. This is further described in the Methods section, page 15, of the revised manuscript.

      We also improved the EP-GAN training process by incorporating random masking of input membrane potential responses. The masking forces EP-GAN to make predictions even with missing voltage traces, improving overall accuracy and allowing EP-GAN to use membrane potential inputs with arbitrary clamping protocol (see Methods page 13 for more detail). For the training loss functions, we further supplemented the membrane potential regression loss with errors computed for 2 intervals: pre- and post-stimulation time intervals to improve EP-GAN prediction capabilities for baseline membrane potentials.
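
      The random masking of input traces can be illustrated as below. This is a sketch under stated assumptions rather than the EP-GAN implementation: it masks whole traces, fills masked traces with zeros, and always keeps at least one trace; the function name and array shapes are hypothetical.

```python
import numpy as np

def mask_traces(traces, keep_fraction, rng):
    """Zero out randomly chosen traces; return the masked copy and keep mask."""
    traces = np.asarray(traces, dtype=float)
    n_traces = traces.shape[0]
    n_keep = max(1, int(round(keep_fraction * n_traces)))
    keep = np.zeros(n_traces, dtype=bool)
    keep[rng.choice(n_traces, size=n_keep, replace=False)] = True
    # Broadcast the per-trace mask across the time axis.
    masked = np.where(keep[:, None], traces, 0.0)
    return masked, keep

# Hypothetical input: 8 voltage traces of 5 time-points each, half retained.
rng = np.random.default_rng(0)
traces = np.ones((8, 5))
masked, keep = mask_traces(traces, keep_fraction=0.5, rng=rng)
```

      Drawing a new mask for each training sample exposes the network to many different subsets of current-clamp steps, which is the intuition behind accepting inputs recorded under an arbitrary clamping protocol.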

      Taken together, these modifications improved EP-GAN’s overall ability to capture empirical membrane potential responses, and we show the results in Figures 2–5 and Tables S2, S3.

      Major 2: Comparison with other approaches is potentially misleading. Throughout the manuscript, the authors claim that their approach outperforms the other approaches tested. But compare the responses of the models in the present manuscript (neurons RIM, AFD, AIY) to the ones provided for the same neurons in Naudin et al. 2022 (https://doi.org/10.1371/journal.pone.0268380). Naudin et al. present models that seem to match empirical data far more accurately than any model presented in the current study. Naudin et al. achieved this using DEMO, an algorithm that in the present manuscript is consistently shown to be among the worst of all algorithms tested. I therefore strongly disagree with the authors' claim that a "Comparison of EP-GAN with existing estimation methods shows EP-GAN advantage in the accuracy of estimated parameters". This may be true in the context of the benchmark performed in the study (i.e., a condition of very limited compute resources - 18 generations with a population size of 600, compare that to 2000 generations recommended in Naudin et al.), but while EP-GAN wins under these specific conditions (and yes, here the authors convincingly show that their EP-GAN produces by far the best results!), other approaches seem to win with respect to the quality of the models they can ultimately generate.

      We thank the reviewer for the feedback regarding the comparison with existing methods. We have revised the optimization setup for existing methods (GDE3, NSDE, DEMO, and NSGA2) by incorporating steady-state response constraints as the initial selection process. The process is similar to that of EP-GAN training data generation and DEMO parameter selection process [16] (see Results section, page 6 for detail). Incorporating this process has improved the accuracy of existing methods especially for small HH-model scenarios where DEMO stood out with the best performance alongside NSGA2 (Figure 5, Table 1, 2).

      We also expanded the testing scenarios by evaluating all methods on both small and large HH-model estimation. The small HH-model scenario estimates 47 parameters consisting of channel conductances, reversal potentials and initial conditions, with the channel parameters (n = 129) frozen to their default values in [41]. The large HH-model scenario estimates the 129 channel parameters in addition to the 47 parameters, allowing ±50% variations from their default values. For both scenarios, we test total sample sizes of both 32k and 64k for all methods to evaluate how they scale with the number of simulated samples given during optimization. The results show that existing methods perform well in the small HH-model scenario, with performance that scales with sample size. EP-GAN, on the other hand, shows overall better performance in predicting membrane potential responses in both small and large HH-model scenarios.

      In particular, with the extended membrane potential error covering pre-, mid-, and post-activation periods, EP-GAN (trained with 32k samples, large HH-model, 9 neurons) achieved a mean membrane potential response error of 2.82mV, lower than that of DEMO (12.2mV, 64k samples) trained on an identical setup (Table 2) and of DEMO (7.78mV, using 36,000k samples, 3 neurons) applied to the simpler HH-model in [16]. With respect to DEMO performance in [16], under an identical simulation protocol (i.e., no stimulation during (0, 5s) and (10, 15s), and stimulation during (5, 10s)), the EP-GAN-predicted RIM (large HH-model) showed membrane potential accuracy on par with that of DEMO (simpler HH-model), and the EP-GAN-predicted AFD showed better accuracy for the post-activation membrane potential response, where DEMO-predicted membrane potentials overshoot above the baseline (not shown in the paper).

      Major 3: As long as the quality of the models generated by the EP-GAN cannot be significantly improved, I am doubtful that it indeed can contribute to the 'ElectroPhysiome', as it seems likely that dynamics that are currently poorly captured, like slow ramps, or the ability of the neuron to return to its resting membrane potential, will critically affect network computations. If the authors want to motivate their study based on this very ambitious goal, they should illustrate that single neuron model generation with their approach is robust enough to warrant well-constrained network dynamics. Based on the currently presented results, I find the framing of the manuscript far too bold.

      We thank the reviewer for the feedback regarding the paper's scope. With the revised methods, the overall quality of EP-GAN models is improved, with the most significant improvements in baseline membrane potential accuracy. While high-quality neuron models can be attained with existing methods given a sufficient sample size, our results suggest that EP-GAN can predict models of enhanced quality with a significantly smaller sample size and without a need for retraining, thus addressing the main drawback of evolutionary-based methods. While EP-GAN still has limitations (e.g., difficulty in predicting slow ramps) that need to be addressed in the future, we believe its overall performance, combined with fast inference speed and flexibility in its input data format (e.g., missing membrane potential traces), is a step forward in large-scale neuron modeling tasks that can contribute to network models.

      Major 4: The conclusion of the ablation study 'In addition the architecture of EP-GAN permits inference of parameters even when partial membrane potential and steady-state currents profile are given as inputs' does not seem to be justified given the voltage traces shown in Figure 3. For example, for RIM, the resting membrane potential stays around 0 mV, but all empirical traces are around -40mV. For AFD, all simulated traces have a negative slope during the depolarizing stimuli, but a positive slope in all empirically observed traces. For AIY, the shape of hyperpolarized traces is off. While it may be that by their metric neurons in the 25% category are classified as 'preserving baseline accuracy', this doesn't seem justified given the voltage traces presented in the manuscript. It appears the metric is not strict enough.

      We improved EP-GAN’s training process by incorporating random masking of input membrane potential responses. The masking forces EP-GAN to make predictions even with missing voltage traces, improving overall accuracy and allowing EP-GAN to use membrane potential inputs with arbitrary clamping protocols.

      This input masking during training has improved the ablation study results: EP-GAN now retains its baseline membrane potential error (3.3mV, averaged across pre-, mid-, and post-activation periods) with as little as 50% of membrane potential inputs remaining (3.5mV) and as little as 25% of steady-state currents remaining (3.5mV).
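
      The random input masking described above can be illustrated with a minimal sketch. This is not EP-GAN's actual training code, whose masking scheme is not specified in this response; the function name `mask_traces`, the `keep_fraction` parameter, and the choice to zero out masked traces are all assumptions made for illustration.

```python
import random


def mask_traces(traces, keep_fraction, rng):
    """Zero out a random subset of input traces (hypothetical helper).

    traces        : list of equal-length lists, one per clamping condition
    keep_fraction : fraction of traces left intact (e.g. 0.5)
    rng           : random.Random instance, passed in for reproducibility
    Returns (masked_traces, keep_mask).
    """
    n_traces = len(traces)
    # Always keep at least one trace so the input is never empty.
    n_keep = max(1, round(keep_fraction * n_traces))
    keep = set(rng.sample(range(n_traces), n_keep))
    keep_mask = [i in keep for i in range(n_traces)]
    # Assumption: masked traces are replaced by zeros of the same length.
    masked = [
        list(trace) if kept else [0.0] * len(trace)
        for trace, kept in zip(traces, keep_mask)
    ]
    return masked, keep_mask
```

      Drawing a fresh mask for every training sample would expose a network to many different subsets of traces, which is one plausible reading of how masking could support arbitrary clamping protocols at inference time.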

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript "Drosophila Visuomotor Integration: An Integrative Model and Behavioral Evidence of Visual Efference Copy" provides an integrative model of the visuomotor control in Drosophila melanogaster. This model presents an experimentally derived model based on visually evoked wingbeat pattern recordings of three strategically selected visual stimulus types with well-established behavioral response characteristics. By testing variations of these models, the authors demonstrate that the virtual model behavior can recapitulate the recorded wing beat behavioral results and those recorded by others for these specific stimuli when presented individually. Yet, the novelty of this study and their model is that it allows predictions for natural visual scenes in which multiple visual stimuli occur simultaneously and may have opposite or enhancing effects on behavior. Testing three models that would allow interactions of these visual modalities, the authors show that using a visual efference copy signal allows visual streams to interact, replicating behavior recorded when multiple stimuli are presented simultaneously. Importantly, they validated the prediction of this model in real flies using magnetically tethered flies, e.g., presenting moving bars with varying backgrounds. In conclusion, the presented manuscript presents a commendable effort in developing and demonstrating the validity of a mixture model that allows predictions of the behavior of Drosophila in natural visual environments.

      Strengths:

      Overall, the manuscript is well-structured and clear in its presentation, and the modeling and experimental research are methodically conducted and illustrated in visually appealing and easy-to-understand figures and their captions.

      The manuscript employs a thorough, logical approach, combining computational modeling with experimental behavioral validation using magnetically tethered flies. This iterative integration of simulation and empirical behavioral evidence enhances the credibility of the findings.

      The associated code base is well documented and readily produces all figures in the document.

      Suggestions:

      However, while the experiments provide evidence for the use of a visual efference copy, the manuscript would be even more impressive if it presented specific predictions for the neural implementation or even neurophysiological data to support this model. Or, at the very least, a thorough discussion. Nonetheless, these models and validating behavioral experiments make this a valuable contribution to the field; it is well executed and addresses a significant gap in the modeling of fly behavior and holistic understanding of visuomotor behaviors.

      We appreciate the reviewer’s thoughtful comments on the strengths and weaknesses of our manuscript. We agree that a biophysically realistic model reflecting the structure of neural circuits, as well as physiological data from them, would be invaluable. However, we are currently unable to provide physiological evidence for EC-based suppression, or a circuit architecture for efference copy-based suppression of the stability circuit, because the neural pathway underlying this behavior remains unidentified. Extensive recordings from the HS/VS system have revealed cell-type-specific motor-related inputs during both spontaneous and loom-evoked flight turns (Fenk et al., 2021; Kim et al., 2017, 2015). These studies predicted suppression of the optomotor stability response during such turns, and our new experiments confirmed this suppression specifically during loom-evoked turns (Figures 5, 6). However, these neurons are primarily involved in the head optomotor response, not the body optomotor response. We hope to extend our current model in future studies to incorporate more cellular-level detail, as the feedforward circuits underlying stability behavior become more clearly defined.

      Here are a few points that should be addressed:

      (1) The biomechanics block (Figure 2) should be elaborated on, to explain its relevance to behavior and relation to the underlying neural mechanisms.

      We appreciate this suggestion. The mathematical representation of the biomechanics block has been developed by other groups in previous studies (Fry et al., 2003; Ristroph et al., 2010). We used exactly the same model, and its parameters were identical to those used in one of those studies (Fry et al., 2003; Ristroph et al., 2010), in which the parameters were estimated from the stabilizing response to magnetic “stumbling” pulses. In the previous version of the manuscript, we had a description of the biomechanics block in the Methods section (see Equation 4). In response to the reviewer’s comment, we have made a few changes to Figure 2A and expanded the associated description in the main text, as follows.

      (Line 160) “To test the orientation behavior of the model, we developed an expanded model, termed “virtual fly model” hereafter. In this model, we added a biomechanics block that transforms the torque response of the fly to the actual heading change according to kinematic parameters estimated previously (Michael H Dickinson, 2005; Ristroph et al., 2010) (Figure 2A, see Equation 4 in Methods and Movie S1). The virtual fly model, featuring position and velocity blocks that are conditioned on the type of the visual pattern, can now change its body orientation, simulating the visual orientation behavior of flies in the free flight condition.”

      (2) It is unclear how the three integrative models with different strategies were chosen or what relevance they have to neural implementation. This should be explained and/or addressed.

      Thank you for this valuable comment. We selected the three models based on previous studies investigating visuomotor integration across multiple species, under conditions where multiple sensory cues are presented simultaneously.

      The addition-only model represents the simplest hypothesis, analogous to the “additive model” proposed by Tom Collett in his 1980 study (Collett, 1980). We used this model as a baseline to illustrate behavior in the absence of any efference copy mechanism. Notably, some modeling studies have proposed linear (additive) integration for multimodal sensory cues at the behavioral level (Liu et al., 2023; Van der Stoep et al., 2021). However, experimental evidence demonstrating strictly linear integration—either behaviorally or physiologically—remains limited. In our study, new data (Figure 5) show that bar-evoked and background movement-evoked locomotor responses are combined linearly, supporting the addition-only model.

      The graded efference copy model has been most clearly demonstrated in the cerebellum-like circuit of Mormyrid fish during electrosensation (Bell, 1981; Kennedy et al., 2014). In this system, the efference copy signal forms a negative image of the predicted reafferent input and undergoes plastic changes as the environment changes—an idea that inspired our modifiable efference copy model (Figure 4–figure supplement 1). The all-or-none efference copy model is exemplified in the sensory systems of smaller organisms, such as the auditory neurons of crickets during stridulation (Poulet and Hedwig, 2006). Notably, in crickets, the motor-related input is referred to as corollary discharge rather than efference copy. Typically, “efference copy” refers to a graded, subtractive motor-related signal, while “corollary discharge” denotes an all-or-none signal, both counteracting the sensory consequences of self-generated actions. In this manuscript, we use the term efference copy more broadly, encompassing both types of motor-related feedback signals (Sommer and Wurtz, 2008).

      In response to this comment, we have made the following changes in the main text to enhance its accessibility to general readers.

      (Line#268) “This integration problem has been studied across animal sensory systems, typically by analyzing motor-related signals observed in sensory neurons (Bell, 1981; Collett, 1980; Kim et al., 2017; Poulet and Hedwig, 2006). Building on the results of these studies, we developed three integrative models. The first model, termed the “addition-only model”, assumes that the outputs of the object (bar) and the background (grating) response circuits are summed to control the flight orientation (Figure 4B, see Equation 14 in Methods).”

      (Line#272) “In the second and third models, an EC is used to set priorities between different visuomotor circuits (Figure 4C,D). In particular, the EC is derived from the object-induced motor command and sent to the object response system to nullify visual input associated with the object-evoked turn (Bell, 1981; Collett, 1980; Poulet and Hedwig, 2006). These motor-related inputs fully suppress sensory processing in some systems (Poulet and Hedwig, 2006), whereas in others they selectively counteract only the undesirable components of the sensory feedback (Bell, 1981; Kennedy et al., 2014).”
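
      The difference between the three integrative models can be sketched in a few lines. This is an illustrative simplification, not the manuscript's Equation 14: the function names, the scalar torque commands, and the `predicted_reafference` term are assumptions, and the real models operate on time-varying signals.

```python
def addition_only(bar_cmd, background_cmd):
    """Addition-only model: the two pathway outputs are simply summed."""
    return bar_cmd + background_cmd


def graded_ec(bar_cmd, background_input, background_gain, predicted_reafference):
    """Graded EC model (sketch): the predicted self-generated optic flow is
    subtracted from the background input before the stability gain is applied."""
    return bar_cmd + background_gain * (background_input - predicted_reafference)


def all_or_none_ec(bar_cmd, background_cmd, object_turn_active):
    """All-or-none EC model (sketch): the background pathway is fully gated
    off while an object-evoked turn is in progress."""
    return bar_cmd + (0.0 if object_turn_active else background_cmd)
```

      In this sketch, a graded EC only cancels the background response when its prediction matches the actual feedback, whereas the all-or-none gate suppresses it regardless of amplitude. That distinction is what the background-change experiments are designed to probe.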

      (3) There should be a discussion of how the visual efference could be represented in the biological model and an evaluation of the plausibility and alternatives.

      Thank you for this helpful comment. We have now added the following discussion to share our perspective on the circuit-level implementation of the visual efference copy in Drosophila.

      (Line#481) “Efference copy in Drosophila vision

      Under natural conditions, various visual features in the environment may concurrently activate multiple motor programs. Because these may interfere with one another, it is crucial for the central brain to coordinate between the motor signals originating from different sensory circuits. Among such coordination mechanisms, the EC mechanisms were hypothesized to counteract so-called reafferent visual input, that is, input caused specifically by self-movement (Collett, 1980; von Holst and Mittelstaedt, 1950). Recent studies reported such EC-like signals in Drosophila visual neurons during spontaneous as well as loom-evoked flight turns (Fenk et al., 2021; Kim et al., 2017, 2015). One type of EC-like signal was identified in a group of wide-field visual motion-sensing neurons that were shown to control neck movement for gaze stability (Kim et al., 2017). The EC-like signals in these cells were bidirectional depending on the direction of flight turns, and their amplitudes were quantitatively tuned to those of the expected visual input across cell types. Although amplitude varies among cell types, it remains inconclusive whether it also varies within a given cell type to match the amplitude of expected visual feedback, thereby implementing the graded EC signal. A more recent study examined EC-like signal amplitude in the same visual neurons for loom-evoked turns, across events (Fenk et al., 2021). Although the result showed a strong correlation between wing response and the EC-like inputs, the authors pointed out that this apparent correlation could stem from noisy measurement of all-or-none motor-related inputs.

      Thus, these studies did not completely disambiguate between graded vs. all-or-none EC signaling. Another type of EC-like signal, observed in the visual circuit tuned to a moving spot, exhibited characteristics consistent with all-or-none EC. That is, it entirely suppressed visual signaling, irrespective of the direction of the self-generated turn (Kim et al., 2015; Turner et al., 2022).

      Efference-copy (EC)–like signals have been reported in several Drosophila visual circuits, yet their behavioral role remains unclear. Indirect evidence comes from a behavioral study showing that the dynamics of spontaneously generated flight turns were unaffected by unexpected background motion (Bender and Dickinson, 2006a). Likewise, our behavioral experiments showed that, during loom-evoked turns, responses to background motion are suppressed in an all-or-none manner (Figures 6 and 7). Consistent with this, motor-related inputs recorded in visual neurons exhibit nearly identical dynamics during spontaneous and loom-evoked turns (Fenk et al., 2021). Together, these behavioral and physiological parallels support the idea that a common efference-copy mechanism operates during both spontaneous and loom-evoked flight turns.

      Unlike loom-evoked turns, bar-evoked turn dynamics changed in the presence of moving backgrounds (Figure 5), a result compatible with both the addition-only and graded EC models. However, when the static background was updated just before a bar-evoked turn—thereby altering the amplitude of optic flow—the turn dynamics remained unaffected (Figures 5 and 7), clearly contradicting the addition-only model. Thus, the graded EC model is the only one consistent with both findings. If a graded EC mechanism were truly at work, however, an unexpected background change should have modified turn dynamics because of the mismatch between expected and actual visual feedback (Figure 4–figure supplement 1)—yet we detected no such effect at any time scale examined (Figure 7–figure supplement 1). This mismatch would be ignored only if the amplitude of the graded EC adapted to environmental changes almost instantaneously—a mechanism that seems improbable given the limited computational capacity of the Drosophila brain. In electric fish, for example, comparable adjustments take more than 10 minutes (Bell, 1981; Muller et al., 2019). Further investigation is needed to clarify how reorienting flies ignore optic flow generated by static backgrounds, potentially by engaging EC mechanisms not captured by the models tested in this study.

      Why would Drosophila rely on the all-or-none EC mechanism instead of the graded one for loom-evoked turns? A graded EC must be adjusted adaptively depending on the environment, as the amplitude of visual feedback varies with both the dynamics of self-generated movement and environmental conditions (e.g., empty vs. cluttered visual backgrounds) (Figure 4—figure supplement 1). Recent studies on electric fish have suggested that a large array of neurons in a multi-layer network is crucial for generating a modifiable efference copy signal matched to the current environment (Muller et al., 2019). Given their small brains, flies might opt for a more economical design that suppresses unwanted visual inputs regardless of the visual environment. Circuits mediating this type of EC were identified, for example, in the cricket auditory system during stridulation (Poulet and Hedwig, 2006). Our study strongly suggests the existence of a similar circuit in the Drosophila visual system.

      We tested the hypothesis that efference-copy (EC) signals guide action selection by suppressing specific visuomotor reflexes when multiple visual features compete. An alternative motif with a similar function is mutual inhibition between motor pathways (Edwards, 1991; Mysore and Kothari, 2020). In Drosophila, descending neurons form dense lateral connections (Braun et al., 2024), offering a substrate for such competitive interactions. Determining whether—and how—EC and mutual inhibition operate will require recordings from the neurons that ensure visual stability, which remain unidentified. Mapping these pathways and assessing how they are modulated by visual and behavioral context are important goals for future work.”

      Reviewer #2 (Public Review):

      It has been widely proposed that the neural circuit uses a copy of motor command, an efference copy, to cancel out self-generated sensory stimuli so that intended movement is not disturbed by the reafferent sensory inputs. However, how quantitatively such an efference copy suppresses sensory inputs is unknown. Here, Canelo et al. tried to demonstrate that an efference copy operates in an all-or-none manner and that its amplitude is independent of the amplitude of the sensory signal to be suppressed. Understanding the nature of such an efference copy is important because animals generally move during sensory processing, and the movement would devastatingly distort that without a proper correction. The manuscript is concise and written very clearly. However, experiments do not directly demonstrate if the animal indeed uses an efference copy in the presented visual paradigms and if such a signal is indeed non-scaled. As it is, it is not clear if the suppression of behavioral response to the visual background is due to the act of an efference copy (a copy of motor command) or due to an alternative, more global inhibitory mechanism, such as feedforward inhibition at the sensory level or attentional modulation. To directly uncover the nature of an efference copy, physiological experiments are necessary. If that is technically challenging, it requires finding a behavioral signature that unambiguously reports a (copy of) motor command and quantifying the nature of that behavior.

      We thank the reviewer for this insightful and constructive comment. We agree that our current behavioral evidence does not directly identify the underlying circuit mechanism, and that direct recordings from visual neurons modulated by an efference copy would be critical for distinguishing between potential mechanisms.

      A prerequisite for such physiological investigations would be the identification of both (1) the feedforward neurons directly involved in the optomotor response, and (2) the neurons conveying motor-related signals to the optomotor circuit. Despite efforts by several research groups, the location of the feedforward circuit mediating the optomotor response remains elusive. This limitation has prevented us from obtaining direct cellular evidence of flight turn-associated suppression of optomotor signaling.

      In light of the reviewer’s suggestion, we expanded our investigation to strengthen the behavioral evidence for efference copy (EC) mechanisms. In addition to our earlier experiments involving unexpected changes in the static background, we examined how object-evoked flight turns influence the optomotor stability reflex and vice versa (Figures 5 and 6). To quantify the interaction between different visuomotor behaviors, we systematically varied the temporal relationship between two types of visual motion—loom versus moving background, or moving bar versus moving background—and measured the resulting behavioral responses.

      Our findings support pattern- and time-specific suppressive mechanisms acting between flight turns associated with the different visual patterns. Specifically:

      • The responses to a moving bar and a moving background add linearly, even when presented in close temporal proximity.

      • Loom-evoked turns and the optomotor stability reflex mutually suppress each other in a time-specific manner.

      • For both loom- and moving bar-evoked flight turns, changes in the static background had no measurable effect on the dynamics of the object-evoked responses.

      These results provide a detailed behavioral characterization of a suppressive interaction between distinct visuomotor responses. This, in turn, offers correlative evidence supporting the involvement of an efference copy-like mechanism acting on the visual system. While similar efference copy mechanisms have been documented in other parts of the visual system, we acknowledge that our findings do not exclude alternative explanations. In particular, it is still possible that lateral inhibition within the central brain or ventral nerve cord contributes to the suppression we observed.

      Ultimately, definitive proof will require identifying the specific neurons that convey efference copy signals and demonstrating that silencing these neurons abolishes the behavioral suppression. Until such experiments are feasible, our behavioral approach provides an important contribution toward understanding the nature of sensorimotor integration in this system.

      Reviewer #3 (Public Review):

      Summary:

      Canelo et al. used a combination of mathematical modeling and behavioral experiments to ask whether flies use an all-or-none EC model or a graded EC model (in which the turn amplitude is modulated by wide-field optic flow). Particularly, the authors focus on the bar-ground discrimination problem, which has received significant attention in flies over the last 50-60 years. First, they use a model by Poggio and Reichardt to model flight response to moving small-field bars and spots and wide-field gratings. They then simulate this model and compare simulation results to flight responses in a yaw-free tether and find generally good agreement. They then ask how flies may do bar-background discrimination (i.e. complex visual environment) and invoke different EC models and an additive model (balancing torque production due to background and bar movement). Using behavioral experiments and simulation supports the notion that flies use an all-or-none EC since flight turns are not influenced by the background optic flow. While the study is interesting, there are major issues with the conceptual framework.

      Strengths:

      They ask a significant question related to efference copies during volitional movement.

      The methods are well detailed and the data (and statistics) are presented clearly.

      The integration of behavioral experiments and mathematical modeling of flight behavior.

      The figures are overall very clear and salient.

      Weaknesses:

      Omission of saccades: While the authors ask a significant question related to the mechanism of bar-ground discrimination, they fail to integrate an essential component of the Drosophila visuomotor responses: saccades. Indeed, the Poggio and Reichardt model, which was developed almost 50 years ago, while appropriate to study body-fixed flight, has a severe limitation: it does not consider saccades. The authors identify this major issue in the Discussion by citing a recent switched, integrate-and-fire model (Mongeau & Frye, 2017). The authors admit that they "approximated" this model as a smooth pursuit movement. However, I disagree that it is an approximation; rather it is an omission of a motor program that is critical for volitional visuomotor behavior. Indeed, saccades are the main strategy by which Drosophila turn in free flight and prior to landing on an object (i.e. akin to a bar), as reported by the Dickinson group (Censi et al., van Breugel & Dickinson [not cited]). Flies appear to solve the bar-ground discrimination problem by switching between smooth movement and saccades (Mongeau & Frye, 2017; Mongeau et al., 2019 [not cited]). Thus, ignoring saccades is a major issue with the current study as it makes their model disconnected from flight behavior, which has been studied in a more natural context since the work of Poggio.

      Thank you for this helpful comment. We agree that including saccadic turns is essential and qualitatively improves the model. In the revised manuscript, we therefore expanded our bar-tracking model to incorporate an integrate-and-saccade strategy, now presented in Figure 2—figure supplement

      The manuscript now introduces this result as follows:

      (Line#190) “Finally, one important locomotor behavior that a flying Drosophila exhibits while tracking an object is a rapid orientation change, called a “saccade” (Breugel and Dickinson, 2012; Censi et al., 2013; Heisenberg and Wolf, 1979). For example, while tracking a slowly moving bar, flies perform relatively straight flights interspersed with saccadic flight turns (Collett and Land, 1975; Mongeau and Frye, 2017). During this behavior, it has been proposed that visual circuits compute an integrated error of the bar position with respect to the frontal midline and trigger a saccadic turn toward the bar when the integrated value reaches a threshold (Frighetto and Frye, 2023; Mongeau et al., 2019; Mongeau and Frye, 2017). We expanded our bar fixation model to incorporate this behavioral strategy (Figure 2--figure supplement 2). The overall structure of the modified model is akin to the one proposed in a previous study (Mongeau and Frye, 2017), and the amplitude of a saccadic turn was determined by the sum of the position and velocity functions (Figure 2--figure supplement 2A; see Equation 13 in Methods). When simulated, our model successfully reproduced experimental observations of saccade dynamics across different object velocities (Figure 2--figure supplement 2B-D) (Mongeau and Frye, 2017). Together, our models faithfully recapitulated the results of previous behavioral observations in response to singly presented visual patterns (Collett, 1980; Götz, 1987; H. Kim et al., 2023; Maimon et al., 2008; Mongeau and Frye, 2017).”
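
      The integrate-and-saccade strategy described in the passage above can be sketched as a simple loop. This is a minimal illustration, not the manuscript's Equation 13: the saccade amplitude here is a hypothetical fixed `gain` times the current error, rather than the sum of the position and velocity functions, and all names and units are assumptions.

```python
def simulate_integrate_and_saccade(bar_positions, dt, threshold, gain):
    """Integrate bar-position error and trigger a saccade at threshold.

    bar_positions : bar azimuth at each time step (deg, world coordinates)
    dt            : time step (s)
    threshold     : integrated-error threshold (deg * s)
    gain          : fraction of the current error corrected per saccade
                    (hypothetical simplification of the amplitude rule)
    Returns (headings, saccade_times).
    """
    heading = 0.0
    integrated_error = 0.0
    headings = []
    saccade_times = []
    for step, bar in enumerate(bar_positions):
        error = bar - heading            # bar position relative to the midline
        integrated_error += error * dt   # accumulate error over time
        if abs(integrated_error) >= threshold:
            heading += gain * error      # saccadic turn toward the bar
            integrated_error = 0.0       # reset the integrator after the turn
            saccade_times.append(step * dt)
        headings.append(heading)
    return headings, saccade_times
```

      Between saccades the heading stays constant in this sketch, which mirrors the description of relatively straight flight interspersed with rapid turns.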

      Apart from Figures 1 and 2, most of our data, whether from simulations or behavioral experiments, use brief visual patterns lasting 200 ms or less. These stimuli trigger a single, rapid orientation change reminiscent of a saccadic flight turn. In this part of the paper, we have essentially examined how multiple visuomotor pathways interact to determine the direction of object-evoked turns when several visual patterns occur simultaneously.

      Critically, recent work showed that a group of columnar neurons (T3) appear specialized for saccadic bar tracking through integrate-and-fire computations, supporting the notion of parallel visual circuits for saccades and smooth movement (Frighetto & Frye, 2023 [not cited]).

      Thanks for bringing up this critical issue. We have now added this paper in the following part of the manuscript.

      (Line#193) “During this behavior, it has been proposed that visual circuits compute an integrated error of the horizontal bar position with respect to the frontal midline and trigger a saccadic turn toward the bar when the integrated value reaches a threshold (Frighetto and Frye, 2023; Mongeau and Frye, 2017).”

      (Line#462) “Visual systems extract features from the environment by calculating spatiotemporal relationships of neural activities within an array of photoreceptors. In Drosophila, these calculations occur initially on a local scale in the peripheral layers of the optic lobe (Frighetto and Frye, 2023; Gruntman et al., 2018; Ketkar et al., 2020).”

      A major theme of this work is bar fixation, yet recent work showed that in the presence of proprioceptive feedback, flies do not actually center a bar (Rimniceanu & Frye, 2023). Furthermore, the same study found that yaw-free flies do not smoothly track bars but instead generate saccades. Thus prior work is in direct conflict with the work here. This is a major issue that requires more engagement by the authors.

      Thank you for your thoughtful comments and for drawing our attention to this important paper. In our experiments, bar fixation on oscillating vertical objects emerges during the “alignment” phase of the magneto-tether protocol. The pattern movement dynamics were similar to those used by Rimniceanu & Frye (2023), yet the two studies differ in a key respect: Rimniceanu & Frye employed a motion-defined bar, whereas we presented a dark vertical bar against a uniform or random-dot background. The alignment success rate (defined as the proportion of trials in which the fly’s body angle is within ±25° of the target) was about 50% (data not shown). Our alignment pattern consisted of three vertical stripes spanning ~40° horizontally; when we replaced it with a single, narrower stripe, the success rate decreased (data not shown). These observations suggest that bar fixation in the magnetically tethered assay is less robust than in the rigid-tethered assay, although flies still orient toward highly salient vertical objects.

      We also observed that bar-evoked turns were elicited more reliably when the bar moved rapidly (45° in 200 ms) in the magneto-tether assay, although the turn magnitude was significantly smaller than the actual bar displacement (Figure 3).

      In response to the reviewer’s comment, we have now added the following description of the bar fixation behavior to the paper, citing Rimniceanu & Frye (2023).

      (Line#239) “Another potential explanation arises from recent studies demonstrating that proprioceptive feedback provided during flight turns in a magnetically tethered assay strongly dampens the amplitude of wing and head responses (Cellini and Mongeau, 2022; Rimniceanu et al., 2023).”

      Relevance of the EC model: EC-related studies by the authors linked cancellation signals to saccades (Kim et al., 2014 & 2017). Puzzlingly, the authors applied an EC model to smooth movement, when the authors' own work showed that smooth course stabilizing flight turns do not receive cancellation signals (Fenk et al., 2021). Thus, in Fig. 4C, based on the state of the field, the efference copy signal should originate from the torque commands to initiate saccades, and not from torque to generate smooth movement. As this group previously showed, cancellation signals are quantitatively tuned to that of the expected visual input during saccades. Importantly, this tuning would be to the anticipated saccadic turn optic flow. Thus the authors' results supporting an all-or-none model appear in direct conflict with the authors' previous work. Further, the addition-only model is not particularly helpful as it has been already refuted by behavioral experiments (Rimniceanu & Frye, Mongeau & Frye).

      Thank you for this constructive comment. Efference copy is best established for brief, discrete actions like flight saccades. While motor-related modulation of visual processing has been reported across short- and long-duration behaviours (Chiappe et al., 2010; Fujiwara et al., 2017; Kim et al., 2015, 2017; Maimon et al., 2010; Turner et al., 2022), only flight saccade-associated signals exhibit the temporal profile appropriate to cancel reafferent input. That said, von Holst & Mittelstaedt (1950) originally formulated efference copy to explain the smooth optomotor response of hoverflies. In HS/VS recordings from previous studies, we could not detect membrane-potential changes tied to baseline wing-beat amplitude (data not shown), although further work is needed.

      Note that visually evoked flight turns analyzed in this paper have relatively fast dynamics. Fenk et al. (2021) showed that HS cells carry EC-like motor signals during both loom-evoked turns and spontaneous saccades. Building on this, we tested whether object-evoked rapid turns modulate other visuomotor pathways. Although Fenk et al. also found that optomotor turns lack motor input to HS cells, the authors did not test whether the optomotor pathway suppresses other reflexes, such as loom-evoked turns. Our new behavioral data (Figure 6) show that optomotor turns indeed suppress loom-evoked turns, suggesting a potential EC signal arising from the optomotor pathway that inhibits loom-responsive visual neurons.

      In Kim et al. (2017), the authors argued that HS/VS neurons receive a “quantitatively tuned” efference copy that varies across cell types: yaw-sensitive LPTCs are strongly suppressed, roll-sensitive cells receive intermediate input, and pitch-sensitive cells receive little or none. We also showed that when the amplitude of ongoing visual drive changes, the amplitude of saccade-related potentials (SRPs) scales linearly. This proportionality does not imply a genuinely graded EC, however, because SRP amplitude could vary solely through changes in driving force (Vm – Vrest) with a fixed EC conductance. Crucially, SRPs do not fully suppress feed-forward visual signalling, arguing against an all-or-none EC mechanism.
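
      The driving-force argument above can be illustrated with a minimal sketch. It assumes a single-compartment cell at steady state, visual drive modelled as an injected current, and an efference-copy (EC) conductance that reverses at the resting potential. The function name and all parameter values are hypothetical and are not fitted to any recordings.

```python
def srp_amplitude(v_pre, g_ec, g_leak, e_ec):
    """Steady-state potential change when a fixed EC conductance opens.

    v_pre  : pre-saccadic membrane potential set by visual drive (mV)
    g_ec   : efference-copy conductance (hypothetical, arbitrary units)
    g_leak : resting (leak) conductance (same units as g_ec)
    e_ec   : reversal potential of the EC conductance (mV)

    With visual drive treated as a current, opening g_ec shifts the
    potential by g_ec * (e_ec - v_pre) / (g_leak + g_ec).
    """
    return g_ec * (e_ec - v_pre) / (g_leak + g_ec)


# Hypothetical values: the EC conductance reverses at a -50 mV resting potential.
E_EC, G_LEAK, G_EC = -50.0, 3.0, 1.0

# The conductance never changes, yet the potential change grows in
# proportion to the visually evoked depolarization (v_pre - E_EC).
for v_pre in (-50.0, -46.0, -42.0):
    print(v_pre, srp_amplitude(v_pre, G_EC, G_LEAK, E_EC))
```

      In this toy model the potential change is only a fraction g_ec / (g_leak + g_ec) of the visually evoked depolarization, so a linear scaling of SRP amplitude and incomplete suppression can both arise from a fixed conductance.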

      How, then, can the cellular and behavioural data be reconciled? Silencing HS/VS neurons—or their primary inputs, the T4/T5 neurons—does not markedly diminish the optomotor response in flight (Fenk et al., 2014; Kim et al., 2017), indicating the presence of additional, as-yet-unidentified pathways.

      Physiological recordings from other visual neurons that drive the optomotor response in flying Drosophila are therefore needed to determine how strongly they are suppressed during loom-evoked turns.

      Behavioral evidence for all-or-none EC model: The authors state "unless the stability reflex is suppressed during the flies' object evoked turns, the turns should slow down more strongly with the dense background than the sparse one". This hypothesis is based on the fact that the optomotor response magnitude is larger with a denser background, as would be predicted by an EMD model (because there are more pixels projected onto the eye). However, based on the authors' previous work, the EC should be tuned to optic flow and thus the turning velocity (or amplitude). Thus the EC need not be directly tied to the background statistics, as they claim. For instance, I think it would be important to distinguish whether a mismatch in reafferent velocity (optic flow) links to distinct turn velocities (and thus position). This would require moving the background at different velocities (co- and anti-directionally) at the onset of bar motion. Overall, there are alternative hypotheses here that need to be discussed and more fully explored (as presented by Bender & Dickinson and in work by the Maimon group).

      We appreciate the reviewer’s important suggestion. In response, we performed the recommended experiment. In Figures 5 and 6 of the revised manuscript, we now present how bar- or loom-evoked flight turns affect the response to a moving background pattern. These experiments revealed that bar-evoked turns do not suppress the optic flow response, whereas loom-evoked turns strongly suppress it. Specifically, when background motion began 100 ms after the onset of loom expansion, the response to the background was significantly suppressed. Although weak residual responses to the background motion were observed in this case, these could reflect background motion occurring outside the suppression interval, whose duration may match that of the flight turns (Figure 6C,D).

      The lack of suppression of the optic flow response during and after bar-evoked turns appears to suggest that the responses are added linearly (Figure 5), seemingly contradicting the lack of dynamic change when the background dot density was altered (Figure 7, Figure 7–figure supplement 1). That is, the experimental result in Figure 5 supports either an addition-only or a graded efference copy (EC) model. However, the result in Figure 7 supports an all-or-none EC model. If a graded EC were used, the amplitude of the EC should be updated almost instantaneously when the static background changes.
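
      The three candidate models contrasted above can be sketched as follows. This is an illustration only: the function name, the stability gain, and the sign convention (the stability reflex opposes perceived background flow) are assumptions and are not taken from the model in the manuscript.

```python
def net_turn_drive(object_drive, background_flow, model,
                   turning=True, predicted_flow=0.0, stability_gain=0.5):
    """Combine an object-evoked drive with the stability reflex.

    object_drive    : drive toward (or away from) the object, arbitrary units
    background_flow : perceived background optic flow, arbitrary units
    model           : "addition_only", "graded_ec", or "all_or_none_ec"
    turning         : True while an object-evoked turn is in progress
    predicted_flow  : flow predicted by a graded efference copy
    stability_gain  : hypothetical gain of the stability reflex
    """
    stability = -stability_gain * background_flow
    if model == "addition_only":
        # Both reflexes always sum linearly.
        return object_drive + stability
    if model == "graded_ec":
        # Only the flow that exceeds the prediction drives the reflex.
        return object_drive - stability_gain * (background_flow - predicted_flow)
    if model == "all_or_none_ec":
        # The stability reflex is switched off entirely during the turn.
        return object_drive if turning else object_drive + stability
    raise ValueError(f"unknown model: {model}")


# Reafferent flow of 2.0 during a turn with object drive 1.0.
for name in ("addition_only", "graded_ec", "all_or_none_ec"):
    print(name, net_turn_drive(1.0, 2.0, name, predicted_flow=2.0))
```

      Under these assumptions, only the addition-only rule lets background flow slow the turn when the prediction matches the reafferent flow, while a graded EC leaves a residual response whenever the prediction lags behind a change in the background.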

      Another possibility is that the optic flow during self-generated turns in a static background is extremely weak compared to the optic flow input generated by physically moving the pattern, perhaps due to the rapid nature of head movements. Indeed, detailed kinematic analysis of head movement during spontaneous saccades in blow flies revealed that the head reaches the target angle before the body completes the orientation change, making the effective speed of reafferent optic flow higher than the speed of body rotation (Hateren and Schilstra, 1999). To test these hypotheses, further experiments will be needed for bar-evoked flight turns.

      Publishing the reviewed preprint:

      (1) The Reviewed Preprint (including the full text of the preprint we reviewed, the eLife assessment, and public reviews) will typically be published in two weeks' time.

      Please let us know if you would like to provide provisional author responses to be posted at the same time (if so, please send these by email). Please do not resubmit within the next two/three weeks, as we will need to publish the first version of the Reviewed Preprint first.

      If there are any factual errors in the eLife assessment or public reviews, or other issues we should be aware of, please let us know as soon as possible.

      (2) After publication of the Reviewed Preprint, you can use the link below to submit a revised version. There is no deadline to resubmit. Before resubmitting, please ensure that you update the preprint at the preprint server to correspond with the revised version. Upon submitting a revised version, we will ask the editors and reviewers if it's appropriate to update their assessment and public reviews, which will be included alongside the revised Reviewed Preprint. At that time we will also post the recommendations to the authors and the author responses you provide with the revised version. In the author response, please respond to the public reviews (where relevant) and the recommendations to the authors.

      (3) Alternatively, you can proceed with the current version of the Reviewed Preprint (once published), without revisions, and request an eLife Version of Record. See the Author Guide for further information: https://elife-rp.msubmit.net/html/elife-rp_author_instructions.html#vor. However, most authors decide to request a Version of Record after a round of revision.

      (4) After publication of eLife's Reviewed Preprint, you also have the option to submit/publish in another journal instead: if you choose to do this, please let us know so we can update our records.

      The reviewers identified two key revisions that could improve the assessment of the paper:

      (1) Consideration of saccades within the model framework (outlined by reviewer 3).

      (2) Addition of physiology data to support the conclusions of the paper (outlined by reviewer 2). If this is not feasible within the timescale of revisions, the paper would need to be revised to clarify that the model leads to a hypothesis that would need to be tested with future physiology experiments.

      Thank you for these comments.

      Regarding revision point #1, we have added Figure 2–figure supplement 2, where we incorporated our position-velocity model (estimated in Figure 1) into the framework of the integrate-and-saccade model. A detailed description of this model is now provided in the main text (Lines 190–203).

      For revision point #2, obtaining electrophysiological evidence for efference copy remains challenging, as neither the visual neurons nor the efference-copy neuron has been identified for the wing optomotor response. As suggested by the reviewers, we have revised the title of the paper to reduce emphasis on efference copy and have noted electrophysiological recordings as a direction for future work.

      old title: A visual efference copy-based navigation algorithm in Drosophila for complex visual environments

      new title: Integrative models of visually guided steering in Drosophila

      Specific recommendations are detailed below.

      Reviewer #2 (Recommendations For The Authors):

      To directly demonstrate if an efference copy is non-scaled, the following experiments can be helpful: record from HS/VS cells and examine the relation between the amplitude of the saccade-suppression signal vs. saccade amplitude.

      Thanks for raising this important point. We previously carried out the suggested analysis for loom-evoked saccades in Fenk et al. (2021). There, significant correlations emerged between wing-response amplitude and saccade-related potentials (Figures 2F and 3C). However, we did not interpret the strong correlation (r ≈ 0.8) as evidence for a graded efference copy, because the amplitude of saccade-related potentials appeared to be bimodal. Upon presentation of the looming stimulus, flies either executed large evasive turns or showed minimal changes in wing-stroke amplitude. Large wing responses were accompanied by strong, saturated suppression of HS-cell membrane potential, whereas trials without wing responses produced only weak modulations—reflected in the bimodal distribution of saccade-related potential amplitudes (Figure 3C). 
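
      The reasoning above, that a bimodal distribution can produce a strong pooled correlation without any graded relationship, can be illustrated with a toy example. The numbers below are invented for illustration and are not data from Fenk et al. (2021); the variable names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Toy trials (arbitrary units): a "no turn" cluster and a "large turn" cluster.
# Within each cluster, wing response and SRP magnitude are unrelated.
no_turn_wing, no_turn_srp = [0.0, 0.0, 1.0, 1.0], [0.0, 1.0, 0.0, 1.0]
turn_wing, turn_srp = [10.0, 10.0, 11.0, 11.0], [10.0, 11.0, 10.0, 11.0]

r_within_no_turn = pearson_r(no_turn_wing, no_turn_srp)
r_within_turn = pearson_r(turn_wing, turn_srp)
r_pooled = pearson_r(no_turn_wing + turn_wing, no_turn_srp + turn_srp)

print(r_within_no_turn, r_within_turn, r_pooled)
```

      In this toy data set the pooled correlation is high only because the two clusters are far apart, which is why a strong correlation alone does not distinguish a graded signal from a bimodal one.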

      Importantly, in rigidly tethered preparations—where these potentials are typically measured—the absence of proprioceptive feedback can itself drive wingbeat amplitudes to saturation during saccades. We therefore reasoned that the lack of intermediate-sized flight saccades would naturally yield correspondingly saturated saccade-related potentials, even if a graded EC system is in play. 

      In Kim et al. (2017), we also performed a comprehensive analysis of spontaneous saccade-related potentials across all HS/VS cell types. When we later examined the relationship between saccade amplitude and the corresponding saccade-related potentials in each cell type, we could not find any statistically significant correlation (unpublished data).

      measure how much a weak visual stimulus and a strong visual stimulus are suppressed by the suppression signal. If the signal is non-scaled, visual stimuli should always be suppressed independently of their intensities.

      Thank you for this important suggestion. As mentioned in our response to the previous comment, we believe it is not feasible to record from neurons responsible for the body optomotor response at this point, as their identity remains unknown. Regarding the HS/VS cells, our previous study showed that HS cells are not always fully suppressed. The changes in saccade-related potential amplitude can be described as a linear function of the pre-saccadic visually-evoked membrane potential (Figure 7 in Kim et al., 2017). 

      As suggested by Fenk et al. 2014 (doi: 10.1016/j.cub.2014.10.042), HS cells might also be responsive to a moving bar. If that is the case, and if you present a bar and background (either sparse or dense) in a closed-loop manner to a head-fixed fly, HS cells might be sensitive only to the bar but not to the background (independently of the density).

      Thanks for pointing out this important issue. HS cells indeed respond strongly to the horizontal movement of a vertical bar, as expected given that their receptive fields are formed by the integration of local optic flow vectors. In one of our previous studies (Supplemental Figure 1 in Kim et al., 2015), we showed that the response amplitude to a single vertical bar is roughly equivalent to that elicited by a vertical grating composed of 12 bars of the same size. Therefore, we believe that HS cells are likely to contribute to the head response to a moving vertical bar. In a body-fixed flight simulator, HS cells would respond only to the bar if the bar runs in a closed loop with a static background. In this scenario, HS cells are likely to play a role in the head optomotor response.

      Note also that the role of HS cells in the wing optomotor response remains unresolved. Unilateral activation of HS cells has been shown to elicit locomotor turns in walking Drosophila (Fujiwara et al., 2017), as well as in flying individuals (unpublished data from our lab). However, a previous study also showed that strong silencing of HS/VS cells significantly reduced the head optomotor response, but not the wing optomotor response (Kim et al., 2017).

      If neurophysiology is technically challenging, an alternative way might pay attention to a head movement that exclusively follows the background (Fox et al., 2014 (doi: 10.1242/jeb.080192)). Because HS cells are thought to promote head rotation to background motion, a non-scaled suppression signal on HS cells would always suppress the head rotation independently of the background density.

      Thanks for this helpful comment. We have analyzed head movements during bar-evoked flight turns (Figure 7–figure supplement 1B) and found no significant changes across different background dot densities. This suggests that HS cells are unlikely to receive suppressive inputs during bar-evoked turns, akin to the lack of modulation during optomotor turns.

      Another way to separate a potential efference copy from other mechanisms (more global inhibition) is the directionality. A global inhibition would suppress the response to the background even if the background moves in the same direction as self-motion, but the efference copy would not.

      Thanks for this important point. In Heisenberg and Wolf, 1979, it was proposed that modulation might be bidirectional, with behavioral effects observed only for perturbations in the “unexpected” direction. In our new data on loom-evoked turns (Figure 6), the suppression appears equally strong for background motion in either direction, supporting an all-or-none suppression mechanism.

      Besides, in general, it is unclear if you think an efference copy operates both in smooth pursuits and saccades or if such a signal is only present during saccades. Your previous neurophysiological work supports the latter. Are your behavioral results consistent with the previous saccade suppression idea, or do you propose a new type of efference copy that also operates in smooth pursuits?

      Thanks for raising this important point. von Holst and Mittelstaedt (1950) originally introduced the concept of efference copy to explain the smooth optomotor response. We previously analyzed electrophysiological recordings from HS cells for membrane-potential changes associated with slow deviations in wing-steering angle but found none. However, this negative result does not entirely rule out modulation of visual processing during smooth flight turns, given the slow drift in membrane potential observed in most whole-cell recordings.

      In this study, we examined only the interactions among visuomotor pathways during these rapid flight turns, as the dynamics of visually evoked turns are almost as rapid as those of spontaneous saccades. Our data reveal that interactions between distinct visuomotor reflexes are more diverse than previously appreciated.

      Minor comments:

      Line 108, 109: match the description between here and the labels in Fig. 1F.

      Thank you for pointing out this issue. We defined the general equation for the position and velocity components in the main text (lines 108-109), but because of a slight asymmetry in the data (Fig. 1E) we used the approach indicated in Fig. 1F and explained in lines 113-117.

      Fig.1 F: If the position-dependent component is due to fatigue, the tuning curve's shape is likely changed (shrunk or extended) depending on the stimulus speed. How can you generalize the tuning curve shown here? Does the result hold even if the stimulus speed/contrast/spatial frequency is changed?

      We appreciate this comment. We believe that fatigue may explain the significant decay in the wing response to the grating stimulus (Fig. 1E). As you mention, increasing the stimulus speed would increase the amplitude of the fly’s response up to a saturation point. We addressed this in our model by multiplying the derived value by the angular velocity of the grating.

      Regarding contrast and spatial frequency, we did not test these experimentally. Instead, we simulated our model with varying visual feedback (Fig. 4A, B), which can be viewed as increasing or decreasing the contrast of a grating. An increase in contrast would increase the fly’s response to the grating and would thus contribute to dampening the response to the foreground object (Fig. 4C).

      Line 233-255: Here, the description sounds like you will consider several parallel objects (e.g., two stripes) in the visual field instead of the combination of the figure and background (which is referred to in the following paragraph).

      Thank you for pointing it out. Indeed it was slightly ambiguous. We have addressed this by explaining the specific situation of a combination of an object and the background in lines 231-233.

      Figure 6C: you kept the foreground visual field between sparse and dense random dot backgrounds to keep the bar's saliency. Is it sure that this does not influence the difference in the fly's response to these two backgrounds (in Figure 6B)?

      This is a good point that we have also discussed internally. We also carried out similar experiments with a fully covered background and found no significant differences (Figure 7–figure supplement 1).

      Reviewer #3 (Recommendations For The Authors):

      Identify and analyze flight saccade dynamics in the raw trajectories (e.g., Fig. 3B). There should be some since the bar is near the 'sweet spot' for triggering saccades (see Mongeau & Frye, 2017).

      Thank you for bringing up this interesting point. Previous work reported that flies fixate a vertical bar through saccadic turns rather than smooth tracking (Mongeau & Frye, 2017). When the bar was narrow (<15 deg), there was barely one saccade per second (Mongeau & Frye, 2017, Fig. 4). In our magnetic tether assay (Fig. 3A, B), the object width was 11.25 degrees and the object moved only for a short time window, so the fly generated only the saccade associated with the onset of object motion. Small turns of a few degrees, which are likely related to small perturbations, cannot be considered saccades when compared with those previously reported (Mongeau & Frye, 2017). Additionally, in our protocol (Fig. 3A), only a single object moved from the onset time (‘go’ mark) within an empty background, so in principle there is no trigger for a switch to smooth movement. We addressed this in lines x-x.

      Consider updating the Poggio model with flight saccades (switched, integrate-and-fire).

      We appreciate this suggestion. Following previous work (Mongeau et al., 2017), we expanded our model to include a saccade mechanism: the torque produced by the summed position- and velocity-dependent components is now replaced by an integrate-and-fire saccade (Figure 2–figure supplement 2). We optimized the saccade interval and amplitude so that both vary linearly with stimulus amplitude and faithfully reproduce the kinematic properties reported previously (Mongeau et al., 2017).
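
      The integrate-and-fire idea can be sketched in a few lines. This is a minimal illustration, assuming a simple error integrator with a fixed threshold, a reset to zero after each saccade, and a saccade amplitude proportional to the current error. The threshold, gain, and reset rule are hypothetical and may differ from the model in Figure 2–figure supplement 2.

```python
def integrate_and_saccade(bar_positions, dt, threshold, gain):
    """Simulate saccadic bar fixation with an integrate-and-fire rule.

    bar_positions : bar angle at each time step (deg)
    dt            : time step (s)
    threshold     : integrated error that triggers a saccade (deg * s)
    gain          : fraction of the current error corrected per saccade

    Returns the final heading (deg) and a list of (step, amplitude) tuples.
    """
    heading = 0.0
    integrator = 0.0
    saccades = []
    for step, bar in enumerate(bar_positions):
        error = bar - heading          # bar position relative to the midline
        integrator += error * dt       # accumulate the position error over time
        if abs(integrator) >= threshold:
            amplitude = gain * error   # saccade toward the bar
            heading += amplitude
            saccades.append((step, amplitude))
            integrator = 0.0           # reset after each saccade
    return heading, saccades


# A bar held at 8 deg for 2 s: smaller errors take longer to reach threshold,
# so the interval between successive saccades grows as the error shrinks.
final_heading, events = integrate_and_saccade([8.0] * 16, dt=0.125,
                                              threshold=1.0, gain=0.5)
print(final_heading, events)
```

      A large initial error crosses the threshold almost immediately, which is consistent with the near-immediate turn expected for a fast-moving object.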

      Please engage more with the literature, especially work that directly conflicts with your conclusions (see above). Also, highly relevant work by Bender & Dickinson was not sufficiently discussed. Spot results presented in Fig. 3 should be contextualized in light of the work of Mongeau et al., 2019, who performed similar experiments and identified a switch in saccade valence.

      We appreciate your pointing out the relevant previous work. We have added references to the following papers and tried to describe the relationship between our data and previous ones.

      Bender & Dickinson 2006

      (Line#162) “This simulation experiment is reminiscent of the magnetically tethered flight assay, where a flying fly remains fixed at a position but is free to rotate around its yaw axis (Bender and Dickinson, 2006b; Cellini et al., 2022; G. Kim et al., 2023; Mongeau and Frye, 2017).”

      (Line#218) “We tested the predictions of our models with flies flying in an environment similar to that used in the simulation (Figure 3A). A fly was tethered to a short steel pin positioned vertically at the center of a vertically oriented magnetic field, allowing it to rotate around its yaw axis with minimal friction (Bender and Dickinson, 2006b; Cellini et al., 2022; G. Kim et al., 2023).”

      (Line#238) “To determine if our assay imposes additional friction compared to other assays used in previous studies, we analyzed the dynamics of spontaneous saccades during the “freeze” phase (Figure 3–figure supplement 1A). We found their duration and amplitude to be within the range reported previously (Bender and Dickinson, 2006b; Mongeau and Frye, 2017) (Figure 3–figure supplement 1B-D).”

      Mongeau et al., 2019

      (Line#196) “During this behavior, it has been proposed that visual circuits compute an integrated error of the bar position with respect to the frontal midline and triggers a saccadic turn toward the bar when the integrated value reaches a threshold (Frighetto and Frye, 2023; Mongeau et al., 2019; Mongeau and Frye, 2017). We expanded our bar fixation model to incorporate this behavioral strategy (Figure 2–figure supplement 2).”

      This paper shows that the dynamics of saccadic flight turns elicited by a rotating bar or spot determine whether flies display attraction or aversion. In that study, the visual stimulus—a bar or spot—rotated slowly at a constant 75 deg s⁻¹. By contrast, in our Figure 3 the object moves much faster, driving the neural “integrator” to saturation and triggering an almost immediate flight turn. In Mongeau et al. (2019), saccades occur at variable times and their amplitudes and directions are more stochastic, again reflecting the slower stimulus speed. Because these differences all arise from the disparity in object speed, we did not cite Mongeau et al. (2019) in Figure 3 or the associated text.

      In addition to the two papers cited above, we have incorporated several relevant studies on the Drosophila visuomotor control identified through the reviewers’ insightful comments. Examples include:

      Frighetto G, Frye MA. 2023 (Line#195, 464)

      Rimniceanu et al., 2023 (Line#241)

      Cellini & Mongeau 2020 (Line#91)

      Cellini & Mongeau 2022 (Line#241)

      Cellini et al., 2022 (Line#91, 162, 218)

      Many citations are not in the proper format (e.g. using numbers rather than authors' last name).

      Thank you for letting us know. We have changed the remaining citations to the proper format.

    1. Author response:

      We would like to thank the reviewers for their helpful comments and critique of our manuscript. We plan to make the following revisions, which will improve the clarity of our manuscript and the robustness of our findings.

      We will revise methodological details and interpretation throughout the manuscript. In particular, we will consider alternative methods for calculating surrogates. We intend to investigate the relationship between apnoea rate and phase-amplitude coupling at other electrodes as suggested by Reviewer 1, and we will revise the details of the linear-mixed effects models.

      In relation to the comments raised by both Reviewers 2 and 3, we will carefully address the wording throughout the manuscript, including addressing the order of hypotheses, our interpretation of the directionality of the relationship between cortical and respiratory activity, and the connection between cortical-respiratory coupling and apnoea. We will further clarify the limitations of our recording setup and approach, in particular the limited EEG montage, and add further details with regards to sleep state and caffeine.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents an interesting investigation into the role of trained immunity in inflammatory bowel disease, demonstrating that β-glucan-induced reprogramming of innate immune cells can ameliorate experimental colitis. The findings are novel and clinically relevant, with potential implications for therapeutic strategies in IBD. The combination of functional assays, adoptive transfer experiments, and single-cell RNA sequencing provides comprehensive mechanistic insights. However, some aspects of the study could benefit from further clarification to strengthen the conclusions.

      We are grateful for the reviewer’s positive assessment of our study and constructive suggestions to improve the manuscript.

      Strengths:

      (1) This study elegantly connects trained immunity with IBD, demonstrating how β-glucan-induced innate immune reprogramming can mitigate chronic inflammation.

      (2) Adoptive transfer experiments robustly confirm the protective role of monocytes/macrophages in colitis resolution.

      (3) Single-cell RNA sequencing provides mechanistic depth, revealing the expansion of reparative Cx3cr1⁺ macrophages and their contribution to epithelial repair.

      (4) The work highlights the therapeutic potential of trained immunity in restoring gut homeostasis, offering new directions for IBD treatment.

      Weaknesses:

      While β-glucan may exert its training effect on hematopoietic stem cells, performing ATAC-seq on HSCs or monocytes to profile chromatin accessibility at antibacterial defense and mucosal repair-related genes would further validate the trained immunity mechanism. Alternatively, the authors could acknowledge this as a study limitation and future research direction.

      We agree that further epigenetic profiling—such as ATAC-seq analysis on HSCs or monocytes—would provide additional mechanistic depth to our current findings. We will acknowledge this as a limitation of the present study and highlight it as an important direction for future research.

      Comment (1): It’s better to include a schematic summarizing the proposed mechanism for reader clarity.

      We agree that a visual summary will enhance the clarity and accessibility of our findings. We will add a new schematic diagram (Figure 6) illustrating the proposed mechanism of β-glucan–induced myeloid reprogramming and its protective effects in the experimental colitis model.

      Comment (2): Discuss potential off-target effects of β-glucan-induced trained immunity (e.g., risk of exacerbated inflammation in other contexts).

      We appreciate this important comment regarding the potential off-target effects of β-glucan pretreatment. As trained immunity is known to amplify inflammatory responses upon heterologous stimulation and has been implicated in chronic inflammation–prone conditions such as atherosclerosis, this is an important consideration. Previous in vivo studies have shown that β-glucan pretreatment can enhance antibacterial or antitumor responses without inducing basal inflammation after one week of administration (PMID: 22901542, PMID: 30380404, PMID: 36604547, PMID: 33125892). Nevertheless, it remains possible that β-glucan–induced trained immunity could have unintended effects in certain contexts, which warrants further investigation and caution. We will expand the Discussion section to include a dedicated paragraph addressing these potential off-target effects.

      Reviewer #2 (Public review):

      Summary:

      The study investigates whether β-glucan (BG) can reprogram the innate immune system to protect against intestinal inflammation. The authors show that mice pretreated with BG prior to DSS-induced colitis experience reduced colitis severity, including less weight loss, colon damage, improved gut repair, and lowered inflammation. These effects were independent of adaptive immunity and were linked to changes in monocyte function.

      The authors show that the BG-trained monocytes not only help control inflammation but confer non-specific protection against experimental infections (Salmonella), suggesting the involvement of trained immunity (TI) mechanisms. Using single-cell RNA sequencing, they map the transcriptional changes in these cells and show enhanced differentiation of monocytes into reparative CX3CR1<sup>+</sup> macrophages. Importantly, these protective effects were transferable to other mice via adoptive cell transfer and bone marrow transplantation, suggesting that the innate immune system had been reprogrammed at the level of stem/progenitor cells.

      Overall, this study provides evidence that TI, often associated with heightened inflammatory programs, can also promote tissue repair and resolution of inflammation. Moreover, this BG-induced functional reprogramming can be further harnessed to treat chronic inflammatory disorders like IBD.

      Strengths:

      (1) The authors use advanced experimental approaches to explore the potential therapeutic use of myeloid reprogramming by β-glucan in IBD.

      (2) The authors follow a data-to-function approach, integrating bulk and single-cell RNA sequencing with in vivo functional validation to support their conclusions.

      (3) The study adds to the growing evidence that TI is not a singular pro-inflammatory program, but can adopt distinct functional states, including anti-inflammatory and reparative phenotypes, depending on the context.

      We are grateful for the reviewer’s positive assessment of our study and recognition of its translational implications. We particularly appreciate the acknowledgment that our work expands the therapeutic potential of β-glucan–mediated trained immunity in ameliorating colitis.

      Weaknesses:

      (1) The epigenetic and metabolic basis of TI is not explored, which weakens the mechanistic claim of TI. This is especially relevant given that a novel reparative, anti-inflammatory TI program is proposed.

      We appreciate the reviewer’s valuable comment highlighting the importance of the epigenetic and metabolic basis of TI in providing mechanistic insight. While previous studies, including work from our group (S.-C. Cheng), have extensively characterized the epigenetic and metabolic signatures of monocytes from BG-trained mice—primarily in the context of inflammatory genes—we acknowledge that these aspects are not directly addressed in our current manuscript.

      To strengthen the mechanistic component, we plan to: 1. Reanalyze relevant public datasets, focusing on pathways related to reparative and antibacterial function. 2. Perform monocyte ATAC-seq in our current model to validate the epigenetic changes in these pathways.

      (2) The absence of a BG-only group limits interpretation of the results. Since the authors report tissue-level effects such as enhanced mucosal repair and transcriptional shifts in intestinal macrophages (colonic RNA-Seq), it is important to rule out whether BG alone could influence the gut independently of DSS-induced inflammation.

      Without a BG-only control, it is hard to distinguish a true trained response from a potential modulation caused directly by BG.

      We thank the reviewer for this important suggestion. Although we did not perform qPCR for mucosal repair genes in Figure S1C and Figure S1D, our colon RNA-seq analysis in Figure 5G included a BG-only control group (Colitis_d0). The results from this group indicate that BG preconditioning alone does not alter baseline expression of colon mucosal repair genes, supporting the conclusion that the observed effects occur in the context of DSS-induced inflammation.

      (3) Although monocyte transfer experiments show protection in colitis, the fate of the transferred cells is not described (e.g., homing or differentiation into Cx3cr1⁺ macrophage subsets). This weakens the link between specific monocyte subsets and the observed phenotype.

      (4) While scRNA-seq reveals distinct monocyte/macrophage subclusters (Mono1-3), their specific functional roles remain speculative. The authors assign reparative or antimicrobial functions based on transcriptional signatures, but do not perform causal experiments (depletion or in vitro assays). The biological roles of these cells remain correlative.

      We agree that the functional role of CX3CR1<sup>+</sup> macrophages is not comprehensively validated and is currently inferred from scRNA-seq clustering. While our flow cytometry data show increased CX3CR1<sup>+</sup> macrophages in the BG-TI group, and our CCR2 KO and monocyte adoptive transfer experiments indicate these macrophages are monocyte-derived, we lack direct depletion experiments due to the unavailability of effective depletion antibodies for this subset.

      We acknowledge this as a limitation and will clarify in the Discussion that our conclusions regarding CX3CR1<sup>+</sup> macrophage function are based on transcriptional profiling and association with protective phenotypes, rather than direct causal evidence.

      (5) While Rag1<sup>-/-</sup> mice were used to rule out adaptive immunity, the potential role of innate lymphoid cells (ILCs), particularly ILC2s and ILC3s, which are known to promote mucosal repair (PMID: 27484190), was not explored. Given the reparative phenotype observed, the contribution of ILCs remains a confounding factor.

      We appreciate the reviewer’s valuable comment regarding the potential role of ILCs in the observed mucosal repair. Indeed, in examining the BG-trained immunity effect, the contribution of ILCs was not evaluated. We will explicitly acknowledge in the Discussion that Rag1⁻/⁻ mice retain ILCs (including ILC3s) and that BG-induced activation of these cells remains possible.

      The literature (PMID: 21502992; PMID: 32187516) supports a role for ILC3-mediated IL-22 production in tissue repair, which could overlap with our observed effects. However, our monocyte adoptive transfer experiments show that monocytes alone can alleviate DSS-induced colitis, suggesting a dominant role for monocytes in this context. Nonetheless, we will make it clear that ILC contributions cannot be excluded.

      Reviewer #3 (Public review):

      Summary:

      In the present work, Yinyin Lv et al offer evidence for the therapeutic potential of trained immunity in the context of inflammatory bowel disease (IBD). Prior research has demonstrated that innate cells pre-treated (trained) with β-glucan show an enhanced pro-inflammatory response upon a second challenge.

      While an increased immune response can be beneficial and protect against bacterial infections, there is also the risk that it will worsen symptoms in various inflammatory disorders. In the present study, the authors show that mice preconditioned with β-glucan have enhanced resistance to Staphylococcus aureus infection, indicating heightened immune responses.

      The authors demonstrate that β-glucan training of bone marrow hematopoietic progenitors and peripheral monocytes mitigates the pro-inflammatory effects of colitis, with protection extending to naïve recipients of the trained cells.

      Using a dextran sulfate sodium (DSS)-induced model of colitis, β-glucan pre-treatment significantly dampens disease severity. Importantly, the use of Rag1<sup>-/-</sup> mice, which lack adaptive immune cells, confirms that the protective effects of β-glucan are mediated by innate immune mechanisms. Further, experiments using Ccr2<sup>-/-</sup> mice underline the necessity of monocyte recruitment in mediating this protection, highlighting CCR2 as a key factor in the mobilization of β-glucan-trained monocytes to inflamed tissues. Transcriptomic profiling reveals that β-glucan training upregulates genes associated with pattern recognition, antimicrobial defense, immunomodulation, and interferon signaling pathways, suggesting broad functional reprogramming of the innate immune compartment. In addition, β-glucan training induces a distinct monocyte subpopulation with enhanced activation and phagocytic capacity. These monocytes exhibit an increased ability to infiltrate inflamed colonic tissue and differentiate into macrophages, marked by increased expression of Cx3cr1. Moreover, among these trained monocyte and macrophage subsets, other gene expression signatures are associated with tissue and mucosal repair, suggesting a role in promoting resolution and regeneration following inflammatory insult.

      Strengths:

      (1) Overall, the authors present a mechanistically insightful investigation that advances our understanding of trained immunity in IBD.

      (2) By employing a range of well-characterized murine models, the authors investigate specific mechanisms involved in the effects of β-glucan training.

      (3) Furthermore, the study provides functional evidence that the protection conferred by the trained cells persists within the hematopoietic progenitors and can be transferred to naïve recipients. The integration of transcriptomic profiling allows the identification of changes in key genes and molecular pathways underlying the trained immune phenotype.

      (4) This is an important study that demonstrates that β-glucan-trained innate cells confer protection against colitis and promote mucosal repair, and these findings underscore the potential of harnessing innate immune memory as a therapeutic approach for chronic inflammatory diseases.

      We thank the reviewer for their positive evaluation and constructive feedback on our manuscript.

      Weaknesses:

      However, FPKM is not ideal for between-sample comparisons due to its within-sample normalization approach. Best practices recommend using raw counts (with DESeq2) for more robust statistical inference.

      We appreciate the reminder about best practices for RNA-seq analysis. We apologize for the inaccurate description in the Materials and Methods section. For all differential expression analyses, we have in fact used raw count data as input for DESeq2. FPKM values were only used for visualization purposes, such as in heatmaps and clustering analyses. We will correct this description in the revised manuscript to accurately reflect our analysis workflow.
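
      The distinction between the two quantities can be illustrated with a minimal sketch. The gene lengths and counts below are hypothetical values chosen for illustration, not data from the study, and the function name `fpkm` is our own. The sketch shows that FPKM rescales each sample by its own total, so a gene with identical raw counts in two samples can receive different FPKM values when another gene changes.

```python
def fpkm(counts, lengths_bp):
    """FPKM for one sample: fragments per kilobase of gene per million mapped fragments.

    The scaling uses only this sample's own total, which is why the
    normalization is described as within-sample.
    """
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


# Hypothetical three-gene example (illustrative numbers only).
lengths = [1000, 2000, 1000]   # gene lengths in base pairs
sample1 = [100, 200, 700]      # raw counts, sample 1
sample2 = [100, 200, 1700]     # raw counts, sample 2: only the third gene differs

fpkm1 = fpkm(sample1, lengths)
fpkm2 = fpkm(sample2, lengths)

# The first gene has the same raw count in both samples,
# yet its FPKM differs because the sample totals differ.
print(fpkm1[0], fpkm2[0])
```

      Count-based tools such as DESeq2 instead take the raw counts and estimate between-sample scaling factors themselves, which is consistent with the workflow described in the response above.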

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary

      In this study, Takagi and colleagues demonstrate that changes in axonal arborization of the segmental wave motor command neurons are sufficient to change behavioral motor output.

      The authors identify the Wnt receptors DFz2 and DFz4 and the ligand Wnt4 as modulators of stereotypic segmental arborization patterns of segmental wave neurons along the anterior-posterior body axis. Based on both embryonic expression pattern analysis and genetic manipulation of the signaling components in wave neurons (receptors) and the neuropil (Wnt4), the authors convincingly demonstrate that Wnt4 acts as a repulsive ligand for DFz2 that restricts posterior axon guidance of both anterior and posterior wave neurons. They also provide the first evidence that Wnt4 potentially acts as an attractive ligand for DFz4 to promote the posterior extension of p-wave neurons. Interestingly, artificial optogenetic activation of all wave neurons, which normally induces backward locomotion due to the activity of anterior wave neurons, fails to induce backward locomotion in a DFz2 knockdown condition with altered axonal extensions of all wave neurons towards posterior segments. In addition, the authors now observe enhanced fast-forward locomotion, a feature normally induced by posterior wave neurons. Consistent with these findings, they observe that the natural response to an anterior tactile stimulus is similarly altered in DFz2 knockdown animals. The animals respond with less backward movement and increased fast-forward motion. These results suggest that alterations in the innervation pattern of wave motor command neurons are sufficient to switch behavioral response programs.

      Strengths

      The authors convincingly demonstrate the importance of Wnt signaling for anterior-posterior axon guidance of a single class of motor command neurons in the larval CNS. The demonstration that alteration of the expression level of a single axon guidance receptor is sufficient to not only alter the innervation pattern but to significantly modify the behavioral response program of the animal provides a potential entry point to understanding behavioral adaptations during evolution.

      Weaknesses

      While the authors demonstrate an alteration of the behavioral response to a natural tactile stimulus, the observed effects, a reduction of backward motion and increased fast-forward locomotion, currently cannot be directly correlated to the morphological alterations observed in the single-neuron analyses. The authors do not report any loss of innervation in the "normal" target region but only a small additional innervation of more posterior regions. An analysis of synaptic connectivity and/or a more detailed morphological analysis that is supported by a larger number of analyzed neurons both in control and experimental animals would further strengthen the confidence of the study. As the authors suggest an alteration of the command circuitry, a direct observation of the downstream activation pattern in response to selective optogenetic stimulation of anterior wave neurons would further strengthen their claims (analogous to Takagi et al., 2017, Figure 4).

      We sincerely thank the reviewer for their insightful comments, which were instrumental in improving our manuscript. In response to the reviewers’ suggestion, we have now studied Brp expression and demonstrate that the ectopically extending Wave axons in the posterior region do contain synapses (new Figure 2). This finding supports the idea that these axons are functionally connected to ectopic downstream circuits. 

      Additionally, we have increased the number of analyzed Wave clones in Figure 1F-J (WT and DFz2 KD) and new Figure 3C-G (WT; formerly Figure 2C-G) to strengthen the morphological analyses. We fully agree with the reviewer that “direct observation of the downstream activation pattern in response to selective optogenetic stimulation” would further reinforce our conclusions. However, this was not feasible in the current study since we found that the Wave-Gal4 driver used in this study, which drives expression during embryonic stages, does not drive sufficiently strong expression in the larvae to enable selective optogenetic stimulation (please see below for details). 

      Reviewer #2 (Public Review):

      Summary:

      The authors previously demonstrated that anteriorly located a-Wave neurons (neuromeres A1-A3) extend axons anteriorly to connect to circuits inducing backward locomotion, while p-Wave axons (neuromeres A4-A7) project posteriorly to promote forward locomotion in Drosophila larvae. In the manuscript, the authors aim to determine the molecular mechanisms that wire the segmentally homologous Wave neurons distinctively, making them functionally different in modulating forward or backward locomotion. The genetic screen focused on Wnt/Fz-signaling due to its known anterior-to-posterior guidance roles in mammals and nematodes.

      Strengths:

      Knock-down (KD) of DFz2 with two independent RNAi lines caused ectopic posterior axon and dendrite extension for all a- and p-Wave neurons, with a-Wave axons extending into regions where p-Wave axons normally project. Both behavioral assays (optogenetic stimulation of all Wave neurons or tactile stimuli on heads using a von Frey filament) show that backward movement is reduced or absent and that the speed of evoked fast-forward locomotion is increased. This demonstrates that altered projections of Wave neurons do alter behavior, and the DFz2 KD phenotype is consistent with the potential aberrant wiring of a-Wave neurons to forward locomotion-promoting circuits instead of to backward locomotion-promoting circuits.

      The main conclusion, that Wnt/Fz-signaling is essential for the guidance of Wave neurons and in diversifying their projection pattern in a segment-specific manner, is further supported by the results showing that DFz2 gain of function causes shortening of a-Wave but not p-Wave axon extensions towards the posterior end and that KD of DFz4 causes axonal shortening only in A6-p-Wave neurons but does not affect dendrites or processes of other Wave neurons. A role for ligand Wnt4 is demonstrated by results indicating that Wnt4 mutants' posterior extension of a-Wave axons was elongated similar to DFz2 KD animals and p-Wave axon extension towards the posterior end was shortened similar to DFz2 KD animals. Finally, a DWnt4 gradient decreasing from the posterior (A8) to the anterior end (A2), similar to that described in other species, is supported by analyses of DWnt4 gene expression (using Wnt4 Trojan-Gal4) and protein expression (using antibodies). In contrast, DFz2 receptor levels seemed to decrease from the anterior (A2) to the posterior end (A5/6). Together the results support the conclusion that opposing Wnt/Fz ligand-receptor gradients contribute to the diversification of Wave neurons in a location-dependent manner and that DFz2 and DFz4 have opposing effects on axon extension.

      Weaknesses:

      Wave axon and dendrite projections are not exclusively determined by Wnt4, DFz2, and DFz4, and are likely to involve other Fz receptors, Wnt ligands, and other types of receptor-ligand signaling pathways. This is in part supported by the fact that Wnt4 loss of function also resulted in phenotypes that do not mimic DFz2 KD or DFz4 KD (Figures 3D, E, and F) and that other Fz/Wnt mutants caused wave neuron phenotypes (Figure 1-supplement 2, D+E). This is not a weakness per se, since it doesn't affect the main conclusion of the manuscript. However, the description and analyses of the data, in particular for Figure 1-supplement 2 D, should be clarified in the legend. The numbers within the bars and the asterisks are not defined. It's presumed they refer to numbers of animals assessed and that the asterisks next to DFz2 and DFz4 indicate statistically significant differences. However, only one p-value is provided in the legend. It is also unclear if p-values for the other mutants have not been determined or are non-significant. At least for mutants like Corin, which also exhibit altered axon projections, the p-values should be provided.

      We appreciate this reviewer’s careful attention to detail and intellectual curiosity. We apologize for the confusion caused by the statistical reporting in Figure 1 – figure supplement 2D. The numbers shown in the bars represent the number of neurons (i.e., Wave neurons from the left or right hemisphere). As mentioned in the Materials and Methods section, we applied a Chi-square test followed by Haberman's adjusted residual analysis to determine the statistical significance of each RNAi group. The p-value provided in the figure legend corresponds to the Chi-square test. P-values for Haberman's adjusted residual analysis were calculated for all RNAi groups, and groups without an asterisk are not statistically significant. We have clarified these points in the corresponding figure legend.
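
      For readers unfamiliar with this two-step procedure, the following is a minimal sketch of a Chi-square test of independence followed by Haberman's adjusted residuals. It is not the authors' analysis code; the function name and the 2x2 contingency table are hypothetical and serve only to show the arithmetic. Each adjusted residual is (observed - expected) / sqrt(expected x (1 - row total/N) x (1 - column total/N)) and is approximately standard normal under independence, so cells with an absolute value above about 1.96 are the ones that would typically be flagged at the 5% level.

```python
from math import sqrt


def chi_square_with_adjusted_residuals(table):
    """Return (chi2, dof, adjusted_residuals) for a contingency table.

    `table` is a list of rows of observed counts, e.g. rows = genotypes,
    columns = phenotype classes.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            # Haberman's adjustment: divide by the estimated standard error
            # of the raw residual (observed - expected).
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            res_row.append((observed - expected) / sqrt(variance))
        residuals.append(res_row)

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof, residuals


# Hypothetical counts (illustrative only): rows = two groups,
# columns = neurons with normal vs. altered projections.
example = [[10, 20],
           [20, 10]]
chi2, dof, adj = chi_square_with_adjusted_residuals(example)
print(chi2, dof, adj)
```

      The overall Chi-square statistic tests whether any group differs, while the per-cell adjusted residuals indicate which groups drive the difference, which corresponds to the single legend p-value and the per-group asterisks described above.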

      Figure 4 D, F. The gradient for Wnt4 was determined by comparison of expression levels of other segments to A8 but the gradient for DFz2 was by comparison to A2 and the data supports opposing gradients. However, for DFz2 (Figure 4, F) it seems that the gradient is bi-directional with the lowest being in A5 and increasing towards A2 as well as A8. Analysis should be performed in reference to A8 as well to determine if it is indeed bi-directional. While such a finding would not affect the interpretation of a-Wave neurons, it may impact conclusions about p-Wave neuron projections.

      We thank the reviewer for highlighting this interesting possibility. In response, we performed an additional analysis of the DFz2 gradient by comparing the signal from each neuromere to that from A8 (new Figure 5—figure supplement 3). This analysis confirmed that the gradient is indeed bidirectional. We revised the description of DFz2 expression accordingly in the revision. We believe this finding does not affect our main conclusions since only the anterior gradient is relevant for a-Wave axon guidance. 

      As discussed above, the DFz2 KD phenotypes are consistent with the potential aberrant wiring of a-Wave neurons to forward locomotion-promoting circuits instead of to backward locomotion-promoting circuits. However, since the axons and dendrites of both a-Wave and p-Wave neurons are affected, the actual dendritic and axonal contributions to the altered behavior remain elusive. The authors certainly considered a potential contribution of altered dendrite projection of a-Wave neurons to the phenotype, and their conclusion that altered axonal projections are involved is supported by the optogenetic experiment "bypassing" sensory input (albeit it seems unlikely that all Wave neurons are activated simultaneously when perceiving natural stimuli). However, the authors should also consider that altered perception and projection of p-Wave neurons may directly (e.g. extended p-Wave axon projections increase forward locomotion input, thereby overriding backward locomotion) or indirectly (e.g. feedback loops between forward and backward circuits) contribute to the altered behavioral phenotypes in both assays. It is probably noteworthy that the more complex behavioral alterations observed with mechanical stimulation are likely to also be caused by altered dendritic projections.

      We fully agree with the reviewer’s thoughtful interpretation. We have now included these important possibilities in the revised Discussion section. Specifically, we acknowledge that while the DFz2 knockdown phenotypes are consistent with aberrant wiring of a-Wave neurons to forward locomotion-promoting circuits, the contributions of both axonal and dendritic alterations remain unclear. We also recognize that altered perception and projection of p-Wave neurons may directly or indirectly contribute to the observed behavioral phenotypes, particularly in response to mechanical stimulation.

      Presynaptic varicosities of a-Wave neurons in DFz2 KD animals are indicated by orange arrows in Figure 1. However, no presynaptic markers have been used to confirm actual ectopic synaptic connections. At least the authors should more clearly define what parameters they used to "visually" define potential presynaptic varicosities. Some arrows seem to point to more "globular structures" but for several others, it's unclear what they are pointing at.

      As mentioned in our response to Reviewer #1, we have now performed Brp immunostaining to confirm the presence of ectopic synaptic connections (new Figure 2). This analysis supports the interpretation that the presynaptic varicosities observed in DFz2 knockdown animals represent actual synaptic sites. We also clarified in the figure legend the visual criteria used to identify potential presynaptic varicosities.

      Reviewing Editor (Recommendations For The Authors):

      There are a few major concerns that we recommend the authors address:

      (1) Neuroanatomy: The point of aberrant synaptic connectivity of a-Wave neurons following DFz2 knockdown could be substantiated. This could be done by using a presynaptic marker and showing ectopic posterior presynaptic sites (and/or reduced anterior presynaptic sites) in a-Wave neurons.

      As mentioned in our response to the public review, we now have used Brp as a presynaptic marker to quantify the number and distribution of presynaptic sites along the normal and ectopic a-Wave axons (new Figure 2). We show that ectopic posterior Wave axons do contain presynaptic sites.  

      (2) Gradient calculations: As detailed in the reviews below, the Dfz2 gradient looks like it may be bidirectional. Changing the way the gradient is calculated might help address this point.

      As mentioned in our response above, we now have recalculated the gradient by comparing the DFz2 signal to A8 and show that it indeed is bidirectional (new Figure 5—figure supplement 2; formerly Figure 4—figure supplement 2).

      (3)  Statistics and sample sizes: As detailed in the reviews, some of the statistical reporting could be improved. Further, increasing sample sizes could help bolster confidence in the data as well.

      As mentioned above, we have added a description on the sample size, asterisks, and p-values in Figure 1 – figure supplement 2 legend. We also increased sample sizes of single Wave neurons in control and DFz2 knock-down animals (Figure 1F-J (WT and DFz2 KD) and new Figure 3C-G (WT; formerly Figure 2C-G)).

      (4) It would help to include some discussion of the potential contributions of altered p-wave neurons to the observed phenotypes.

      As described above, we have added in the Discussion potential contributions of altered p-wave neurons to the observed phenotypes. 

      Reviewer #1 (Recommendations For The Authors):

      (1) In the current model the authors assume that posterior elongation of a-wave neuron connectivity (axonal projections) induces a loss of connectivity to their natural targets, as backward motion is no longer induced, and a gain of connectivity to posterior wave neuron targets. Is this at the cost of innervation of p-wave neurons, e.g. did these neurons now lose connectivity to their natural targets as well? Therefore, it would be very interesting if the authors would test the behavioral responses to tactile stimuli in the posterior parts of the animal - does the response pattern change?

      This is indeed an interesting possibility that p-Wave function is altered upon DFz2 knock-down and hence behavioral response to posterior touch is changed. However, it is technically challenging to test this with tactile stimuli, due to the difficulty of (1) distinguishing between normal and fast-forward locomotion and (2) delivering a posterior touch stimulus while the larva is moving forward, which is the default behavior of the larvae on an agar plate.

      As highlighted above, the authors should provide additional evidence that the circuit response to a-wave neurons is changed after a DFz2 knockdown. The authors should monitor the activation wave in response to optogenetic activation of anterior wave neurons - analogous to the data provided in Figure 4 of their 2017 paper. If this response is now switched for a-wave activation but not p-wave activation it would greatly support their claims and this data would be less ambiguous compared to the behavioral locomotion data.

      As described in our response to the public review, we attempted this approach but found that the in vitro optogenetics experiment is unfortunately not feasible due to relatively weak expression of R60G09-GAL4 in the larvae. Local activation of control a-Wave induced fictive backward locomotion only at low frequencies, making comparison with the experimental a-Wave very difficult. The MB120B-spGAL4 used in our 2017 study could not be employed in this study as it does not drive expression during the embryonic stages and thus cannot be used to knock down DFz2 during development.

      (2) Related to this point: why would the normal "backward" circuitry of a-wave neurons be functionally suppressed in DFz2 knockdowns? Do the authors observe reduced synaptic connectivity in these segments? Vesicle clustering of synaptotagmin or other presynaptic markers could be used as a first step. As the innervation pattern is only extended by approximately one segment, it is surprising that the changes are so significant.

      We agree that these are important and interesting points, which remain to be explored in future studies. As described above, we have performed Brp immunostaining and showed that the posterior ectopic axons of a-Wave do contain synapses (new Figure 2). We also found a slight decrease in the number of synapses in the anterior region, which could partially contribute to the weaker activation of downstream neurons responsible for eliciting backward locomotion. Another possibility is that backward suppression occurs through lateral interaction among downstream circuits. Since forward and backward locomotion do not occur simultaneously, it is likely that the circuits driving these two behaviors are mutually inhibitory. Upon DFz2 knock-down in a-Wave, downstream neurons inducing fast-forward locomotion may become more strongly activated than those inducing backward locomotion, resulting in inhibition of the latter via a “winner-take-all” mechanism. Since these discussions are highly speculative, we chose not to include them in the revised manuscript.

      (3) The low number of neurons analyzed per segment is of slight concern. This is particularly the case for the control data set used in Figure 1 and Figure 2. As stated, the same datasets are used for both figures. However, at most 6 neurons were analyzed (and for two segments only 3). The control morphology may be more variable than indicated by this data.

      As mentioned above, we now have dissected 50 larvae each for the control and experimental groups, obtained seven and six clones respectively, and included these data in the revised manuscript. We apologize that the sample sizes are still relatively small but hope the reviewer understands the inherently low “hit rate” of the stochastic labelling method.

      It is somewhat curious that in Figure 1- Supplement 3 the authors report the same number of control clones per segment as in Figure 1/2 - is this simply a coincidence? And if this is an independent dataset why did the author use new controls here but not for Figure 2? It is clear that it is very difficult to generate this data but increasing the n-number beyond 3-6 per segment would significantly increase the confidence in the presented data.

      We apologize for the confusion. The data in Figure 1 – figure supplement 3 represent the innervation pattern of dendrites, not axons. We have corrected the figure caption accordingly. These data were obtained from the same samples used to analyze axonal innervation, as shown in the original version of Figure 1F-J.

      (2) The name of the RNAi lines should be indicated in Figure 1 and Figure Supplement 3 to facilitate reading - at least the precise names should be given in both figure legends.

      We have added these labels in the revised figure legends as requested.

      (3) In Figure 4E again the control numbers of Figure 1 for the A2-wave axon are reused. This does not seem appropriate as now a different Gal4 driver is used and a different method to induce individual neuronal clones. Both components may induce significant variability in expression or arborization. As only 3 clones for the wnt4 mutant condition are analyzed (and compared to 5 control clones), this data does not allow for strong conclusions. The authors clearly state the reuse and different methods in the legend of Figure 4 F/G but should also highlight it for the E panel.

      Here, we assume that the reviewer is referring to the former Figure 3 (now Figure 4). We have added a note in the legend that the control data, obtained using a different method, were reused in this panel.

      (4) The expression levels of DWnt4 and DFz2 were analyzed at the end of embryogenesis. At what developmental stage does the axonal extension of wave neurons take place? Is the gradient maintained throughout the first larval stages?

      Based upon the lateral view of Wave neurons in Figure 1—figure supplement 1D, we think that the axonal extension is already established by approximately 20 hr after egg laying. Previously, we performed Wnt4<sup>MI03717-Trojan-GAL4</sup> > GFP.nls immunostaining in the third instar larva and observed a similar gradient of GFP signals towards the posterior end of the ventral nerve cord (VNC). We have included this data in the revised manuscript (new Figure 5—figure supplement 1).

      (5) The authors state that either 2nd or 3rd instar larvae were used for the optogenetic experiments. This may induce unnecessary variation in their assay and should be avoided. As natural variance exists in larvae regarding forward stride duration, the comparison of "on" state forward stride duration between control and experimental genotype is potentially not the best measurement of effect size. What is the difference between OFF and ON stage within the control and experimental genotype? In both cases stride duration decreases but there may not be a significant difference between the delta of the two genotypes. Thus, the observed effect may in part be due to "slower" animals in the control pool. The authors should discuss this more carefully.

      We thank the reviewer for bringing up this critical issue. Indeed, the stride durations of larvae in the control and DFz2 knock-down groups are slightly different in the OFF condition, although this difference is not statistically significant. In addition, the effect of Wave activation on mean stride duration is -0.14 s in control versus -0.21 s in DFz2 knock-down, which we interpret as DFz2 knock-down resulting in stronger fast-forward locomotion upon Wave activation. We have incorporated this note in the corresponding figure legends (new Figure 6; formerly Figure 5).

      (6) While the study clearly provides convincing evidence for their model, the authors should tune down their conclusions in the discussion a little bit and highlight that parts of their discussion are speculative.

      We have revised the discussion as suggested.

      Reviewer #2 (Recommendations For The Authors):

      Although the optogenetic behavioral experiments strongly support that the altered axonal projections affect normal locomotion, simultaneous labeling of Wave neurons in DFz2 KD animals with presynaptic markers would strengthen the conclusion of ectopic connection of the extended axon with other circuits.

      Please see our response to your public review.

      Figure 1 K+L, Figure 2H, I, Figure 3 F+G: many of the individual data points are not visible in the whisker plots; changing their color would be useful to visualize them better.

      We have changed the outline width of the box plots to make the individual data points visible.

      Figure 1-Supplement 2: In addition to the comments in the public review- a) the asterisk font size changes in the different panels, e.g. it is much smaller in G', b) font size in some graphs/legends should be increased - in particular in E the hyphenated letters in the genotypes are so small rendering them almost illegible.

      We have unified the font size to make them readable in the figure. We thank the reviewer for the suggestions.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The crystal structure of the Sld3CBD-Cdc45 complex presented by Li et al. is a significant contribution that enhances our understanding of CMG formation during the rate-limiting step of DNA replication initiation. This structure provides crucial insights into the intermediate steps of CMG formation, and the particle analysis and model predictions compellingly describe the mechanism of Cdc45 loading. Building upon previously known Sld3 and Cdc45 structures, this study offers new perspectives on how Cdc45 is recruited to MCM DH through the Sld3-Sld7 complex. The most notable finding is the structural rearrangement of Sld3CBD upon Cdc45 binding, particularly the α8-helix conformation, which is essential for Cdc45 interaction and may also be relevant to its metazoan counterpart, Treslin. Additionally, the conformational shift in the DHHA1 domain of Cdc45 suggests a potential mechanism for its binding to Mcm2NTD. Furthermore, Sld3's ssDNA-binding experiments provide evidence of its novel functions in the DNA replication process in yeast, expanding our understanding of its role beyond Cdc45 recruitment.

      Strengths:

      The manuscript is generally well-written, with a precise structural analysis and a solid methodological section that will significantly advance future studies in the field. The predictions based on structural alignments are intriguing and provide a new direction for exploring CMG formation, potentially shaping the future of DNA replication research. This research also opens up several new opportunities to utilize structural biology to unravel the molecular details of the model presented in the paper.

      Weaknesses:

      The main weakness of the manuscript lies in the lack of detailed structural validation for the proposed Sld3-Sld7-Cdc45 model, and its CMG bound models, which could be done in the future using advanced structural biology techniques such as single particle cryo-electron microscopy. It would also be interesting to explore how Sld7 interacts with the MCM helicase, and this would help to build a detailed long-flexible model of Sld3-Sld7-Cdc45 binding to MCM DH and to show where Sld7 will lie on the structure. This will help us to understand how Sld7 functions in the complex. Also, future experiments would be needed to understand the molecular details of how Sld3 and Sld7 release from CMG is associated with ssARS1 binding.

      The proposals based on this study provide new knowledge of the CMG formation process. We agree that our Sld3-Sld7-Cdc45 model will be further confirmed by cryo-EM. We improved our ssARS1-binding assay and quantified data (See the response to Recommendations for the authors of #3 review).

      Reviewer #2 (Public review):

      Summary

      The manuscript presents valuable findings, particularly in the crystal structure of the Sld3CBD-Cdc45 interaction and the identification of additional sequences involved in their binding. The modeling of the Sld7-Sld3CBD-CDC45 subcomplex is novel, and the results provide insights into potential conformational changes that occur upon interaction. Although the single-stranded DNA binding data from Sld3 of different species is a minor weakness, the experiments support a model in which the release of Sld3 from the complex may be promoted by its binding to origin single-stranded DNA exposed by the helicase.

      Strengths

      The Sld3CBD-Cdc45 structure is a novel contribution, revealing critical residues involved in the interaction.

      The model structures generated from the crystal data are well presented and provide valuable insights into the interaction sequences between Sld3 and Cdc45.

      The experiments testing the requirements for interaction sequences are thorough and conducted well, with clear figures supporting the conclusions.

      The conformational changes observed in Sld3 and Cdc45 upon binding are interesting and enhance our understanding of the interaction.

      The modeling of the Sld7-Sld3CBD-CDC45 subcomplex is a new and valuable addition to the field.

      The proposed model of Sld3 release from the complex through binding to single stranded DNA at the origin is intriguing.

      Weaknesses

      The section on the binding of Sld3 complexes to origin single-stranded DNA is somewhat weakened by the use of Sld3 proteins from different species. The comparisons between Sld3-CBD, Sld3CBD-Cdc45, and Sld7-Sld3CBD-Cdc45 involve complexes from different species, limiting the comparisons' value.

      Although the study reveals that Sld3 binds to different residues of Cdc45 than those previously shown to bind Mcm or GINS, the data in the paper do not shed any additional light on how GINS and Sld3 binding to Cdc45 or Mcm would affect each other. Other previous research has suggested that the binding of GINS and Sld3 to Mcm or Cdc45 may be mutually exclusive. The authors acknowledge that a structural investigation of Sld3, Sld7, Cdc45, and MCM during the stage of GINS recruitment will be a significant goal for future research.

      We agree that it is better to use all samples from a single source; however, due to limitations in protein expression, we used Sld7-Sld3CBD-Cdc45 from a different source. The two sources used in this study belong to the same family, and the proteins Sld7, Sld3 and Cdc45 share sequence conservation with similar structures predicted by AlphaFold3 (RMSD = 0.356, 1.392, and 0.891 for Cα atoms of Sld7CTD, Sld7NTD-Sld3NTD, and Sld3CBD-Cdc45). Such similarity in source and proteins allows us to make the comparison. We also mentioned in our manuscript that a cryo-EM study of Sld3-Sld7-Cdc45-MCM and Sld3-Sld7-CMG structures will be a significant goal for future research.

      Reviewer #3 (Public review):

      Summary:

      The paper by Li et al. describes the crystal structure of a complex of Sld3-Cdc45-binding domain (CBD) with Cdc45 and a model of the dimer of an Sld3-binding protein, Sld7, with two Sld3-CBD-Cdc45 for the tethering. In addition, the authors showed the genetic analysis of the amino acid substitution of residues of Sld3 in the interface with Cdc45 and biochemical analysis of the protein interaction between Sld3 and Cdc45 as well as DNA binding activity of Sld3 to the single-strand DNAs of the ARS sequence.

      Strengths:

      The authors provided a nice model of an intermediate step in the assembly of an active Cdc45-MCM-GINS (CMG) double hexamers at the replication origin, which is mediated by the Sld3-Sld7 complex. The dimer of the Sld3-Sld7 complexes tethers two MCM hexamers together for the recruitment of GINS-Pol epsilon on the replication origin.

      Weaknesses:

      The biochemical analysis should be carefully evaluated with more quantitative ways to strengthen the authors' conclusion even in the revised version.

      In this revision, we improved our ssARS1-binding assay in more quantitative ways (See the response to Recommendations for the authors).

      Reviewer #1 (Recommendations for the authors):

      I thank the authors for all their replies to my previous questions and for doing all the necessary corrections. I am satisfied with most of their replies, however, upon second reading I have a few more suggestions which could help to improve the manuscript further and make an impact in the field. My comments are listed below.

      (1) In general, the manuscript is well structured, but I feel that it requires professional English correction. In many places it was difficult to understand the sentences and I had to read it several times to understand it. Also, very long sentences should be avoided. The flow should be easy to read and understand, and that is why I feel it requires professional English correction.

      Following the comment, we checked English carefully and shortened the very long sentences.

      (2) Page 5, line 103, please include molecule after the word complex to make it like- "Only one complex molecule exists within an asymmetric unit."

      We revised this sentence (P5/L103).

      (3) Line 113- "more than the N-terminal half of the protruding long helix α7 was disordered in the Sld3CBD-Cdc45 complex." This sentence is not clear. What does "more than the N-terminal half" mean? Please rewrite it.

      We revised this sentence to give the corresponding residue number “(D219–H231)” (P5/L114).

      (4) Page 5, result 2- Conformation changes in Sld3CBD and Cdc45 for binding each other, this section may require a little restructuring. Line 130-131- "Therefore, the helix α8CTP seems to be an intrinsically disordered segment when Sld3 alone but folds into a helix coupled to the binding partner Cdc45 in the Sld3CBD-Cdc45 complex." This statement is the crux of the structural finding and therefore, I feel it should move after the first sentence.

      Thank you for your comments. We rewrote this part (P5/L128-131).

      (5) Line 121-122: Compared to the isolated form (PDBIDs: 5DGO for huCdc45 [31] and 6CC2 for EhCdc45 [33]) and the CMG form (PDBID: 3JC6. Write it in the same format. Put 6CC2 in brackets like the other PDB IDs. Restructure this sentence.

      We revised this sentence (P5/122-123).

      (6) Line 127-129: This sentence is also not very clear.

      We revised this sentence together with above No (4). (P5/L128-131)

      (7) In my question 4- "Can authors add a supplementary figure showing the probability of disordernes..."., I meant to use a disorder prediction tool like IUPred for the protein sequences and show that α8 is predicted to be a disordered upon sequence analysis. This will help to show the inherent property of α8 helix, and it could add up to the understanding that a disordered region is being structured in the complex structure.

      The structures showed that α8CTP is stabilized by binding with Cdc45, but disordered in Sld3CBD alone, indicating that this part is flexible, like an intrinsically disordered segment. We have deposited the structure to PDB, so predictions like IUPred cannot show meaningful information.

      (8) Question 9 regarding Supplementary Figure 8- Please include your statement in the figure legend - "WT Sld3CBD was prepared in a complex with Cdc45, while the mutants of Sld3CBD existed alone, we calculated the elements of secondary structure from the crystal structure of Sld3CBD-Cdc45. The concentration of samples was controlled to the same level for CD measurement."

      Following the comment, we optimized the figure legend of Supplementary Figure 8.

      (9) Question 13- I understand that negative staining and SEC-SAXS experiments could be very tricky for such protein complexes, which have very long loops and are flexible. Did authors try a GraFix cross-linking before doing the negative staining TEM? If it is not being tried, then it might be a good idea to try it and it may help to get much cleaner particles and easier class averaging. Although I completely understand the technical challenges the authors describe and I agree with them, I still feel that one good experiment that shows this dimer model would be very helpful to strengthen the claim. I am concerned because if people start using a similar DLS experiment to calculate intermolecular distances, citing your paper, in many cases it might be a wrong interpretation. In case the negative staining still does not work, at least discuss your technical challenges in the discussion section and mention that SEC-SAXS showed a similar length of the complex and show the Guinier plot and Porod plots in the supplementary data.

      We believe that DLS is one of the methods for analyzing single-particle size. Of course, confirmation by multiple methods gives more compelling evidence. Following the comment, we added SEC-SAXS data in the [Results] (P7/L194-196) (Cdc45 recruitment to MCM DH by Sld3 with partner Sld7) and Supplementary Figure 11. The Sld7-Sld3-Cdc45 complex forms a flexible, long shape. Each binding domain is rigid, but the domains are linked by long loops. The flexibility arises from these long loop linkers rather than from binding, so we did not try the cross-linking method for these analyses.

      (10) Page 8, line 221- litter sequence specificity: Correct the word "litter" with little. Also, the word shaped is written as sharped at a few places in the manuscript. Please correct it.

      We apologize for making such mistakes. We have modified these words.

      (11) Page 9, line 237-238: Would it be possible to add a lane showing Sld7 binding to the ssDNA in figure 4. I recommend showing this to understand the ssDNA binding affinity of Sld7 by itself and it will also help us to compare when it is in complex with Sld3.

      Considering that Sld7 on CMG is always in complex with Sld3, the ssDNA-binding affinity should be measured with the Sld3-Sld7 complex. Additionally, we attempted to overexpress Sld7 alone, but could not obtain the target protein.

      Reviewer #2 (Recommendations for the authors):

      Thank you for the improved manuscript. The following sentence is unclear: "Cdc45 binds tighter to long ssDNA (>60 bases) with a litter sequence specificity".

      We apologize for making such a mistake. We modified “litter” to “little”.

      I found it challenging to understand which species were used while reading the results section and figure legends. I recommend that the authors revise the text in both the results and figure legends to clearly indicate when proteins from different species are being compared. Additionally, it would be valuable to explicitly acknowledge this limitation in the text.

      Following the comment, we added a description for using different species in results (P8/L224-225) and figure legends (Supplementary Figure 14). We added more information in the Methods to explain why we used two species for preparing proteins.

      Reviewer #3 (Recommendations for the authors):

      Major points:

      (1) The current title is not appropriate for the general readers. At least, DNA replication or DNA replication initiation should be added and abbreviations such as CBD should be avoided.

      Following the comment, we added “DNA replication” into the title. Regarding “CBD”, since the full name of “Cdc45 binding domain” is too long, we continue to use Sld3CBD.

      (2) As in my previous review, I asked for quantification of the EMSA assay shown in Figure 4 and Supplemental Figures 13 and 14. Since some signals of the bands are very weak, it is hard to conclude something. Given different protein concentrations used in the experiment, the authors should provide any kinds of value. For example, Sld3CBD-CDC45 shows weaker DNA binding than Sld3CBD alone (line 231). Is this true (or reproducible)? It is hard to conclude without any quantification.

      We have repeated the EMSA assay four or more times with different rounds of overexpression, purification and DNA synthesis, indicating that the EMSA assay is reproducible. In this revision, we changed the DNA stain and adjusted the ratio between the protein and ssDNA with increasing concentrations. The smeared bands of ssDNA with Sld7–Sld3ΔC–Cdc45 or Sld7–Sld3ΔC exhibit enhanced discernibility, and the ssDNA bands are intense enough for grayscale calculations (Figure 4 in the second revised version). We used a series of t-tests to confirm a significant difference in the residual ssDNA level between Sld3CBD–Cdc45 and each of Sld3CBD, Sld7–Sld3ΔC–Cdc45, and Sld7–Sld3ΔC (t-test, ****: P<0.0001). We also carefully controlled the sample amount in the EMSA assay and described it in the [Methods].

      Moreover, in this EMSA assay (in Figure 4), the authors suggest that the disappearance of ssDNA bands corresponds with the binding of the protein to the DNA. However, it is also possible that the DNA is degraded. It is very important to show the band of protein-DNA complexes on the gel (a whole gel, not the parts of the gel shown in the Figure). Why did the authors use this "insensitive" assay using SYBR Green, not radio-labelled ssDNA?

      In this revision, we added a negative control of no ssDNA-binding by using ssARS1-3_3 for all protein samples (Sld3CBD, Sld3CBD–Cdc45, Sld7–Sld3ΔC–Cdc45 and Sld7–Sld3ΔC), which came from the same round of expression and purification as the samples used for binding to the ssARS1s (ssARS1-2 and ssARS1-5) (Figure 4), showing that the disappearance of ssDNA bands is caused by binding to proteins, not degradation. Moreover, this time, by changing the DNA stain and increasing the concentration of the samples, the smeared ssDNA bands exhibit enhanced discernibility in the high molecular weight regions when mixed with Sld7–Sld3ΔC–Cdc45 or Sld7–Sld3ΔC, whereas no bands appeared in the NC (ssARS1-3_1). The positions of the smeared ssDNA bands correspond to those of the protein in the protein-stained PAGE gels, indicating that the ssARS1s were complexed with proteins. Following the comment, we show all bands on the gel in Figure 4 and Supplementary Figure 14. Compared to Sld7–Sld3ΔC–Cdc45 or Sld7–Sld3ΔC, the Sld3CBD and ssDNA bands could not be observed because of the pI value of Sld3CBD, which affects the entry of the samples into the gel.

      We agree that using radio-labelled ssDNA can obtain a sensitive binding assay. However, current laboratory constraints did not allow us to use radio-labelled ssDNA. Furthermore, considering the characteristics of our target proteins, Sld3CBD, Sld3CBD–Cdc45, Sld7–Sld3ΔC–Cdc45, and Sld7–Sld3ΔC, we planned to perform the binding assay in a more natural state without any modifications, labelling or linkers. Additionally, we have attempted to use ITC experiments but failed in the measurements. Presumably, the conformational flexibility of Sld7-Sld3-Cdc45 and Sld7-Sld3 caused a thermodynamic anomaly.

      Minor points:

      (1) Line 215, 80b: This should be "80 nucleotides(nt)". Throughout the text, nucleotides is better than base to show the length of ssDNAs.

      Thank you for your comments. We modified these words throughout the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This is an exploratory study that doesn't explore quite enough. Critically, the authors make a point of mentioning that neuronal firing properties vary across cell types, but only use baseline firing rate as a proxy metric for cell type. This leaves several important explorations on the table, not limited to the following:

      1a: “Do waveform shape features, which can also be informative of cell type, predict the effect of stimulation?”

      To address this question, we modeled our approach to cell type classification after Peyrache et al. 2012. More specifically, we extracted two features from the mean unit waveforms—the valley-to-peak time (VP) and the peak half-width (PHW). These features were then used to classify units into two distinct clusters (k-means, clusters = 2, based on a strong prior from existing literature), representing putative excitatory and inhibitory neurons. Our approach recapitulated many of the same observations in Peyrache et al. 2012, namely (1) identification of two clusters (low PHW/VP: inhibitory, high PHW/VP: excitatory), (2) an ~80/20 ratio of excitatory/inhibitory neurons, and (3) greater baseline firing rates in the inhibitory vs. excitatory neurons. However, we did not observe a preferential modulation of one cell type compared to another (see newly created Figure 4). A description of this analysis and its takeaways has been incorporated into the manuscript.
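
      The clustering step described above can be illustrated with a minimal sketch. It assumes each unit has already been reduced to a (VP, PHW) pair in milliseconds and runs a plain two-cluster k-means written from scratch. The function name, the initialisation at the VP extremes, and the synthetic feature values are illustrative assumptions, not the authors' code or data.

```python
def two_means(points, n_iter=100):
    """Two-cluster k-means (Lloyd's algorithm) on (VP, PHW) pairs.

    Assumption: centroids are initialised at the units with the smallest
    and largest VP, which makes the result deterministic.
    Returns (centroids, labels); label 0 is the narrow-waveform cluster.
    """
    centroids = [min(points), max(points)]  # tuples sort by VP first
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min((0, 1), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                      + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, labels


# Synthetic (VP, PHW) values in ms, invented for illustration only.
units = [(0.30, 0.30), (0.32, 0.31), (0.34, 0.33),
         (0.50, 0.51), (0.52, 0.50), (0.48, 0.52)]
centroids, labels = two_means(units)
```

      Which cluster is treated as putatively inhibitory or excitatory is a separate interpretive step that relies on the prior literature cited above. The sketch only separates narrow from broad waveforms.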

      Change to Text:

      Created Figure 4 (Separation of presumed excitatory and inhibitory neurons by waveform morphology).

      Caption: (A) Two metrics were calculated using the averaged waveforms for each detected unit: the valley-to-peak width (VP) and peak half-width (PHW). (B) Scatterplot of the relationship between VP and PHW; note that units with identical metrics are overlaid. Using k-means clustering, we identified two distinct response clusters, representing presumed excitatory (E, blue) and inhibitory (I, red) neurons. The units from which the example waveforms were taken are outlined in black. Probability distributions for each metric are shown along the axes. (C) Total number of units within each cluster, separated by region. (D) Comparison of baseline firing rates, separated by cluster. (E) Percent of modulated units in each cluster. * p < 0.05, NS = not significant.

      Added a description of clustering methodology to lines 132-137: “We calculated two metrics from the averaged waveform from each detected unit: the valley-to-peak-width (VP) and the peak half-width (PHW) (Figure 4A); previously, these two properties of waveform morphology have been used to discriminate pyramidal cells (excitatory) from interneurons (inhibitory) in human intracranial recordings (Peyrache et al., 2012). Next, we performed k-means clustering (n = 2 clusters) on the waveform metrics, in line with previous approaches to cell type classification.”

      Added a section in the Results titled “Theta Burst Stimulation Modulates Excitatory and Inhibitory Neurons Equally”. Lines 370-378: “Using k-means clustering, we grouped neurons into two distinct clusters based on waveform morphology, representing neurons that were presumed to be excitatory (E) and inhibitory (I) (Figure 4B). Inhibitory (fast-spiking) neurons exhibited shorter waveform VP and PHW, compared with excitatory (regular-spiking) neurons (I cluster centroid: VP = 0.50ms, PHW = 0.51ms; E cluster centroid: VP = 0.32ms, PHW = 0.31ms), and greater baseline firing rates (U(N<sub>I</sub> = 23, N<sub>E</sub> = 133) = 1074.50, p = 0.023) (Figure 4D). Although we observed a much greater proportion of excitatory vs. inhibitory neurons (E: 85.3%, I: 14.7%), stimulation appeared to affect excitatory and inhibitory neurons equally, suggesting that one cell type is not preferentially activated over another (Figure 4E).”

      Modified discussion of the effects of stimulation on different cell types. Lines 475-483: “…To test these hypotheses directly, we clustered neurons into presumed excitatory and inhibitory neurons based on waveform morphology. In doing so, we observed ~85% excitatory and ~15% inhibitory neurons, which is very similar to what has been reported previously in human intracranial recordings (Cowan et al. 2024, Peyrache et al., 2012). Interestingly, stimulation appeared to modulate approximately the same proportion of neurons for each cell type (~30%), despite the differently-sized groups. Recent reports, however, have suggested that the extent to which electrical fields entrain neuronal spiking, particularly with respect to phase-locking, may be specific to distinct classes of cells (Lee et al., 2024).”

      1b:  “Is the autocorrelation of spike timing, which can be informative about temporal dynamics, altered by stimulation? This is especially interesting if theta-burst stimulation either entrains theta-rhythmic spiking or is more modulatory of endogenously theta-modulated units.”

      The reviewer is correct in suggesting that rate-modulation represents only one of many possible ways by which exogenous theta burst stimulation may influence neuronal activity. Indeed, intracranial theta burst stimulation has previously been shown to evoke theta-frequency oscillatory responses in local field potentials (Solomon et al. 2021), and other forms of stimulation (i.e., transcranial alternating current stimulation) may modulate the rhythm, rather than the rate, of neuronal spiking (Krause et al. 2019).

      To investigate whether stimulation altered rhythmicity in neuronal firing, we contrasted the spike timing autocorrelograms, as suggested. More specifically, we computed the pairwise differences in spike timing for each trial, separating spikes into the same pre-, during-, and post-stimulation epochs described in the manuscript (bin size = 5 ms, max lag = 250 ms), grouped neurons by whether they were modulated, and then contrasted the differences in the latencies of the peak normalized autocorrelation value between epochs. Only neurons with a firing rate of ≥ 1 Hz (n = 70/203, 34.5%) were included in this analysis since sparse firing resulted in noisy autocorrelation estimates. Subsequent statistical testing of the peak latency differences between pre-/during- and pre-/post-stimulation did not reveal any group-level differences (Mann-Whitney U tests, p > 0.05). Thus, we were not able to identify neuronal responses suggestive of altered rhythmicity (see Figure S5). A description of this analysis and its takeaways has been incorporated into the manuscript.
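
      As a rough illustration of the autocorrelogram analysis described above, the following sketch bins the positive pairwise spike-time differences of a single trial (bin size = 5 ms, max lag = 250 ms) and reports the lag of the peak bin. The function names are hypothetical, the Gaussian smoothing mentioned in the Figure S5 caption is omitted, and reporting the peak at the bin centre is an assumption.

```python
def autocorrelogram(spike_times_ms, bin_ms=5.0, max_lag_ms=250.0):
    """Normalised histogram of positive pairwise spike-time differences.

    Only lags in [0, max_lag_ms) are counted. The histogram is divided
    by its total so that it sums to 1 (all zeros if no pair qualifies).
    """
    n_bins = int(max_lag_ms / bin_ms)
    counts = [0] * n_bins
    spikes = sorted(spike_times_ms)
    for i, t0 in enumerate(spikes):
        for t1 in spikes[i + 1:]:
            lag = t1 - t0
            if lag >= max_lag_ms:
                break  # spikes are sorted, so later lags are larger still
            counts[int(lag // bin_ms)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def peak_lag_ms(acg, bin_ms=5.0):
    """Lag (bin centre, in ms) at which the autocorrelogram is largest."""
    peak_bin = max(range(len(acg)), key=acg.__getitem__)
    return (peak_bin + 0.5) * bin_ms


# Synthetic trial: perfectly rhythmic spiking at 8 Hz (one spike per 125 ms).
acg = autocorrelogram([0, 125, 250, 375, 500])
peak = peak_lag_ms(acg)
```

      In the analysis described above, a peak lag of this kind would be obtained per epoch (pre-, during-, post-stimulation) and the differences between epochs compared across neurons.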

      Of note, there are two elements of the data that constrain our ability to detect modulation in the rhythm of firing. First, the baseline activity recorded across neurons modulated by stimulation was relatively low (i.e., median firing rate = 1.77 Hz). Second, stimulation often resulted in a suppression, rather than an enhancement, of firing rate. Taken together, the sparse firing afforded limited opportunity to characterize changes to subtle patterns of spiking. 

      Change to Text:

      Created Figure S5 (Analysis of modulation in spiking rhythmicity)

      Caption: (A) Representative autocorrelograms (ACG) for a single neuron. The pairwise differences in spike timing were computed for each trial and epoch (bin size = 5 ms, max lag = 250 ms), then smoothed with a Gaussian kernel. The peak in the normalized ACG across trials was computed for each epoch. (B) Kernel density estimate of the peak ACG lag, separated by epoch. (C) The peak ACG lags were split by whether the neuron was modulated (Mod) or unaffected by stimulation (NS = not significant) for each of the two contrasts: pre- vs. during-stim (left) and pre- vs. post-stim (right).

      Details about the autocorrelation methodology have been incorporated. Lines 166-172: “To investigate whether stimulation altered rhythmicity in neuronal firing, we analyzed the spike timing autocorrelograms. More specifically, we computed the pairwise differences in spike timing for each trial (bin size = 5 ms, max lag = 250 ms) and then contrasted the differences in the latencies of the peak normalized autocorrelation value between epochs (pre-, during-, post-stimulation). Only neurons with a firing rate of ≥ 1 Hz (n = 70/203, 34.5%) were included in this analysis since sparse firing resulted in noisy autocorrelation estimates.”

      The results from contrasting the autocorrelograms are now mentioned briefly. Lines 297-298: “Stimulation, however, did not appear to alter the rhythmicity in neuronal firing, as measured by spiking autocorrelograms (Figure S5).”

      1c: “The authors reference the relevance of spike-field synchrony (30-55 Hz) in animal work, but ignore it here. Does spike-field synchrony (comparing the image presentation to post-stimulation) change in this frequency range? This does not seem beyond the scope of investigation here.”

      We agree that a further characterization of spike-field and spike-phase relationships may provide rich insights into more complex regional and interregional dynamics that may be altered by stimulation. Given that many metrics are biased by sample size (e.g., number of spikes), which can vary considerably, computing the pairwise phase consistency (PPC) between spikes and LFP is a preferred metric (Vinck et al. 2010). Although PPC is unbiased, its variance nonetheless increases considerably with low spike counts; pooling spike counts across trials, however, decouples the temporal relationship between spiking and the LFP phase for each trial, confounding results and yielding an unstable estimate.
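
      To make the point about low spike counts concrete, here is a minimal sketch of the pairwise phase consistency, computed as the mean cosine of the phase difference over all distinct pairs of spikes. The function name and the example phases are illustrative assumptions. Extracting the LFP phase at each spike time is assumed to have been done elsewhere.

```python
import math
from itertools import combinations


def pairwise_phase_consistency(phases):
    """Mean cosine of the phase difference over all distinct spike pairs.

    `phases` holds the LFP phase (radians) at each spike time. At least
    two spikes are needed, and with N spikes only N * (N - 1) / 2 pairs
    contribute, which is why the estimate is noisy for low spike counts.
    """
    n = len(phases)
    if n < 2:
        raise ValueError("PPC is undefined for fewer than two spikes")
    n_pairs = n * (n - 1) // 2
    return sum(math.cos(a - b) for a, b in combinations(phases, 2)) / n_pairs


# Perfect phase locking: every spike at the same phase.
locked = pairwise_phase_consistency([0.3, 0.3, 0.3, 0.3])

# Four spikes spread evenly around the cycle: slightly negative, which an
# unbiased estimator (expected value 0 with no locking) can produce.
spread = pairwise_phase_consistency(
    [0.0, math.pi / 2, math.pi, 3 * math.pi / 2])
```

      With 10 spikes there are only 45 pairs per trial, which gives a sense of how little data each trial would contribute at the inclusion threshold considered here.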

      To determine whether such an analysis is indeed possible, we calculated the percentage of stimulation trials with ≥ 10 spikes in both the 1s pre- and post-stimulation epochs (a relatively low threshold for inclusion). Only a very small proportion of the total number of trials across all neurons met this criterion (2.5%). Thus, because of the sparse spiking in our data, we are unable to reliably characterize spike-field or spike-phase modulation in detected neurons.

      Change to Text:

      In the manuscript, we have added a description of why our data is not well-suited to investigate these relationships.

      Lines 532-538: “The present study did not investigate interactions between spiking activity and local field potentials because neuronal spiking was sparse at baseline and often further suppressed by stimulation; only a very small proportion of the total number of trials across all neurons exhibited ≥ 10 spikes in both the 1s pre- and post-stimulation epochs (~2.5%). Although certain metrics are not biased by sample size (e.g., pairwise phase consistency), low spike counts can dramatically affect variance and, therefore, result in unstable estimates (Vinck et al., 2011).”

      1d: “How does multi-unit activity respond to stimulation? At this somewhat low count of neurons (total n=156 included) it would be valuable to provide input on multi-unit responses to stimulation as well.”

      We thank the reviewer for this suggestion. We have incorporated an analysis of multiunit activity (MUA), which similarly identifies robust modulation via permutation-based statistical testing and characterizes the different profiles of responses (i.e., increased vs. decreased MUA threshold crossings pre- vs. post-stimulation).

      Change to Text:

      Created Figure S8 (Analysis of multiunit activity response to stimulation)

      Caption: (A) Example trace of multiunit activity (MUA) in one channel during a single stimulation trial. Threshold crossings are highlighted with a pink dot overlaid on the MUA signal with a corresponding hash below. (B) The percentage of channels with significantly modulated MUA, separated by the direction of effect. (C) The percentage of channels with significantly modulated MUA, separated by direction of effect and region. Inc (red; post > pre) vs. Dec (blue; post < pre). HIP = hippocampus, OFC = orbitofrontal cortex, AMY = amygdala, ACC = anterior cingulate cortex. *** p < 0.001, NS = not significant.

      Details about the MUA methodology have been incorporated. Lines 174-180: “Finally, we measured modulation in multiunit activity (MUA) by filtering the microelectrode signals in a 300-3,000 Hz window and counting the number of threshold crossings. Thresholds were determined on a per-channel basis and defined as -3.5 times the root mean square of the signal during the baseline period; activity during stimulation was excluded since stimulation artifact is difficult to separate from MUA in the absence of spike sorting.”
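
      The thresholding rule quoted above can be sketched as follows. The sketch assumes the signal has already been band-pass filtered (300-3,000 Hz) and that one crossing is counted per downward excursion through the threshold. The counting convention and the function names are assumptions for illustration.

```python
import math


def rms(samples):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def count_threshold_crossings(signal, baseline, k=-3.5):
    """Count downward crossings of a threshold set at k * RMS(baseline).

    `signal` and `baseline` are assumed to be band-pass filtered traces
    from the same channel. A crossing is counted when the trace moves
    from above the threshold to at or below it.
    """
    threshold = k * rms(baseline)
    crossings = 0
    for prev, cur in zip(signal, signal[1:]):
        if prev > threshold >= cur:
            crossings += 1
    return crossings


# Synthetic example: baseline RMS is 1, so the threshold is -3.5.
baseline = [1.0, -1.0] * 50
trace = [0.0, -4.0, 0.0, 0.0, -5.0, -6.0, 0.0]
n_crossings = count_threshold_crossings(trace, baseline)
```

      Comparing such counts between the pre- and post-stimulation epochs, channel by channel, gives the increased vs. decreased MUA classification summarised in Figure S8.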

      MUA results are now incorporated. Lines 365-367: “Additional characterization of MUA revealed a dominant signature of increased activity post- vs. pre-stimulation, in line with these trends observed at the single-neuron level (Figure S8).”

      1e: “Several intracranial studies have implicated proximity to white matter in determining the effects of stimulation on LFPs; do the authors see an effect of white matter proximity here?”

      We thank the reviewer for the interesting question. Subsequent characterization revealed only small differences in the proximity of stimulation contacts to white matter (range 1.5-8.0 mm), likely because the chosen target (i.e., basolateral amygdala) has several nearby white matter structures (e.g., stria terminalis). Nonetheless, we performed a linear regression between the proximity to white matter and the stimulation-induced effect on behavior (stimulation vs. no-stimulation d’ difference), the results of which indicate no clear association (p > 0.05; see Figure S9). Critically, this is not to suggest that white matter proximity has no interaction with the reported behavioral effects, but rather, that we could not identify such an association within our data.

      Change to Text:

      Created Figure S9 (The effect of stimulation proximity to white matter and distance to recorded neurons).

      Caption: (A) Kernel density estimate of the Euclidean distance from stimulation contacts to nearest WM structure (in mm); hash marks represent individual observations. (B) The change in memory performance (Δd’) was linearly regressed onto the distance from the stimulated contacts to white matter.

      The following has been added to lines 405-426: “Proximity to white matter has been shown to influence the effects of stimulation on behavior and the strength of evoked responses (Mankin et al., 2021; Mohan et al., 2020; Paulk et al., 2022). Across all stimulated contacts, we observed only small differences in the proximity of stimulation contacts to white matter (median = 4.5 mm, range = 1.5-8.0 mm), likely because the chosen target (i.e., basolateral amygdala) has several nearby white matter structures (e.g., stria terminalis). Nonetheless, we performed a linear regression between the proximity to white matter and the stimulation-induced effect on behavior (stimulation vs. no-stimulation d’ difference), the results of which indicate no clear association (p > 0.05; see Figure S9).”

      Comment 2: “It is a little confusing to interpret stimulation-induced modulation of neuronal spiking in the absence of stimulation-induced change in behavior. How do the authors findings tell us anything about the neural mechanisms of stimulation-modulated memory if memory isn't altered? In line with point #1, I would suggest a deeper dive into behavior (e.g. reaction time? Or focus on individual sessions that do change in Figure 4A?) to make a stronger statement connecting the neural results to behavioral relevance.”

      We agree that the connection between the observed stimulation-induced neuronal modulation and effects on behavior is unclear and has proven challenging to elucidate. Per the reviewer’s suggestion, we further focused our analyses on the neuronal modulation effects in the individual sessions that resulted in a robust change in memory performance (stimulation vs. no-stimulation d’ difference threshold of ± 0.5, based on a moderate effect size for Cohen’s d); both a positive and negative threshold were used to capture robust changes in memory performance associated with firing rate modulation, whether enhancement or suppression. To this end, we contrasted the proportion of modulated neurons in the sessions where stimulation resulted in a robust behavioral change (Δd’) with those that did not (~d’). We did not observe a difference in the proportions between groups when collapsed across all sampled regions, or when separately evaluated (Fisher’s exact tests, p > 0.05; see Figure 5C).

      Given that this approach did not further clarify the connection between our neural and behavioral results, we believe it is most appropriate to deemphasize claims in the manuscript regarding the potential insights for behavioral modulation (e.g., memory enhancement), and have done so.
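
      For readers unfamiliar with the proportion contrast mentioned above, a two-sided Fisher's exact test on a 2x2 table can be sketched as below. This is an illustrative stdlib-only implementation, not the authors' analysis code; the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    For example, rows could be responder vs. non-responder sessions and
    columns modulated vs. non-modulated units.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(p for p in (prob(k) for k in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))
```

      Called with the counts of modulated and non-modulated units in each outcome group, the function returns the probability of a table at least as extreme as the one observed under the null hypothesis of equal proportions.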

      Change to Text:

      Toned down reference to the memory-related effects of stimulation in the abstract by removing the following lines from the abstract: “Previously, we demonstrated that intracranial theta burst stimulation (TBS) of the basolateral amygdala (BLA) can enhance declarative memory, likely by modulating hippocampal-dependent memory consolidation…” and “…and motivate future neuromodulatory therapies that aim to recapitulate specific patterns of activity implicated in cognition and memory.”

      Changed Figure 4 to Figure 5

      Created Figure 5C (Interaction between behavioral effects and neuronal modulation).

      Caption: (C) Change in recognition memory performance was split into two categories using a d’ difference threshold of ± 0.5: responder (positive or negative; Δd’, pink) and non-responder (~d’, grey). Individual d’ scores are shown (left) with points colored by outcome category; dotted lines demarcate category boundaries, and the grey-shaded region represents negligible change. The number of sessions within each outcome category (middle) and the proportion of modulated units as a function of outcome category, separated by region (right). NS = not significant.

      The description of the behavioral results has been updated. Lines 394-403: “At the level of individual sessions, we observed enhanced memory (Δd’ > +0.5) in 36.7%, impaired memory (Δd’ < -0.5) in 20.0%, and negligible change (-0.5 ≤ Δd’ ≤ 0.5) in 43.3% when comparing performance between the stim and no-stim conditions; a threshold of Δd’ ± 0.5 was chosen for this classification based on the defined range of a “medium effect” for Cohen’s d. To test our hypothesis that neuronal modulation would be associated with changes in memory performance, we combined the sessions that resulted in either memory enhancement or impairment and contrasted the proportion of modulated units across regions sampled. We did not, however, observe a meaningful difference in the proportion of modulated units when grouped by behavioral outcome (all contrasts p > 0.05) (Figure 5C).”

      Lines 213-214 and 394-397 have been edited to reflect a change in the d’ threshold used for categorizing behavioral results (from Δd’ ± 0.2 to Δd’ ± 0.5).
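
      The session classification described above can be illustrated with a short sketch. It assumes the standard signal-detection definition of d’ (z-transformed hit rate minus z-transformed false-alarm rate); the function names are hypothetical and the study's own pipeline may differ, for instance in how hit rates of exactly 0 or 1 are corrected.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Rates of exactly 0 or 1 are not handled here; they need a
    correction before the inverse normal CDF can be applied.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

def classify_session(delta_dprime, threshold=0.5):
    """Label a session by its stim-minus-no-stim change in d'."""
    if delta_dprime > threshold:
        return "enhanced"
    if delta_dprime < -threshold:
        return "impaired"
    return "negligible"  # -threshold <= delta <= +threshold
```

      Sessions labelled "enhanced" or "impaired" together form the responder group that is contrasted with the "negligible" group in Figure 5C.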

      Comment 3: “It is not clear to me why the assessment of firing rates after image onset and after stim offset is limited to one second - this choice should be more theoretically justified, particularly for regions that spike as sparsely as these.”

      We thank the reviewer for this question and acknowledge that no clear justification was provided for this decision in the manuscript. We limited each analysis epoch to 1 s for two reasons. First, the maximum possible length of the during-stimulation epoch was 1 s (stim on for 1 s). Although the pre- and post-stimulation epochs could be extended without issue, we were concerned that variable time windows could introduce a bias, for instance, resulting in different variances between epochs. Second, we anticipated, both from empirical observations and prior literature, that the neural response following stimulation or task features (e.g., image onset/offset) was likely to be transient, rather than sustained for a period of many seconds. By keeping the windows short, we ensured that our approach to detecting modulation (i.e., contrasting trial-wise spike counts between each pair of epochs) captured the intended effect rather than random noise. We have incorporated a discussion of this rationale in the Peri-Stimulation Modulation Analyses section.

      Change to Text:

      Lines 156-158 have been added: “Each epoch was constrained to 1 s to ensure that subsequent firing rate contrasts were unbiased and to capture potential transient effects (e.g., image onset/offset).”
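
      The equal-length epoch scheme can be sketched as below. This is an illustration under stated assumptions: spike times and stimulation onset are in seconds, windows are half-open, and the function name is hypothetical rather than taken from the authors' code.

```python
def epoch_spike_counts(spike_times, stim_onset, epoch_len=1.0):
    """Count spikes in pre-, during-, and post-stimulation windows.

    All three windows have the same length (epoch_len seconds), so the
    trial-wise counts can be contrasted without a window-size bias.
    """
    stim_offset = stim_onset + epoch_len
    windows = {
        "pre": (stim_onset - epoch_len, stim_onset),
        "during": (stim_onset, stim_offset),
        "post": (stim_offset, stim_offset + epoch_len),
    }
    return {
        name: sum(1 for t in spike_times if start <= t < stop)
        for name, (start, stop) in windows.items()
    }
```

      Applying the function to every trial yields paired spike counts per epoch, which is the kind of input a trial-wise contrast between epochs requires.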

      Comment 4: “This work coincides with another example of human intracranial stimulation investigating the effect on firing rates (doi: https://doi.org/10.1101/2024.11.28.625915). Given how incredibly rare this type of work is, I think the authors should discuss how their work converges with this work (or doesn't).”

      Thank you for bringing this highly relevant work to our attention. We were unaware of this recent preprint and have incorporated a discussion of its main findings into the manuscript.

      Change to Text:

      New citations: van der Plas et al. 2024 (bioRxiv), Cowan et al. 2024 (bioRxiv)

      The discussion of related studies has been updated. Lines 447-457: “Few studies, however, have characterized the impact of electrical stimulation via macroelectrodes on the spiking activity of human cortical neurons, none of which involve intracranial theta burst stimulation. One study reported a long-lasting reduction in neural excitability among parietal neurons, with variable onset time and recovery following continuous transcranial TBS in non-human primates (Romero et al., 2022). In a similar vein, it was recently shown that human neurons are largely suppressed by single-pulse electrical stimulation (Cowan et al., 2024; Plas et al., 2024). Other emerging evidence suggests that transcranial direct current stimulation may entrain the rhythm rather than rate of neuronal spiking (Krause et al., 2019) and that stimulation-evoked modulation of spiking may meaningfully impact behavioral performance on cognitive tasks (Fehring et al., 2024).”

      Comment 5: “What information does the pseudo-population analysis add? It's not totally clear to me.”

      We recognize the need to further contextualize the motivation for the exploratory pseudo-population analysis and appreciate the reviewer for bringing the lack of detail to our attention. In brief, the analysis allowed us to observe trends in activity across populations of neurons, which, in principle, are not visible by characterizing modulation solely in discrete neurons. Additional details have been incorporated into the manuscript, as suggested.

      Change to Text:

      Additional justification has been incorporated in the description of the methodology. Lines 185-187: “…This approach enables the identification of dominant patterns of coordinated neural activity that may not be apparent when examining individual neurons in isolation.”, lines 192-194: “…By collapsing across subjects into a common pseudo-population, this analysis provides a mesoscale view of how stimulation modulates shared activity patterns across anatomically distributed neural populations.”

      A summary interpretation has been added to the paragraph describing the results. Lines 326-328: “Taken together, these analyses reveal global structure in the state space of responses to BLA stimulation within hippocampal circuits.”
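
      One common way to extract "dominant patterns of coordinated neural activity" from a pseudo-population is principal component analysis. The response does not state which method was used, so the sketch below is an assumption: it pools one trial-averaged firing-rate trace per neuron into a neurons x time matrix and finds the leading component by power iteration. All names are hypothetical.

```python
import math
import random

def first_principal_component(rates, n_iter=200, seed=0):
    """Return (PC1 time course, fraction of variance explained).

    `rates` is a list of per-neuron firing-rate traces (neurons x time
    bins), pooled across subjects into one pseudo-population.
    """
    n_neurons, n_bins = len(rates), len(rates[0])
    # Centre each trace so PCA captures changes rather than baseline rate.
    centred = []
    for row in rates:
        mean = sum(row) / n_bins
        centred.append([v - mean for v in row])
    # Time-by-time covariance across the pooled neurons.
    cov = [[sum(r[i] * r[j] for r in centred) / n_neurons
            for j in range(n_bins)] for i in range(n_bins)]
    total_var = sum(cov[i][i] for i in range(n_bins))
    # Power iteration from a random start to find the leading eigenvector.
    rng = random.Random(seed)
    vec = [rng.random() for _ in range(n_bins)]
    for _ in range(n_iter):
        new = [sum(cov[i][j] * vec[j] for j in range(n_bins)) for i in range(n_bins)]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0.0:
            break  # no variance to explain
        vec = [v / norm for v in new]
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(n_bins))
                     for i in range(n_bins))
    return vec, (eigenvalue / total_var if total_var > 0 else 0.0)
```

      If most neurons share one response shape, the first component explains most of the variance; this is the sense in which a population-level view can reveal structure that single-unit contrasts miss.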

      Reviewer #2 (Public review):

      Comment 1: “Authors suggest that the units modulated by stimulation are largely distinct from those responsive to image offset during trials without stimulation. The subpopulation that responds strongly also tends to have a higher baseline of firing rate. It's important to add that the chosen modulation index is more likely to be significant in neurons with higher firing rates.”

      This is an important point that was not previously addressed in our manuscript. We suspect there are likely two factors at play worth considering with respect to our chosen nonparametric modulation index: neurons with lower activity require smaller changes in spike counts to be significantly modulated (easier to flip ranks), and neurons with higher activity empirically exhibit greater absolute shifts in the number of spikes. Our further use of permutation testing, while mitigating false positives, may also somewhat constrain the ability to detect modulation in sparsely active neurons. Nonetheless, given that many trials entailed few or no spikes, we believe this approach is preferable to alternatives that may be more susceptible to noise (e.g., percent change in trial-averaged firing rate from baseline).

      To better understand the tradeoffs with detection probability, we performed a sensitivity analysis. We generated synthetic data with different baseline firing rates (0.1-5.0 Hz) and effect sizes (± 0.1-0.7 Hz) and simulated the likelihood of detection with our given modulation index across neurons. The results of the simulation support the notion that the probability of detecting modulation is lower for sparsely active neurons (Figure S7C). Further discussion of this consideration for the chosen modulation index, as well as details regarding the sensitivity analysis, have been incorporated into the manuscript.

      Change to Text:

      Created Figure S7C (Detection probability analysis)

      Caption: The same permutation-based analyses reported in the manuscript were repeated under different control conditions… (C) Visualization of the predicted probability of detecting modulation across synthetic neurons with variable firing rates and modulation effect sizes; FR = firing rate.

      Lines 223-224 have been added to the Methods section titled “Firing Rate Control Analyses”: “We performed a series of control analyses to test whether our approach to firing rate detection was robust…”

      A description of the simulation has been incorporated into the same section as above. Lines 234-237: “Finally, to better understand the tradeoffs with our statistical approach, we generated synthetic data with different baseline firing rates (0.1-5.0 Hz) and effect sizes (± 0.1-0.7 Hz), then simulated the likelihood of detecting modulation across variable conditions (Figure S7C).”

      The description of the results from the control analyses has been updated. Lines 330-339: “Finally, we performed three supplementary analyses to evaluate the robustness of our approach to detecting firing rate modulation: a sensitivity analysis assessing the proportion of modulated units at different firing rate thresholds for inclusion/exclusion, a data dropout analysis designed to control for the possibility that non-physiological stimulation artifacts may preclude the detection of temporally adjacent spiking, and a synthetic detection probability analysis. These results recapitulate our observation that units with higher baseline firing are most likely to exhibit modulation (though the probability of detecting modulation is lower for sparsely active neurons) and suggest that suppression in firing rate is not solely attributable to amplifier saturation following stimulation (Figure S7).”
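
      The structure of a synthetic detection-probability analysis like the one described above can be sketched as follows. Several assumptions are made and should be read as such: spike counts per 1 s epoch are drawn from a Poisson distribution, and a sign-flip permutation test on paired count differences stands in for the authors' nonparametric modulation index, whose exact form is not given here. Function names and default parameters are hypothetical.

```python
import math
import random

def poisson(rng, lam):
    """Draw one Poisson-distributed count (Knuth's method; fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sign_flip_p_value(diffs, rng, n_perm=200):
    """Two-sided permutation p-value for the summed paired differences."""
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(n_perm):
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def detection_probability(baseline_hz, effect_hz, n_trials=40,
                          n_sims=50, alpha=0.05, seed=0):
    """Fraction of simulated neurons in which modulation is detected."""
    rng = random.Random(seed)
    stim_rate = max(baseline_hz + effect_hz, 0.0)  # rates cannot be negative
    detected = 0
    for _ in range(n_sims):
        # Paired spike counts in two 1 s epochs (stimulated minus baseline).
        diffs = [poisson(rng, stim_rate) - poisson(rng, baseline_hz)
                 for _ in range(n_trials)]
        if sign_flip_p_value(diffs, rng) < alpha:
            detected += 1
    return detected / n_sims
```

      Evaluating `detection_probability` over a grid of baseline rates (0.1-5.0 Hz) and effect sizes (± 0.1-0.7 Hz) gives a detection surface analogous in layout to Figure S7C. Because the test statistic here differs from the authors' index, the sketch is not expected to reproduce their values.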

      Comment 2: “Readers can benefit from understanding with more details the locations chosen for stimulation - in light of previous studies that found differences between effects based on proximity to white matter (For example - PMID 32446925, Mohan et al, Brain Stimul. 2020 and PMID 33279717 Mankin et al Brain Stimul. 2021).”

      This has been addressed in the above response to Reviewer 1’s comment 1.1e.

      Change to Text:

      See changes related to Reviewer 1 comment 1.1e.

      Comment 3: “Missing information in the manuscript…”

      3a: “Images of stimulation anatomical locations for all subjects included in this study. Ideally information about the impedance of the contacts to be able to calculate the actual current used.”

      As requested, we have provided an image from the coronal T1 MRI sequence, which highlights the position of the stimulated contacts for each of the 16 patients. Though we did not measure the impedances directly, the stimulation was current-controlled, which ensured that the desired current and charge density were consistent regardless of the tissue or electrode impedance.

      Change to Text:

      Created Figure S1 (Anatomical location of stimulated electrodes).

      Caption: A coronal slice from the T1-weighted MRI scan is shown for each patient who participated in the study (n = 16). Electrode contacts within the same plane of the image are shown with blue circles, and the bipolar pair of stimulated contacts within the basolateral amygdala is highlighted in red.

      Lines 144-145 have been edited to reflect that the delivered stimulation was current-controlled: “Specifically, we administered current-controlled, charge-balanced, …”

      3b: “The studied population is epilepsy patients, and the manuscript lacks description of their condition, proximity to electrodes included in the study to pathological areas, and the number of units from each patient/hemisphere.”

      We agree that additional information regarding patient demographics, experimental details, and clinical characteristics would further contextualize this unique patient population. A new table has been included, which contains the following information: patient ID, sex, age, # experimental session, # SEEG leads (and # microelectrodes), # detected units (L vs. R hemisphere), and suspected seizure onset zone.

      Change to Text:

      Created Table S1 (Patient demographics and clinical characteristics).

      Lines 258-259 have been added: “…(see Table S1 for patient demographics).”

      3c: “I haven't seen any comments on code availability (calculating modulation indices and statistics) and data sharing.”

      For clarification, a section titled Resource Availability is already appended to the end of the manuscript following the Conclusion, which describes the data and code availability.

      Change to Text:

      None

      3d: “Small comment - Figure legend 3E - Define gray markers (non-modulated units?)”

      Thank you for highlighting this omission. We have updated the relevant figure caption.

      Change to Text:

      The following has been added to the Figure 3 caption: “…whereas units without a significant change in activity are shown in grey.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a comprehensive structure-guided secretome analysis of gall-forming microbes, providing valuable insights into effector diversity and evolution. The authors have employed AlphaFold2 to predict the 3D structures of the secretome from selected pathogens and conducted a thorough comparative analysis to elucidate commonalities and unique features of effectors among these phytopathogens.

      Strengths:

      The discovery of conserved motifs such as 'CCG' and 'RAYH' and their central role in maintaining the overall fold is an insightful finding. Additionally, the discovery of a nucleoside hydrolase-like fold conserved among various gall-forming microbes is interesting.

      Weaknesses:

      Important conclusions are not verified by experiments.

      Thank you very much. There are many aspects of this study that could be further validated, each potentially requiring years of work. Therefore, we chose to focus on two specific hypotheses: are AlphaFold-Multimer predictions accurate? Can ANK target more than one host protein? In particular, we focused on the identification of putative targets for one of the ankyrin repeat proteins, PBTT_00818 (Fig. 6). Using one-by-one yeast two-hybrid (Y2H) assays, we tested the AlphaFold-Multimer prediction of an interaction between PBTT_00818 and MPK3. The interaction did not occur in yeast, suggesting it might not take place under those conditions.

      This negative result led us to perform a Y2H screen using an Arabidopsis cDNA library, which identified a GroES-like protein, highly expressed in roots, as a potential target of the ANK effector. Surprisingly, both the PBTT_00818–MPK3 and PBTT_00818–GroES-like protein interactions were later confirmed in planta using BiFC assays. These findings suggest two key points: (1) AlphaFold predictions can be accurate for ANK proteins, and (2) ANK domains, known for mediating protein-protein interactions, may enable these effectors to target multiple host proteins.

      Although the precise biological implications remain unclear, it is possible that ANK proteins act as scaffolds or adaptors for other effectors during infection. The validations presented here open exciting avenues for further research into the role of ANK proteins in Plasmodiophorid pathogenesis and gall formation. This is presented in the corrected preprint and Fig. 7, Table S12, Fig. S7-S8.

      Reviewer #2 (Public review):

      Summary:

      Soham Mukhopadhyay et al. investigated the protein folding of the secretome from gall-forming microbes using the AI-based structure modeling tool AlphaFold2. Their study analyzed six gall-forming species, including two Plasmodiophorid species and four others spanning different kingdoms, along with one non-gall-forming Plasmodiophorid species, Polymyxa betae. The authors found no effector fold specifically conserved among gall-forming pathogens, leading to the conclusion that their virulence strategies are likely achieved through diverse mechanisms. However, they identified an expansion of the Ankyrin repeat family in two gall-forming Plasmodiophorid species, with a less pronounced presence in the non-gall-forming Polymyxa betae. Additionally, the study revealed that known effectors such as CCG and AvrSen1 belong to sequence-unrelated but structurally similar (SUSS) effector clusters.

      Strengths:

      (1) The bioinformatics analyses presented in this study are robust, and the AlphaFold2-derived resources deposited in Zenodo provide valuable resources for researchers studying plant-microbe interactions. The manuscript is also logically organized and easy to follow.

      (2) The inclusion of the non-gall-forming Polymyxa betae strengthens the conclusion that no effector fold is specifically conserved in gall-forming pathogens and highlights the specific expansion of the Ankyrin repeat family in gall-forming Plasmodiophorids.

      (3) Figure 4a and 4b effectively illustrate the SUSS effector clusters, providing a clear visual representation of this finding.

      (4) Figure 1 is a well-designed, comprehensive summary of the number and functional annotations of putative secretomes in gall-forming pathogens. Notably, it reveals that more than half of the analyzed effectors lack known protein domains in some pathogens, yet some were annotated based on their predicted structures, despite the absence of domain annotations.

      Weaknesses:

      (1) The effector families discussed in this paper remain hypothetical in terms of their functional roles, which is understandable given the challenges of demonstrating their functions experimentally. However, this highlights the need for experimental validation as a next step.

      Thank you. Yes, there is a lot of work to do in the coming years.

      (2) Some analyses, such as those in Figure 4e, emphasize motifs derived from sequence alignments of SUSS effector clusters. Since these effectors are sequence-unrelated, sequence alignments might be unreliable. It would be more rigorous to perform structure-based alignments in addition to sequence-based ones for motif confirmation. For instance, methods described in Figure 3E of de Guillen et al. (2015, https://doi.org/10.1371/journal.ppat.1005228) or tools like Foldseek could be useful for aligning structures of multiple sequences.

      In Fig. 4e, we highlight the conserved cysteine residues. While there is no clearly conserved overall motif, the figure illustrates that despite the high sequence divergence, the key cysteines involved in disulfide bridge formation are consistently conserved across the sequences.

      (3) When presenting AlphaFold-generated structures, it is essential to include confidence scores such as pLDDT and PAE. For example, in Figure 1D of Derbyshire and Raffaele (2023, https://doi.org/10.1038/s41467-023-40949-9), the structural representations were colored red due to their high pLDDT scores, emphasizing their reliability.

      Thank you for the observation. Due to the restrictive parameters used in our analysis, over 90% of the structure would appear red. For this reason, we chose not to include the color scale, as it would not provide additional informative value in this context.

      Reviewer #1 (Recommendations for the authors):

      Experimental validation of the significance of 'CCG' and 'RAYH' motifs would further strengthen this study.

      Regarding the Mig1-like protein in Ustilago maydis, the presence of four conserved cysteine residues that are pivotal for maintaining the stability of its folded structure raises an intriguing question. Specifically, while many Mig cluster effectors contain four cysteine residues that form two conserved disulfide bridges, this structure is notably absent in the Mig protein itself. The author has speculated that these four cysteine residues form two conserved disulfide bonds, which are crucial for the stability of Mig protein folding. However, this hypothesis remains unvalidated. To test this prediction, it would be prudent to simulate mutations in the cysteine residues corresponding to the disulfide bonds in Mig and employ molecular dynamics simulations to assess the stability of folding before and after the mutation.

      Mig-1 does contain the four conserved cysteine residues responsible for forming disulfide bridges. However, due to the high divergence among Mig-1-like sequences, the alignment software was unable to properly align all the cysteine residues. As a result, Mig-1 may appear to lack these conserved cysteines in the alignment, although they are indeed present upon individual inspection. This is an area that research groups working with U. maydis as a model could explore further to expand our understanding of this effector family.

      Could you please clarify why talking about Ankyrins and LRR in Arabidopsis thaliana (line 252)? Additionally, what are the structural and functional differences between the LRR sequences of P. brassicae and those of the host plants?

      This sentence refers to the identification of the ANK motif in P. brassicae and S. subterranea, not in Arabidopsis thaliana. While the hydrophobic core of the ANK domains appears conserved between the host and the pathogen, the surface residues are highly polymorphic.

      The evidence supporting the interaction between the ANK effector and Arabidopsis immunity-related proteins, as validated using AlphaFold-Multimer, is currently limited. To enhance the reliability of these data, it is advisable for the author to select several pairs of proteins predicted to interact for further experimental verification.

      We conducted a large-scale yeast two-hybrid (Y2H) screen using the ANK domain effector PBTT_00818, which was selected due to its high iPTM+pTM score. The Y2H interactions were subsequently validated through BiFC assays. Our results show that PBTT_00818 interacts with Arabidopsis MPK3 in the nucleus, consistent with predictions from the AlphaFold2-multimer model. In addition, PBTT_00818 was also found to target AT3G56460, a GroES-like zinc-binding alcohol dehydrogenase, also localized in the nucleus.
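
      As an illustration of how candidate effector-host pairs might be prioritized by score, the sketch below assumes that the "iPTM+pTM score" refers to the weighted model confidence used by AlphaFold-Multimer (0.8 x ipTM + 0.2 x pTM). The function names, protein labels, and score values are hypothetical placeholders, not results from the study.

```python
def multimer_confidence(iptm, ptm):
    """AlphaFold-Multimer model confidence: 0.8 * ipTM + 0.2 * pTM."""
    return 0.8 * iptm + 0.2 * ptm

def rank_pairs(predictions):
    """Sort (effector, host_protein, ipTM, pTM) tuples by descending confidence."""
    return sorted(
        predictions,
        key=lambda p: multimer_confidence(p[2], p[3]),
        reverse=True,
    )

# Hypothetical placeholder predictions (NOT values from the study).
candidates = [
    ("effector_A", "host_X", 0.35, 0.60),
    ("effector_A", "host_Y", 0.82, 0.85),
    ("effector_B", "host_X", 0.55, 0.70),
]
ranked = rank_pairs(candidates)
```

      The top-ranked pairs would be the natural candidates for experimental follow-up such as Y2H or BiFC assays.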

      While the manuscript is well-composed, certain sections could be enhanced for clarity and readability. For example, the discussion section could be expanded to include a more in-depth analysis of the implications of the findings for understanding the virulence mechanisms of gall-forming microbes. Additionally, a comparison of the findings with previous studies on related pathogens would provide a more comprehensive perspective.

      Certain sections of the discussion have been expanded. However, we chose to focus on the novel aspects of the study and to avoid comparisons with other plant pathogens, as those mechanisms are already well known and extensively studied. Studies using AlphaFold in plant pathology are also limited.

      Reviewer #2 (Recommendations for the authors):

      The results of clustering analyses are highly dependent on the chosen thresholds. Given that the authors provide clear and well-designed visualizations of SUSS effectors in Figures 4a and 4b, applying the same presentation methods to Figures 5a and 5b could make these analyses more convincing.

      We were able to generate the all-vs-all matrix for Figures 4a and 4b because it involved only 13 proteins. However, Figure 5b includes over 40 effectors, making it impractical to visualize the data in the same way. Instead, we presented the sequence-based clusters as nodes and connected them based on structural similarity.
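
      The node-and-edge representation described above can be sketched as follows. The similarity metric and cutoff are assumptions made for illustration (a pairwise TM-score with a threshold of 0.5); the response does not specify which were used, and all names are hypothetical.

```python
from itertools import combinations

def build_cluster_graph(clusters, tm_score, threshold=0.5):
    """Link sequence-based clusters whose members share a similar fold.

    clusters: dict mapping cluster id -> list of protein ids
    tm_score: dict mapping frozenset({protein_a, protein_b}) -> similarity
    Returns a set of (cluster_1, cluster_2) edges.
    """
    edges = set()
    for c1, c2 in combinations(sorted(clusters), 2):
        # Best structural similarity between any member of c1 and any of c2.
        best = max(
            (tm_score.get(frozenset((a, b)), 0.0)
             for a in clusters[c1] for b in clusters[c2]),
            default=0.0,
        )
        if best >= threshold:
            edges.add((c1, c2))
    return edges
```

      Because each node stands for a whole sequence cluster, the graph stays readable for more than 40 effectors, where a full all-vs-all matrix would not. The trade-off is that the result depends on the chosen threshold, which is the reviewer's underlying concern.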

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Comments on revisions:

      I appreciate the authors responding to my comments. I think Fig. S10 helps put the structural data into more context. It would be helpful to make clearer in the legend what proteins are being compared, especially in 10C.

      Although I can see why the authors focus on the NifK extension and its potential connection to oxygen protection, I would point out that Vnf and Anf do not have this extension in their K subunit, and you find both Vnf and Anf in aerobic and facultative anaerobic diazotrophs. This is a minor point, but I think it is important to mention in the discussion.

      We thank the reviewer for their thoughtful comments. Following their recommendation, we have added a line to the Discussion and moved Figure S10 to the main text.

      Reviewer #2 (Public review):

      Summary: 

      This work aims to study the evolution of nitrogenases, understanding how their structure and function adapted to changes in environment, including oxygen levels and changes in metal availability.

      The study predicts > 3000 structures of nitrogenases, corresponding to extant, ancestral and alternative ancestral sequences. It is observed that structural variations in the nitrogenases correlate with phylogenetic relationships. The amount of data generated in this study represents a massive and admirable undertaking. The study also provides strong insight into how structural evolution correlates with environmental and biological phenotypes. 

      We thank the reviewer for their summary and positive appraisal.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      We appreciate the reviewer’s agreement that our data “support most of the conclusions made”.

      With respect to Concerns raised by reviewer 1:

      (1) Although ectopically expressed PHD1 interacts with ectopically expressed RepoMan, there is no evidence that endogenous PHD1 binds to endogenous RepoMan or that PHD1 directly binds to RepoMan.

      We do not fully agree that this comment is accurate. The implication is that we only show an interaction between two exogenously expressed proteins, i.e. both exogenous PHD1 and RepoMan, when in fact we show that tagged PHD1 interacts with endogenous RepoMan. The major technical challenge here is the well-known difficulty of detecting endogenous PHD1 in such cell lines. We agree that co-IP studies do not prove that this interaction is direct, and we never claim to have shown this, though we do feel that a direct interaction is most likely, albeit not proven.

      (2) There is no genetic evidence indicating that PHD1 controls progression through mitosis by catalyzing the hydroxylation of RepoMan.

      We agree that our current study is primarily a biochemical and cell biological study, rather than a genetic study. Nonetheless, similar biochemical and cellular approaches have been widely used and validated in previous studies of mechanisms regulating cell cycle progression, and we are confident in the conclusions drawn based on the data obtained so far.

      (3) Data demonstrating the correlation between dynamic changes in RepoMan hydroxylation and H3T3 phosphorylation throughout the cell cycle are needed.

      We agree that it will be very interesting to analyse in more detail the cell cycle dynamics of RepoMan hydroxylation and H3T3 phosphorylation - along with other cell cycle parameters. We view this as outside the scope of our present study and are actively engaged in raising the additional funding needed to pursue such future experiments.

      (4) The authors should provide biochemical evidence of the difference in binding ability between RepoMan WT/PP2A and RepoMan P604A/PP2A.

      Here again we agree that it will be very interesting to analyse in future the detailed binding interactions between wt and mutant RepoMan and other interacting proteins, including PP2A. We view this as outside the scope of our present study and are actively engaged in raising the additional funding needed to pursue such future experiments.

      (5) PHD2 is the primary proline hydroxylase in cells. Why does PHD1, but not PHD2, affect RepoMan hydroxylation and subsequent control of mitotic progression? The authors should discuss this issue further.

      We agree with the main point underlining this comment, i.e., that there are still many things to be learned concerning the specific roles and mechanisms of the different PHD enzymes in vivo. We look forward to addressing these questions in future studies.

      Reviewer #2 (Public review):

      We appreciate the reviewer’s comments that our manuscript uses biochemical and imaging tools to delineate a key mechanism in the regulation of the progression of the cell cycle and their appreciation that our experiments performed are, 'conclusive with well-designed controls.'

      With respect to the specific Concern raised by reviewer 2:

      Lack of in vitro reconstitution and binding data.

      We agree that it will be very interesting to pursue in vitro reconstitution studies and detailed binding data. We view this as outside the scope of our present study and are actively engaged in raising the additional funding needed to pursue such future experiments.

      Reviewer #3 (Public review):

      We appreciate the reviewer’s comments that our study, “is a comprehensive molecular and cell biological characterisation of the effects of P604 hydroxylation by PHD1 on RepoMan, a regulatory subunit of the PP1gamma complex” and their conclusion that, “we should have no question about the validity of the PHD1-mediated hydroxylation”.

      With respect to the specific concern raised by reviewer 3:

      Reliance on a Proline-Alanine mutation in RepoMan to mimic an unhydroxylatable protein. The mutation will introduce structural alterations, and inhibition or knockdown of PHD1 would be necessary to strengthen the data on how hydroxylates regulate chromatin loading and interactions with B56/PP2A.

      We do not agree that we rely solely on analysis of the single-site Pro-Ala mutation in RepoMan for our conclusions, since we also present a raft of additional experimental evidence, including knock-down data and experiments using both fumarate and FG. We would also reference the data we present on RepoMan in the parallel study by Jiang et al., which has also been reviewed by eLife and is currently available on bioRxiv (doi: https://doi.org/10.1101/2025.05.06.652400). Of course, we agree with the reviewer that even though the mutant RepoMan features only a single amino acid change, this could still result in undetermined structural effects on the RepoMan protein that could conceivably contribute, at least in part, to some of the phenotypic effects observed. Hopefully, future studies will help to clarify this.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Thank you for the extensive response to my comments and questions.

      Reviewer #2 (Recommendations for the authors):

      (1) The Fmr1/Fxr2 double KO mice are not well described in the Introduction.

      We have changed the sentence in the Introduction to clarify that Zhang et al., 2008, used a mouse lacking both the Fmr1 gene and its paralog Fxr2.

      (3) The Authors decided not to discuss the potential translation of the present study to human patients, despite their final conclusion statement.

      The paragraph below has been added to the end of the discussion:

      “Translational Implications”

      The present findings support the view that circadian disruption is not merely a downstream consequence of disease processes but actively contributes to symptom expression. This raises the possibility that interventions designed to reinforce circadian rhythms could hold therapeutic value for individuals with FXS and related neurodevelopmental conditions. Given that sleep and circadian dysfunction are detectable early in development and are predictive of more severe clinical phenotypes, circadian-based interventions may be particularly beneficial if applied during periods of heightened neural plasticity. Importantly, time-restricted feeding represents a relatively low-cost, non-invasive strategy that could feasibly be implemented in real-world settings. Further translational work is needed to evaluate whether the mechanistic links identified here, between circadian misalignment, immune dysregulation, and behavioral impairments, are conserved in humans, and whether similar approaches can be implemented for clinical use.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Li and colleagues describes the impact of deficiency of DGKα and ζ on Treg cells and follicular responses. The experimental approach is based on the characterization of double KO mice that show the emergence of autoimmune manifestations that include the production of autoantibodies. Additionally, there is an increase in Tfh cells, but also Tfr cells in these mice deficient in both DGKα and ζ. Although the observations are interesting, the interpretation of the observations is difficult in the absence of data related to single mutations. While a supplementary figure shows that the autoimmune manifestations are more severe in the DGKα and ζ deficient mice, prior observations show that a single DGKα deficiency has an impact on Treg homeostasis. As such, the contribution of the two chains to the overall phenotype is hard to establish.

      Strengths:

      Well-conducted experiments with informative mouse models with defined genetic defects.

      Weaknesses:

      The major weakness is the lack of clarity concerning what can be attributed to simultaneous DGKα and ζ deficiency versus deficiency of DGKα or ζ alone.

      Some interpretations are also not conclusively supported by data.

      We appreciate reviewer 1’s positive comments about our manuscript and the suggestion to include DGKα‑ or DGKζ‑single‑knockout (SKO) Tregs in the mechanistic studies. Unfortunately, performing this seemingly simple but extensive experiment would exceed our current budget and personnel capacity. Importantly, it is well known that DGKα and DGKζ act redundantly or synergistically in T cells, with single loss producing minimal or partial phenotypes compared with the double knockout. The comprehensive mechanistic data already presented for DGKαζ‑DKO Tregs therefore capture the combined functional and mechanistic deficit that is most relevant to DGK functions in Treg biology, and they support the conclusions drawn in this manuscript. The reviewer also pointed out some interpretation issues, such as CD25 downregulation in Tfr cells, and some minor issues. We appreciate the reviewer’s expertise and have revised the text and discussion accordingly.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Li et al investigate the combined role of diacylglycerol (DAG) kinases (DGK) α and ζ in Foxp3+ Treg cells function that prevent autoimmunity. The authors generated DGK α and ζ Treg-specific double knockout mice (DKO) by crossing Dgkalpha-/- mice to DgKzf and Foxp3YFPCre/+ mice. The resulting "DKO" mice thus lack DGK α in all cells and DGK ζ in Foxp3+Treg cells. The authors show that the DKO mice spontaneously develop autoimmunity, characterized by multiorgan inflammatory infiltration and elevated anti-double-strand DNA (dsDNA), -single-strand DNA (ssDNA), and -nuclear autoantibodies. The authors attribute the DKO mice phenotype to Foxp3+Treg dysfunction, including accelerated conversion into "exTreg" cells with pathogenic activity. Interestingly, the combined deficiency of DGK α and ζ seems to release Treg cell dependence on CD28-mediated costimulatory signals, which the authors show by crossing their DKO mice to CD28-/- mice (TKO mice), which also develop autoimmunity.

      Strengths:

      The phenotypes of the mutant mice described in the manuscript are striking, and the authors provide a comprehensive analysis of the functional processes altered by the lack of DGKs.

      Weaknesses:

      One aspect that could be better explored is the direct role of "ex-Tregs" in causing pathogenesis in the models utilized.

      However, overall, this is an important report that makes a significant addition to the understanding of DAG kinases in Treg cell biology.

      We greatly appreciate reviewer 2’s positive comments about the manuscript. The data presented in the manuscript show that DGKαζDKO Tregs, but not WT Tregs, are able to trigger autoimmunity in T cell-deficient mice in the presence of WT CD4 T cells, supporting the conclusion that DGKαζDKO Tregs are pathogenic. Reviewer 2 suggested testing the direct role of DGKαζDKO Tregs/ex-Tregs in the pathogenesis of autoimmune disease in the absence of conventional T cells. This is an interesting idea that we will test in the future should resources for the experiment become available.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      When you search for something, you need to maintain some representation (a "template") of that target in your mind/brain. Otherwise, how would you know what you were looking for? If your phone is in a shocking pink case, you can guide your attention to pink things based on a target template that includes the attribute 'pink'. That guidance should get you to the phone pretty effectively if it is in view. Most real-world searches are more complicated. If you are looking for the toaster, you will make use of your knowledge of where toasters can be. Thus, if you are asked to find a toaster, you might first activate a template of a kitchen or a kitchen counter. You might worry about pulling up the toaster template only after you are reasonably sure you have restricted your attention to a sensible part of the scene.

      Zhou and Geng are looking for evidence of this early stage of guidance by information about the surrounding scene in a search task. They train Os to associate four faces with four places. Then, with Os in the scanner, they show one face - the target for a subsequent search. After an 8 sec delay, they show a search display where the face is placed on the associated scene 75% of the time. Thus, attending to the associated scene is a good idea. The questions of interest are "When can the experimenters decode which face Os saw from fMRI recording?" "When can the experimenters decode the associated scene?" and "Where in the brain can the experimenters see evidence of this decoding?" The answer is that the face but not the scene can be read out during the face's initial presentation. The key finding is that the scene can be read out (imperfectly but above chance) during the subsequent delay when Os are looking at just a fixation point. Apparently, seeing the face conjures up the scene in the mind's eye.

      This is a solid and believable result. The only issue, for me, is whether it is telling us anything specifically about search. Suppose you trained Os on the face-scene pairing but never did anything connected to the search. If you presented the face, would you not see evidence of recall of the associated scene? Maybe you would see the activation of the scene in different areas and you could identify some areas as search specific. I don't think anything like that was discussed here.

      You might also expect this result to be asymmetric. The idea is that the big scene gives the search information about the little face. The face should activate the larger useful scene more than the scene should activate the more incidental face, if the task was reversed. That might be true if the finding is related to a search where the scene context is presumed to be the useful attention guiding stimulus. You might not expect an asymmetry if Os were just learning an association.

      It is clear in this study that the face and the scene have been associated and that this can be seen in the fMRI data. It is also clear that a valid scene background speeds the behavioral response in the search task. The linkage between these two results is not entirely clear but perhaps future research will shed more light.

      It is also possible that I missed the clear evidence of the search-specific nature of the activation by the scene during the delay period. If so, I apologize and suggest that the point be underlined for readers like me.

      We have added text related to this issue, particularly in the discussion (page 19, line 6), and have also added citations of studies in humans and non-human primates showing a causal relationship between preparatory activity in prefrontal and visual cortex and visual search performance (page 6, line 16).

      Reviewer #2 (Public review):

      Summary:

      This work is one of the best instances of a well-controlled experiment and theoretically impactful findings within the literature on templates guiding attentional selection. I am a fan of the work that comes out of this lab and this particular manuscript is an excellent example as to why that is the case. Here, the authors use fMRI (employing MVPA) to test whether during the preparatory search period, a search template is invoked within the corresponding sensory regions, in the absence of physical stimulation. By associating faces with scenes, a strong association was created between two types of stimuli that recruit very specific neural processing regions - FFA for faces and PPA for scenes. The critical results showed that scene information that was associated with a particular cue could be decoded from PPA during the delay period. This result strongly supports the invoking of a very specific attentional template.

      Strengths:

      There is so much to be impressed with in this report. The writing of the manuscript is incredibly clear. The experimental design is clever and innovative. The analysis is sophisticated and also innovative. The results are solid and convincing.

      Weaknesses:

      I only have a few weaknesses to point out.

      This point is not so much of a weakness, but a further test of the hypothesis put forward by the authors. The delay period was long - 8 seconds. It would be interesting to split the delay period into the first 4 seconds and the last 4 seconds and run the same decoding analyses. The hypothesis here is that semantic associations take time to evolve, and it would be great to show that decoding gets stronger in the second delay period as opposed to the period right after the cue. I don't think this is necessary for publication, but I think it would be a stronger test of the template hypothesis.

      We conducted the suggested analysis, and we did not find clear evidence of differences in decoding scene information between the earlier and later portions of the delay period. This may be due to insufficient power when the data are divided, individual differences in when preparatory activation is the strongest, or truly no difference in activation over the delay period. More details of this analysis can be found in the supplementary materials (page 12, line 16; Figure S1).
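      As a purely illustrative sketch of what a split-delay comparison of this kind involves (this is not the authors' analysis pipeline), the Python example below runs the same cross-validated decoder separately on an "early" and a "late" window. Everything in it is an assumption made for illustration: the data are synthetic, the window names, trial counts, voxel counts and signal strength are hypothetical, and a simple leave-one-trial-out nearest-centroid classifier stands in for whatever MVPA classifier was actually used. With four scene categories, chance accuracy is 0.25.

```python
import random


def simulate_patterns(n_per_class, n_classes, n_voxels, signal, rng):
    """Synthetic voxel patterns: one random prototype per class plus unit Gaussian noise."""
    prototypes = [[signal * rng.gauss(0, 1) for _ in range(n_voxels)]
                  for _ in range(n_classes)]
    patterns, labels = [], []
    for label, proto in enumerate(prototypes):
        for _ in range(n_per_class):
            patterns.append([p + rng.gauss(0, 1) for p in proto])
            labels.append(label)
    return patterns, labels


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_nearest_centroid_accuracy(patterns, labels):
    """Leave-one-trial-out accuracy of a nearest-centroid classifier."""
    classes = sorted(set(labels))
    correct = 0
    for i, held_out in enumerate(patterns):
        centroids = {}
        for c in classes:
            # Class centroid computed without the held-out trial.
            members = [p for j, (p, lab) in enumerate(zip(patterns, labels))
                       if j != i and lab == c]
            centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        predicted = min(classes,
                        key=lambda c: squared_distance(held_out, centroids[c]))
        correct += int(predicted == labels[i])
    return correct / len(patterns)


def decode_by_window(windows):
    """Apply the same decoder to each time window and return accuracy per window."""
    return {name: loo_nearest_centroid_accuracy(X, y)
            for name, (X, y) in windows.items()}


# Hypothetical data: 4 scene categories, 10 trials each, 20 voxels per window.
rng = random.Random(0)
windows = {
    "early_delay": simulate_patterns(10, 4, 20, 2.0, rng),
    "late_delay": simulate_patterns(10, 4, 20, 2.0, rng),
}
accuracies = decode_by_window(windows)
print(accuracies)
```

      Note that halving the delay period also halves the data contributing to each window's pattern estimate, which is one reason a split analysis can lack power even when decoding over the full delay is reliable.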

      Typo in the abstract: "curing" vs "during."

      Fixed.

      It is hard to know what to do with significant results in ROIs that are not motivated by specific hypotheses. However, for Figure 3, what are the explanations for ROIs that show significant differences above and beyond the direct hypotheses set out by the authors?

      We added reasoning for the other a priori ROIs in the introduction (page 4, line 26). There is substantial evidence suggesting that frontoparietal areas are involved in cognitive control, attentional control, and working memory. The ROIs we selected from frontal and parietal cortex are based on parcels within resting state networks defined by the 17-network atlases (Schaefer et al., 2018). The IFJ was defined by the HCP-MMP1 (Glasser et al., 2016). These regions are commonly used in studies of attention and cognitive control, and the exact ROIs selected are described in the section on “Regions of interest (ROI) definition”. While we have the strongest hypothesis for IFJ based on relatively recent work from the Desimone lab, the other ROIs in lateral frontal cortex and parietal cortex are also well documented in similar studies, although the exact computation being done by these regions during tasks can be hard to differentiate with fMRI.

      Reviewer #3 (Public review):

      The manuscript contains a carefully designed fMRI study, using MVPA pattern analysis to investigate which high-level associate cortices contain target-related information to guide visual search. A special focus is hereby on so-called 'target-associated' information, that has previously been shown to help in guiding attention during visual search. For this purpose the author trained their participants and made them learn specific target-associations, in order to then test which brain regions may contain neural representations of those learnt associations. They found that at least some of the associations tested were encoded in prefrontal cortex during the cue and delay period.

      The manuscript is very carefully prepared. As far as I can see, the statistical analyses are all sound and the results integrate well with previous findings.

      I have no strong objections against the presented results and their interpretation.

      Reviewer #1 (Recommendations for the authors):

      One bit of trivia. In the abstract, you should define IFJ on its first appearance in the text. You get to that a bit later.

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      I really don't have much to suggest, as I thought that this was a clearly written report that offered a clever paradigm and data that supported the conclusions. My only suggestion would be to split the delay period activity and test whether the strength of the template evolves over time. Even though fMRI is not the best tool for this, still you would predict stronger decoding in the second half of the delay period

      Please see above for our response to the same comment.

      Reviewer #3 (Recommendations for the authors):

      I would just like to point out some minor aspects that might be worth improving before publishing this work.

      Abstract: While in general, the writing is clear and concise, I felt that the abstract of the manuscript was particularly hard to follow, probably because the authors at some point re-arranged individual sentences. For example, they write in line 12 about 'the preparatory period', but explain only in the following sentence that the preparatory period ensues 'before search begins'. This made it a bit hard to follow the overall logic and I think could easily be fixed. 

      We have addressed this comment and updated the abstract.

      Also in the abstract: 'The CONTENTS of the template typically CONTAIN...' sounds weird, no? Also, 'information is used to modulate sensory processing in preparation for guiding attention during search' sounds like a very over-complicated description of attentional facilitation. I'm not convinced either whether the sequence is correct here. Is the information really used to (first) modulate sensory processing (which is a sort of definition of attention in itself) to (then) prepare the guidance of attention in visual search?

      We have addressed this comment and updated the abstract.

      The sentence in line 7, 'However, many behavioral studies have shown that target-associated information is used to guide attention,...' (and the following sentence) assumes that the reader is somewhat familiar with the term 'target-associations'. I'm afraid that, for a naive reader, this term may only become fully understandable once the idea is introduced a bit later when mentioning that participants of the study were trained on face-scene pairings. I think it could help to give some very short explanation of 'target-associations' already when it is first mentioned. The term 'statistically co-occurring object pairs', for example, could be of great help here.

      Thank you for the suggestion. We have added it to the abstract.

      page 2, line 22: 'prefrotnal'

      Fixed.

      page 2, line 24/25: 'information ... can SUPPLANT (?) ... information'. (That's also a somewhat unfortunate repetition of 'information')

      Fixed.

      page 4, line 23-25: 'Working memory representations in lateral prefrontal and parietal regions are engaged in cognitive control computations that ARE (?) task non-specific but essential to their functioning'

      Fixed.

      page 7, line 1: maybe a comma before 'suggesting'?

      Fixed.

      page 7, line 14-16: Something seems wrong with this sentence: 'The distractor face was a race-gender match, which we previously FOUND MADE (?) target discrimination difficult enough to make the scene useful for guiding attention'

      We have addressed this comment and rewritten this part (now on page 7, line 18).

      Results / Discussion sections:

      In several figures, like in Fig3A, the three different IFJ regions, are grouped separately from the other frontal areas, which makes sense given the special role IFJ plays for representing task-related templates. However, IFJ is still part of PFC. I think it would be more correct to group the other frontal areas (like FEF vLPFC etc.) as 'Other Frontal' or even 'Other PFC'.

      We have made the changes based on the reviewer’s suggestion.

      In some of the Figures, e.g. Fig 3 and 5, I had the impression that the activation patterns of some conditions in vLPFC were rather close to the location of IFJ, which is just a bit posterior. I think I remember that functional localisers of IFJ can actually vary quite a bit in localisation (see e.g. in the Baldauf/Desimone paper). Also, I think it has been shown in the context of other regions, like the human FEF that its position when defined by localisation tasks is not always nicely and fully congruent with the respective labels in an atlas like the Glasser atlas. It might help to take this in consideration when discussing the results, particularly since the term vLPFC is a rather vague collection of several brain parcels and not a parcel name in the Glasser atlas. Some people might even argue that vLPFC in the broad sense contains IFJ, similar to how 'Frontal' contains IFJ (see above). How strong of a point do the authors want to make about activation in IFJ versus in vlPFC?

      We have now added text discussing the inability to truly differentiate between subregions of IFJ and other parts of vLPFC in the methods section on ROIs (page 25, line 13) and in the discussion (page 18, line 25). However, given the likely imprecision of ROI boundaries, it is arguably all the more surprising that we see distinct patterns between the subregions of IFJ defined by the Glasser HCP-MMP1 atlas and the other vLPFC regions defined by the 17-network atlases. We do not wish to overstate the precision of the IFJ regions, but we note the ROI results within the context of the larger literature. We expect that our findings will need to be reinterpreted when newer methods allow for better localization of functional subregions of the vLPFC in individuals.

      Given that the authors nicely explain in the introduction how important templates are in visual search, and given that FEF has such an important role in serially guiding saccades through visual search templates, I think it would be worth discussing the finding that FEF did not hold representation of these targets. Of course, this could be in part due to the specific task at hand, but it may still be interesting to note in the Discussion section that here FEF, although important for some top-down attention signals, did not keep representations of the 'search' templates. Is it because there is no spatial component to the task at hand (like proposed in Bedini 2021)?

      We have now added text directly addressing this point and citing the Bedini et al. paper in the discussion (page 18, line 18). Besides our current findings, the relationship between IFJ and FEF is really interesting and will hopefully be investigated more in the future.

      Page 18, line 5: 'we the(N) associated...'

      Fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this study by Li et al., the authors re-investigated the role of cDC1 for atherosclerosis progression using the ApoE model. First, the authors confirmed the accumulation of cDC1 in atherosclerotic lesions in mice and humans. Then, in order to examine the functional relevance of this cell type, the authors developed a new mouse model to selectively target cDC1. Specifically, they inserted the Cre recombinase directly after the start codon of the endogenous XCR1 gene, thereby avoiding off-target activity. Following validation of this model, the authors crossed it with ApoE-deficient mice and found a striking reduction of aortic lesions (numbers and size) following a high-fat diet. The authors further characterized the impact of cDC1 depletion on lesional T cells and their activation state. Also, they provide in-depth transcriptomic analyses of lesional in comparison to splenic and nodal cDC1. These results imply cellular interactions between lesion T cells and cDC1. Finally, the authors show that the chemokine XCL1, which is produced by activated CD8 T cells (and NK cells), plays a key role in the interaction with XCR1-expressing cDC1 and particularly in the atherosclerotic disease progression.

      Strengths:

      The surprising results on XCL1 represent a very important gain in knowledge. The role of cDC1 is clarified with a new genetic mouse model.

      Thank you

      Weaknesses:

      My criticism is limited to the analysis of the scRNAseq data of the cDC1. I think it would be important to match these data with published data sets on cDC1. In particular, the data set by Sophie Janssen's group on splenic cDC1 might be helpful here (PMID: 37172103; https://www.single-cell.be/spleen_cDC_homeostatic_maturation/datasets/cdc1). It would be good to assign a cluster based on the categories used there (early/late, immature/mature, at least for splenic DC).

      Thank you very much for your help. Using the scRNA seq data of Xcr1<sup>+</sup> cDC1 sorted from ApoE<sup>–/–</sup> mice, we re-annotated the populations, following the methodology proposed by Sophie Janssen's group. These results are presented in Figure S9 and Figure S10 and described in detail in the Results and Discussion section.

      Please refer to the Results section from line 264 to 284: “Using the scRNA seq data of Xcr1<sup>+</sup> cDC1 sorted from hyperlipidemic mice, we annotated the 10 populations as shown in Figure S9A, following the methodology from a previous study [41]. Ccr7<sup>+</sup> mature cDC1s (Clusters 3, 7 and 9) and Ccr7- immature cDC1s (remaining clusters) were identified across cDC1 cells sorted from aorta, spleen and lymph nodes (Figure S9B). Further stratification based on marker genes reveals that Cluster 10 is the pre-cDC1 population, with a high expression level of CD62L (Sell) and a low expression level of CD8a (Figure S9C). Clusters 6 and 8 are the proliferating cDC1s, which express high levels of the cell-cycle genes Stmn1 and Top2a (Figure S9D). Clusters 1 and 4 are early immature cDC1s, and Clusters 2 and 5 are late immature cDC1s, according to the expression pattern of Itgae and Nr4a2 (Figure S9E). Cluster 9 cells are early mature cDC1s, with elevated expression of Cxcl9 and Cxcl10 (Figure S9F). Clusters 3 and 7 are late mature cDC1s, characterized by the expression of Cd63 and Fscn1 (Figure S9G). As shown in Figure 5C and Figure S9, comparison of the 10 populations revealed a major difference: aortic cDC1 cells lack pre-cDC1s (Cluster 10) and mature cells (Clusters 3, 7 and 9). Interestingly, in hyperlipidemic mice, splenic cDC1 cells possess only Cluster 3 as late mature cells, while lymph node cDC1 cells have two late mature populations, namely Cluster 3 and Cluster 7. In further analysis, we also compared splenic cDC1 cells from HFD mice to those from ND mice. As shown in Figure S10, HFD appears to impact early immature cDC1-1 cells (Cluster 1) and increases the abundance of late immature cDC1 cells (Clusters 2 and 5), although all 10 populations are present in samples from both diets. We also found that Tnfaip3 and Serinc3 are among the most upregulated genes, while Apol7c and Tifab are downregulated in splenic cDC1 cells sorted from HFD mice”.
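      For readers who want the annotation scheme quoted above in a compact form, the following Python sketch encodes the cluster-to-stage assignments exactly as stated in that passage. The dictionary and helper names (`CLUSTER_STAGE`, `mature_clusters`, `missing_stages`) are hypothetical and chosen only for illustration; the sketch is a lookup table, not a re-analysis of the scRNA-seq data.

```python
# Cluster-to-stage assignments as stated in the quoted Results text.
CLUSTER_STAGE = {
    10: "pre-cDC1",        # Sell high, Cd8a low
    6: "proliferating",    # Stmn1, Top2a
    8: "proliferating",
    1: "early immature",   # staged by Itgae / Nr4a2 expression
    4: "early immature",
    2: "late immature",
    5: "late immature",
    9: "early mature",     # Cxcl9, Cxcl10
    3: "late mature",      # Cd63, Fscn1
    7: "late mature",
}

# Stages that correspond to the Ccr7+ mature compartment in the text.
MATURE_STAGES = {"early mature", "late mature"}


def mature_clusters():
    """Return the cluster IDs annotated as mature, in ascending order."""
    return sorted(c for c, stage in CLUSTER_STAGE.items() if stage in MATURE_STAGES)


def missing_stages(clusters_present):
    """Return the stages with no representative among the given clusters."""
    present = {CLUSTER_STAGE[c] for c in clusters_present}
    return sorted(set(CLUSTER_STAGE.values()) - present)


# The text reports that aortic cDC1 cells lack Cluster 10 and Clusters 3, 7 and 9.
aorta_clusters = set(CLUSTER_STAGE) - {3, 7, 9, 10}
print(mature_clusters())
print(missing_stages(aorta_clusters))
```

      Under these assignments, the mature clusters returned by the helper coincide with the Ccr7<sup>+</sup> clusters named in the text, and the stages missing from the aortic set are the pre-cDC1 and the two mature stages.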

      Please refer to the Discussion section from line 380 to 385: “Based on the maturation analysis of the cDC1 scRNA seq data [41], our findings suggest that the aortic cDC1 cells display a major difference from those of spleen and lymph nodes by lacking the mature clusters, whereas lymph node cDC1 cells contain an additional Fabp5<sup>+</sup> S100a4<sup>+</sup> late mature Cluster. Our results also suggest that hyperlipidemia contributes to alteration in early immature cDC1 and in the abundance of late immature cDC1 cells, which was associated with dramatic change in gene expression of Tnfaip3, Serinc3, Apol7c and Tifab”.

      Reviewer #2 (Public review):

      This study investigates the role of cDC1 in atherosclerosis progression using Xcr1Cre-Gfp Rosa26LSL-DTA ApoE-/- mice. The authors demonstrate that selective depletion of cDC1 reduces atherosclerotic lesions in hyperlipidemic mice. While cDC1 depletion did not alter macrophage populations, it suppressed T cell activation (both CD4+ and CD8+ subsets) within aortic plaques. Further, targeting the chemokine Xcl1 (ligand of Xcr1) effectively inhibits atherosclerosis. The manuscript is well-written, and the data are clearly presented. However, several points require clarification:

      (1) In Figure 1C (upper plot), it is not clear what the Xcr1 single-positive region in the aortic root represents, or whether this is caused by unspecific staining. So I wonder whether Xcr1 single-positive staining can reliably represent cDC1. For accurate cDC1 gating in Figure 1E, Xcr1+CD11c+ co-staining should be used instead.

      The observed false-positive signal in the wavy structures within immunofluorescence Figure 1C (upper panel) results from the strong autofluorescence of elastic fibers, a major vascular wall component (alongside collagen). This intrinsic property of elastic fibers is a well-documented confounder in immunofluorescence studies [A, B].

      In contrast, immunohistochemistry (IHC) employs an enzymatic chromogenic reaction (HRP with DAB substrate) that generates a brown precipitate exclusively at antigen-antibody binding sites. Importantly, vascular elastic fibers lack endogenous enzymatic activity capable of catalyzing the DAB reaction, thereby preventing this source of false positivity in IHC.

      Given that Xcr1 is exclusively expressed on conventional type 1 dendritic cells [C], and considering that IHC lacks the multiplexing capability inherent to immunofluorescence for antigen co-localization, single-positive Xcr1 staining reliably identifies cDC1s in IHC results.

      [A] König, K et al. “Multiphoton autofluorescence imaging of intratissue elastic fibers.” Biomaterials vol. 26,5 (2005): 495-500. doi:10.1016/j.biomaterials.2004.02.059

      [B] Andreasson, Anne-Christine et al. “Confocal scanning laser microscopy measurements of atherosclerotic lesions in mice aorta. A fast evaluation method for volume determinations.” Atherosclerosis vol. 179,1 (2005): 35-42. doi:10.1016/j.atherosclerosis.2004.10.040

      [C] Dorner, Brigitte G et al. “Selective expression of the chemokine receptor XCR1 on cross-presenting dendritic cells determines cooperation with CD8+ T cells.” Immunity vol. 31,5 (2009): 823-33. doi:10.1016/j.immuni.2009.08.027

      (2) Figure 4D suggests that cDC1 depletion does not affect CD4+/CD8+ T cells. However, only the proportion of these subsets within total T cells is shown. To fully interpret effects, the authors should provide:

      (a) Absolute numbers of total T cells in aortas.

      (b) Absolute counts of CD4+ and CD8+ T cells.

      Thank you for your suggestions. We agree that assessing both proportions and absolute numbers in Figure 4 provides a more complete picture of the effects of cDC1 depletion on T cell populations. We have also added the absolute counts of cDC1 cells and total T cells, and the CD44 MFI (mean fluorescence intensity) in CD4<sup>+</sup> and CD8<sup>+</sup> T cells, to Figure 4, and have supplemented the corresponding textual descriptions in the revised manuscript.

      Please refer to the Results section from line 183 to 187: “Subsequently, we assessed T cell phenotype in the two groups of mice. While neither the frequencies nor absolute counts of aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells differed significantly between two groups of mice (Figure 4D-F), CD69 frequency and CD44 MFI (Mean Fluorescence Intensity), the T cell activation markers, were significantly reduced in both CD4<sup>+</sup> and CD8<sup>+</sup> T cells from Xcr1<sup>+</sup> cDC1 depleted mice compared to controls (Figure 4G and H)”.

      (3) How does T cell activation mechanistically influence atherosclerosis progression? Why was CD69 selected as the sole activation marker? Were other markers (e.g., KLRG1, ICOS, CD44) examined to confirm activation status?

      We sincerely appreciate these insightful comments. As extensively documented in the literature, activated effector T cells (both CD4+ and CD8+) critically promote plaque inflammation and instability through their production of pro-inflammatory cytokines (particularly IFN-γ and TNF-α), which drive endothelial activation, exacerbate macrophage inflammatory responses, and impair smooth muscle cell function [A].

      In our study, we specifically investigated the role of cDC1 cells in atherosclerosis progression. Our key findings demonstrate that cDC1 depletion attenuates T cell activation (as shown by reduced CD69/CD44 expression) and that this reduction in activation is functionally linked to the observed decrease in atherosclerosis burden in our model. 

      Regarding CD44 as an activation marker, we performed quantitative analyses of CD44 mean fluorescence intensity (MFI) in aortic T cells (Figure 4). Importantly, the MFI of CD44 was significantly lower on both CD4+ and CD8+ T cells from Xcr1<sup>Cre-Gfp</sup> Rosa26<sup>LSL-DTA</sup> ApoE<sup>–/–</sup> mice than on those from control ApoE<sup>–/–</sup> mice (data shown below), consistent with the CD69 result in Figure 4. We have added the related description to the Results section.

      Please refer to the Results section from line 185 to 187: “CD69 frequency and CD44 MFI (Mean Fluorescence Intensity), the T cell activation markers, were significantly reduced in both CD4+ and CD8+ T cells from Xcr1+ cDC1 depleted mice compared to controls (Figure 4G and H)”.

      In contrast, the MFI of CD44 was comparable on both CD4<sup>+</sup> and CD8<sup>+</sup> T cells between Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> mice and control ApoE<sup>–/–</sup> mice (data shown below), which is consistent with the CD69 result in Figure 7. We have also added the related description to the Results section.

      Please refer to the Results section from line 308 to 309: “Crucially, CD69<sup>+</sup> frequency and CD44 MFI remained comparable in both aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells between two groups (Figure 7D-F).”

      [A] Hansson, Göran K, and Andreas Hermansson. “The immune system in atherosclerosis.” Nature immunology vol. 12,3 (2011): 204-12. doi:10.1038/ni.2001

      (4) Figure 7B: Beyond cDC1/2 proportions within cDCs, please report absolute counts of: Total cDCs, cDC1, and cDC2 subsets. Figure 7D: In addition to CD4+/CD8+ T cell proportions, the following should be included:

      (a) Total T cell numbers in aortas

      (b) Absolute counts of CD4+ and CD8+ T cells.

      Thanks for your suggestions. We have now included in Figure 7 the absolute counts of cDC, cDC1, and cDC2 cells, along with those of CD4<sup>+</sup> and CD8<sup>+</sup> T cells, in aortic tissues. Additionally, we provide the corresponding CD44 mean fluorescence intensity (MFI) measurements for both CD4<sup>+</sup> and CD8<sup>+</sup> T cell populations. We have added the related description to the Results section.

      Please refer to the Results section from line 303 to 311: “The flow cytometric results illustrated that both frequencies and absolute counts of Xcr1<sup>+</sup> cDC1 cells in the aorta were significantly reduced, but cDCs and cDC2 cells from Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> were comparable with that from ApoE<sup>–/–</sup> (Figure 7A-C). Moreover, in both lymph node and spleen, the absolute numbers of pDC, cDC1 and cDC2 from Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> were comparable with that from ApoE<sup>–/–</sup> (Figure S11). Crucially, CD69<sup>+</sup> frequency and CD44 MFI remained comparable in both aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells between two groups (Figure 7D-F). However, aortic CD8<sup>+</sup> T cells exhibited reduced frequency and absolute count, while CD4<sup>+</sup> T cells showed increased frequency but unchanged counts in Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> mouse versus controls (Figure 7G and H).”
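
      The difference between frequencies and absolute counts described in the passage above can be illustrated with a short arithmetic sketch. The cell numbers below are hypothetical and chosen only for illustration; they are not data from this study. The sketch shows how a drop in the CD8<sup>+</sup> count alone can raise the CD4<sup>+</sup> frequency while the CD4<sup>+</sup> count stays the same.

```python
def frequencies(counts):
    """Return each subset's share of the summed counts, as a percentage."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical absolute counts per aorta (illustrative only, not study data).
control = {"CD4": 1000, "CD8": 1000}
knockout = {"CD4": 1000, "CD8": 500}  # CD8 count halves; CD4 count unchanged

control_freq = frequencies(control)
knockout_freq = frequencies(knockout)

for name in control:
    print(
        f"{name}: count {control[name]} -> {knockout[name]}, "
        f"frequency {control_freq[name]:.1f}% -> {knockout_freq[name]:.1f}%"
    )
```

      In this toy case the CD4<sup>+</sup> frequency rises only because the denominator shrinks, which is why reporting absolute counts alongside proportions helps interpretation.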

      (5) cDC1 depletion reduced CD69+CD4+ and CD69+CD8+ T cells, whereas Xcl1 depletion decreased Xcr1+ cDC1 cells without altering activated T cells. How do the authors explain these different results? This discrepancy needs explanation.

      We sincerely appreciate your professional and insightful comments regarding the mechanistic relationship between cDC1 depletion and T cell activation. Direct cDC1 depletion in the Xcr1<sup>Cre-Gfp</sup> Rosa26<sup>LSL-DTA</sup> ApoE<sup>–/–</sup> mouse model removes both recruited and tissue-resident cDC1s, eliminating their multifunctional roles in antigen presentation, co-stimulation and cytokine secretion, which are essential for T cell activation. In contrast, Xcl1 depletion reduces, but does not eliminate, cDC1 migration into plaques. Furthermore, alternative chemokine axes (e.g., CCL5/CCR5, CXCL9/CXCR3, BCL9/BCL9L) may partially rescue cDC1 recruitment [13, 68, 69], and non-cDC1 APCs (e.g., monocytes, cDC2s) may compensate for T cell activation [55, 70]. We emphasize that Xcl1 depletion specifically failed to alter T cell activation in hyperlipidemic ApoE<sup>–/–</sup> mice. However, its impact may differ in other pathophysiological contexts due to compensatory mechanisms. We thank you again for highlighting this nuance, which strengthens our mechanistic interpretation. We have added these points to the Discussion section and included new references.

      Please refer to the Discussion section from line 407 to 413: “Notably, while complete ablation of Xcr1<sup>+</sup> cDC1s impaired T cell activation, reduction of Xcr1<sup>+</sup> cDC1 recruitment via Xcl1 deletion did not significantly compromise this process. This discrepancy may arise through compensatory mechanisms: alternative chemokine axes (e.g., CCL5/CCR5, CXCL9/CXCR3, BCL9/BCL9L) may partially rescue Xcr1<sup>+</sup> cDC1 homing [13, 68, 69], while non-cDC1 antigen-presenting cells (e.g., monocytes, cDC2s) may sustain T cell activation [55, 70]. Furthermore, tissue-specific microenvironment factors could potentially modulate its role in other diseases.”

      [13] Eisenbarth, S C. “Dendritic cell subsets in T cell programming: location dictates function.” Nature reviews. Immunology vol. 19,2 (2019): 89-103. doi:10.1038/s41577-018-0088-1

      [55] Brewitz, Anna et al. “CD8+ T Cells Orchestrate pDC-XCR1+ Dendritic Cell Spatial and Functional Cooperativity to Optimize Priming.” Immunity vol. 46,2 (2017): 205-219. doi:10.1016/j.immuni.2017.01.003

      [68] de Oliveira, Carine Ervolino et al. “CCR5-Dependent Homing of T Regulatory Cells to the Tumor Microenvironment Contributes to Skin Squamous Cell Carcinoma Development.” Molecular cancer therapeutics vol. 16,12 (2017): 2871-2880. doi:10.1158/1535-7163.MCT-17-0341

      [69] He F, Wu Z, Liu C, Zhu Y, Zhou Y, Tian E, et al. Targeting BCL9/BCL9L enhances antigen presentation by promoting conventional type 1 dendritic cell (cDC1) activation and tumor infiltration. Signal Transduct Target Ther. 2024;9(1):139. Epub 2024/05/30. doi: 10.1038/s41392-024-01838-9. PubMed PMID: 38811552; PubMed Central PMCID: PMCPMC11137111.

      [70] Böttcher, Jan P et al. “Functional classification of memory CD8(+) T cells by CX3CR1 expression.” Nature communications vol. 6 8306. 25 Sep. 2015, doi:10.1038/ncomms9306

      Reviewer #1 (Recommendations for the authors):

      (1) Line 32 - The authors might want to add that the mouse model leads to a "constitutive" depletion of cDC1.

      Thanks for your advice. We have revised the sentence as follows.

      Please refer to the Results section from line 31 to 33: “we established Xcr1<sup>Cre-Gfp</sup> Rosa26<sup>LSL-DTA</sup> ApoE<sup>–/–</sup> mice, a novel and complex genetic model, in which cDC1 was constitutively depleted in vivo during atherosclerosis development”.

      (2) Line 187-188: The authors claim that T cell activation was "inhibited" if cDC1 was depleted. The data shows that the T cells were less activated, but there is no indication of any kind of inhibition; this should be corrected.

      Thanks for your advice. We have revised the sentence as follows.

      Please refer to the Results section from line 183 to 187: “Subsequently, we assessed T cell phenotype in the two groups of mice. While neither the frequencies nor absolute counts of aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells differed significantly between two groups of mice (Figure 4D-F), CD69 frequency and CD44 MFI (Mean Fluorescence Intensity), the T cell activation markers, were significantly reduced in both CD4<sup>+</sup> and CD8<sup>+</sup> T cells from Xcr1<sup>+</sup> cDC1 depleted mice compared to controls (Figure 4G and H)”.

      (3) Why are some splenic DC clusters absent in LNs and vice versa? This is not obvious to this reviewer and should at least be discussed.

      We appreciate the insightful question regarding the absence of certain splenic DC clusters in LNs. This phenomenon in Figure 5 aligns with the 'division of labor' paradigm in dendritic cell biology: tissue microenvironments shape specialized DC subsets to address local immunological challenges. The absence of universal clusters reflects functional adaptation, not technical artifacts. We acknowledge that this tissue-specific heterogeneity warrants further discussion and have expanded our analysis to address this point in the Discussion section of our manuscript.

      Please refer to the Discussion section from line 375 to 385: “This pronounced tissue-specific compartmentalization of Xcr1<sup>+</sup> cDC1 subsets may be related to multiple mechanisms, including developmental imprinting that instructs precursor differentiation into transcriptionally distinct subpopulations [62], and microenvironmental filtering through organ-specific chemokine axes (e.g., CCL2/CCR2 in spleen) that selectively recruit receptor-matched subsets [63, 64]. This spatial specialization optimizes pathogen surveillance for local immunological challenges. Based on the maturation analysis of the cDC1 scRNA seq data [41], our findings suggest that the aortic cDC1 cells display a major difference from those of spleen and lymph nodes by lacking the mature clusters, whereas lymph node cDC1 cells contain an additional Fabp5<sup>+</sup> S100a4<sup>+</sup> late mature Cluster. Our results also suggest that hyperlipidemia contributes to alteration in early immature cDC1 and in the abundance of late immature cDC1 cells, which was associated with dramatic change in gene expression of Tnfaip3, Serinc3, Apol7c and Tifab”.

      [62]. Liu Z, Gu Y, Chakarov S, Bleriot C, Kwok I, Chen X, et al. Fate Mapping via Ms4a3-Expression History Traces Monocyte-Derived Cells. Cell. 2019;178(6):1509-25 e19. Epub 2019/09/07. doi: 10.1016/j.cell.2019.08.009. PubMed PMID: 31491389.

      [63]. Bosmans LA, van Tiel CM, Aarts S, Willemsen L, Baardman J, van Os BW, et al. Myeloid CD40 deficiency reduces atherosclerosis by impairing macrophages' transition into a pro-inflammatory state. Cardiovasc Res. 2023;119(5):1146-60. Epub 2022/05/20. doi: 10.1093/cvr/cvac084. PubMed PMID: 35587037; PubMed Central PMCID: PMCPMC10202633.

      [64]. Mildner A, Schonheit J, Giladi A, David E, Lara-Astiaso D, Lorenzo-Vivas E, et al. Genomic Characterization of Murine Monocytes Reveals C/EBPbeta Transcription Factor Dependence of Ly6C(-) Cells. Immunity. 2017;46(5):849-62 e7. Epub 2017/05/18. doi: 10.1016/j.immuni.2017.04.018. PubMed PMID: 28514690.

      [41]. Bosteels V, Marechal S, De Nolf C, Rennen S, Maelfait J, Tavernier SJ, et al. LXR signaling controls homeostatic dendritic cell maturation. Sci Immunol. 2023;8(83):eadd3955. Epub 2023/05/12. doi: 10.1126/sciimmunol.add3955. PubMed PMID: 37172103.

      (4) The authors should discuss how XCL1 could impact lesional cDC1 and T cell abundance. Notably, preDCs do not express XCR1, and T cells express XCL1 following TCR activation. Is there a recruitment or local proliferation defect of cDC1 in the absence of XCL1? Could there also be a role for NK cells as a potential source of XCL1?

      We appreciate your insightful questions regarding the differential effects of Xcl1 on cDC1s and T cells. Xcl1 primarily mediates the recruitment of mature cDC1s. Our data demonstrate that Xcl1 deletion significantly reduces aortic cDC1 abundance, which correlates with a concomitant decrease in CD8<sup>+</sup> T cell numbers within the aorta. These findings strongly suggest that the Xcl1-Xcr1 axis plays a regulatory role in T cell accumulation in aortic plaques.

      Consistent with prior studies [A, B], cDC1 recruitment can occur in the absence of Xcl1, which echoes our finding that cDC1 cells were still present in Xcl1-knockout aortic plaques, albeit at lower abundance. We agree that further studies are required to determine how Xcl1-dependent and Xcl1-independent cDC1 cells activate T cells and whether they differ in their capacity to proliferate in tissue. We have added these points to the Discussion section.

      Please refer to the Discussion section from line 407 to 415: “Notably, while complete ablation of Xcr1<sup>+</sup> cDC1s impaired T cell activation, reduction of Xcr1<sup>+</sup> cDC1 recruitment via Xcl1 deletion did not significantly compromise this process. This discrepancy may arise through compensatory mechanisms: alternative chemokine axes (e.g., CCL5/CCR5, CXCL9/CXCR3, BCL9/BCL9L) may partially rescue Xcr1<sup>+</sup> cDC1 homing [13, 68, 69], while non-cDC1 antigen-presenting cells (e.g., monocytes, cDC2s) may sustain T cell activation [55, 70]. Furthermore, tissue-specific microenvironment factors could potentially modulate its role in other diseases. In summary, our findings identify Xcl1 as a potential therapeutic target for atherosclerosis therapy, though its cellular origins and regulation of lesional Xcr1<sup>+</sup> cDC1 and T cells dynamics require further studies”.

      According to the literature, Xcl1 is expressed in NK cells and subsets of T cells, and NK cells may be a source of Xcl1 during atherosclerosis, a possibility that deserves further investigation [A, C, D].

      [A] Böttcher, Jan P et al. “NK Cells Stimulate Recruitment of cDC1 into the Tumor Microenvironment Promoting Cancer Immune Control.” Cell vol. 172,5 (2018): 1022-1037.e14. doi:10.1016/j.cell.2018.01.004

      [B] He, Fenglian et al. “Targeting BCL9/BCL9L enhances antigen presentation by promoting conventional type 1 dendritic cell (cDC1) activation and tumor infiltration.” Signal transduction and targeted therapy vol. 9,1 139. 29 May. 2024, doi:10.1038/s41392-024-01838-9

      [C] Woo, Yeon Duk et al. “The invariant natural killer T cell-mediated chemokine X-C motif chemokine ligand 1-X-C motif chemokine receptor 1 axis promotes allergic airway hyperresponsiveness by recruiting CD103+ dendritic cells.” The Journal of allergy and clinical immunology vol. 142,6 (2018): 1781-1792.e12. doi:10.1016/j.jaci.2017.12.1005

      [D] Winkels, Holger et al. “Atlas of the Immune Cell Repertoire in Mouse Atherosclerosis Defined by Single-Cell RNA-Sequencing and Mass Cytometry.” Circulation research vol. 122,12 (2018): 1675-1688. doi:10.1161/CIRCRESAHA.117.312513

      Reviewer #2 (Recommendations for the authors):

      There is a logical error in line 298. I suggest revising to: "Collectively, these data suggest that Xcl1 promotes atherosclerosis by recruiting Xcr1+ cDC1 cells, which subsequently drive T cell activation in lesions."

      Thanks for your advice. Since Xcl1 deficiency reduced both the frequencies and absolute counts of Xcr1+ cDC1 and CD8+ T cells in lesions without affecting T cell activation, we have revised the sentence accordingly.

      Please refer to the Results section from line 314 to 315: “Collectively, these data suggest that Xcl1 promotes atherosclerosis by recruiting Xcr1<sup>+</sup> cDC1 cells, and facilitating CD8<sup>+</sup> T cell accumulation in lesions”.